Re: Best way to check Solr index for completeness
Yep, I was thinking of using this on a single field. I was assuming that there was a PK in the database that you were mapping to the uniqueKey field; if that's not so, then it's more of a problem. But you'd have problems anyway if you *don't* have a uniqueKey when it comes time to update any records, so it might be worth going back and adding one...

Erick

On Wed, Sep 29, 2010 at 10:40 AM, dshvadskiy wrote:
> Using TermsComponent is an interesting suggestion. However, my understanding
> is that it will work only for unique terms, for example comparing the
> database primary key with the Solr id field. A variation of that is to
> calculate some kind of unique record hash and store it in the index, then
> retrieve the id and hash via TermsComponent and compare them with the hash
> calculated on the database record.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602597.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best way to check Solr index for completeness
Think about which fields you need to return. For this, you probably only need the id. That could be a lot faster than returning the default set of fields.

wunder

On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote:
> Actually, retrieving 1000 docs via search isn't that bad. It turned out to
> take under 1 sec. I still like the idea of using TermsComponent and will use
> it in the future if the number of docs in the index grows. Thanks for all
> the suggestions.
> Dmitriy
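A minimal sketch of the "return only the id" suggestion, assuming a standard `/select` handler and a sortable unique key field named `id`. The function name `select_ids_url` and the base URL in the usage note are hypothetical; `q`, `fl`, `sort`, `start`, `rows` and `wt` are ordinary Solr query parameters.

```python
from urllib.parse import urlencode


def select_ids_url(base_url, start, rows=1000, id_field="id"):
    """Build a paged select URL that asks Solr for the id field only."""
    params = {
        "q": "*:*",                  # match every document
        "fl": id_field,              # return only the id, not the default field list
        "sort": f"{id_field} asc",   # fixed order so pages do not shift between requests
        "start": start,              # offset of the first document on this page
        "rows": rows,                # page size
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)
```

For example, `select_ids_url("http://localhost:8983/solr/core0/select", start=2000)` would request the third page of 1000 ids.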
Re: Best way to check Solr index for completeness
Actually, retrieving 1000 docs via search isn't that bad. It turned out to take under 1 sec. I still like the idea of using TermsComponent and will use it in the future if the number of docs in the index grows. Thanks for all the suggestions.

Dmitriy
Re: Best way to check Solr index for completeness
Regenerating the index is a slow operation due to limitations of the source systems. We run several complex SQL statements to generate one Solr document. A full reindex takes about 24 hours.
Re: Best way to check Solr index for completeness
Using TermsComponent is an interesting suggestion. However, my understanding is that it will work only for unique terms, for example comparing the database primary key with the Solr id field. A variation of that is to calculate some kind of unique record hash and store it in the index, then retrieve the id and hash via TermsComponent and compare them with the hash calculated on the database record.
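The hash variation could be sketched roughly as follows. This covers only the comparison step and assumes the id-to-hash pairs have already been collected from the database and from the index by whatever means; `record_hash` and `diff_index` are hypothetical names, and SHA-1 over sorted `key=value` pairs is just one possible choice of record hash.

```python
import hashlib


def record_hash(fields):
    """Stable hash of a record; keys are sorted so field order does not matter."""
    canonical = "\x1f".join(f"{k}={fields[k]}" for k in sorted(fields))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def diff_index(db_hashes, solr_hashes):
    """Compare two {id: hash} mappings and report where the index disagrees."""
    db_ids, solr_ids = set(db_hashes), set(solr_hashes)
    return {
        # in the database but not in the index
        "missing": sorted(db_ids - solr_ids),
        # in the index but no longer in the database
        "orphaned": sorted(solr_ids - db_ids),
        # present in both, but the stored hash differs from the database hash
        "stale": sorted(i for i in db_ids & solr_ids
                        if db_hashes[i] != solr_hashes[i]),
    }
```

The same hash function has to be used at index time and at check time; otherwise every document would show up as stale.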
Re: Best way to check Solr index for completeness
How long does it take to get 1000 docs? Why not ensure this while indexing? I think that, besides your suggestion and Luke's, there is no other way...

Regards,
Peter.

> Hello,
> What would be the best way to check a Solr index against the original
> system (database) to make sure the index is up to date? I can use Solr
> fields like Id and timestamp to check against the appropriate fields in the
> database. Our index currently contains over 2 million documents across
> several cores. Pulling all documents from the Solr index via search (1000
> docs at a time) is very slow. Is there a better way to do it?
>
> Thanks,
> Dmitriy

--
http://jetwick.com twitter search prototype
Re: Best way to check Solr index for completeness
How soon do you need to know? Couldn't you just regenerate the index, using some kind of 'nice' factor so that it doesn't use too much processor/disk/etc.?

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Tue, 9/28/10, dshvadskiy wrote:
> From: dshvadskiy
> Subject: Re: Best way to check Solr index for completeness
> To: solr-user@lucene.apache.org
> Date: Tuesday, September 28, 2010, 2:11 PM
>
> That will certainly work for the most recent updates, but I need to compare
> the entire index.
>
> Dmitriy
>
> Luke Crouch wrote:
> > Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
> > simply select the most recently updated record from the db and check to
> > make sure the corresponding solr doc has the same timestamp?
> >
> > -L
> >
> > On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote:
> >> Hello,
> >> What would be the best way to check a Solr index against the original
> >> system (database) to make sure the index is up to date? I can use Solr
> >> fields like Id and timestamp to check against the appropriate fields in
> >> the database. Our index currently contains over 2 million documents
> >> across several cores. Pulling all documents from the Solr index via
> >> search (1000 docs at a time) is very slow. Is there a better way to do it?
> >>
> >> Thanks,
> >> Dmitriy
Re: Best way to check Solr index for completeness
Have you looked at Solr's TermsComponent? Assuming you have a unique key, I think you could use TermsComponent to walk that field and compare it against your database, rather than fetching all the documents.

HTH
Erick

On Tue, Sep 28, 2010 at 5:11 PM, dshvadskiy wrote:
> That will certainly work for the most recent updates, but I need to compare
> the entire index.
>
> Dmitriy
>
> Luke Crouch wrote:
> > Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
> > simply select the most recently updated record from the db and check to
> > make sure the corresponding solr doc has the same timestamp?
> >
> > -L
> >
> > On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote:
> >> Hello,
> >> What would be the best way to check a Solr index against the original
> >> system (database) to make sure the index is up to date? I can use Solr
> >> fields like Id and timestamp to check against the appropriate fields in
> >> the database. Our index currently contains over 2 million documents
> >> across several cores. Pulling all documents from the Solr index via
> >> search (1000 docs at a time) is very slow. Is there a better way to do it?
> >>
> >> Thanks,
> >> Dmitriy
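Walking the unique key field with TermsComponent might look roughly like the sketch below. It assumes the component is exposed at a `/terms` request handler (this depends on solrconfig.xml) and that the JSON response lists terms as a flat `[term, count, term, count, ...]` array under `terms.<field>`; both should be checked against your own setup. The helper names `terms_url`, `http_fetch` and `walk_terms` are hypothetical. The `terms.*` parameters are the component's own paging controls.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def terms_url(base_url, field, lower=None, limit=1000):
    """Build a TermsComponent request for one page of terms in index order."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.sort": "index",    # lexicographic order, so paging is stable
        "terms.limit": limit,
        "wt": "json",
    }
    if lower is not None:
        params["terms.lower"] = lower
        params["terms.lower.incl"] = "false"  # do not repeat the last term seen
    return base_url + "?" + urlencode(params)


def http_fetch(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def walk_terms(base_url, field, fetch=http_fetch, limit=1000):
    """Yield every term of `field`, one page at a time, until a page is empty."""
    lower = None
    while True:
        data = fetch(terms_url(base_url, field, lower, limit))
        page = data["terms"][field][0::2]  # assumed flat layout: keep terms, drop counts
        if not page:
            return
        yield from page
        lower = page[-1]                   # resume strictly after this term
```

The ids collected this way can be put into a set and diffed against the primary keys selected from the database. Terms belonging to deleted documents can linger in the index until segments are merged, so ids that appear only on the Solr side are worth double-checking with a regular query.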
Re: Best way to check Solr index for completeness
That will certainly work for the most recent updates, but I need to compare the entire index.

Dmitriy

Luke Crouch wrote:
> Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
> simply select the most recently updated record from the db and check to
> make sure the corresponding solr doc has the same timestamp?
>
> -L
>
> On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote:
>> Hello,
>> What would be the best way to check a Solr index against the original
>> system (database) to make sure the index is up to date? I can use Solr
>> fields like Id and timestamp to check against the appropriate fields in
>> the database. Our index currently contains over 2 million documents
>> across several cores. Pulling all documents from the Solr index via
>> search (1000 docs at a time) is very slow. Is there a better way to do it?
>>
>> Thanks,
>> Dmitriy
Re: Best way to check Solr index for completeness
Is there a 1:1 ratio of db records to solr documents? If so, couldn't you simply select the most recently updated record from the db and check to make sure the corresponding solr doc has the same timestamp?

-L

On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy wrote:
> Hello,
> What would be the best way to check a Solr index against the original
> system (database) to make sure the index is up to date? I can use Solr
> fields like Id and timestamp to check against the appropriate fields in the
> database. Our index currently contains over 2 million documents across
> several cores. Pulling all documents from the Solr index via search (1000
> docs at a time) is very slow. Is there a better way to do it?
>
> Thanks,
> Dmitriy
Best way to check Solr index for completeness
Hello,

What would be the best way to check a Solr index against the original system (database) to make sure the index is up to date? I can use Solr fields like Id and timestamp to check against the appropriate fields in the database. Our index currently contains over 2 million documents across several cores. Pulling all documents from the Solr index via search (1000 docs at a time) is very slow. Is there a better way to do it?

Thanks,
Dmitriy