Re: Best way to check Solr index for completeness

2010-09-29 Thread Dennis Gearon
How soon do you need to know? Couldn't you just regenerate the index with some
kind of 'nice' factor so it doesn't use too much processor/disk/etc.?

Dennis Gearon

--- On Tue, 9/28/10, dshvadskiy dshvads...@gmail.com wrote:

 From: dshvadskiy dshvads...@gmail.com
 Subject: Re: Best way to check Solr index for completeness
 To: solr-user@lucene.apache.org
 Date: Tuesday, September 28, 2010, 2:11 PM
 
 That will certainly work for most recent updates but I need to compare
 entire index.
 
 Dmitriy
 
 


Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs?
Why not ensure this while indexing?

I think that, apart from your suggestion and Luke's, there is no other
way...

Regards,
Peter.





Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Using TermsComponent is an interesting suggestion. However, my understanding
is that it will work only for unique terms, for example comparing the database
primary key with the Solr id field. A variation of that is to calculate some
kind of unique record hash and store it in the index, then retrieve the id and
hash via TermsComponent and compare them with a hash calculated from the
database record.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602597.html
Sent from the Solr - User mailing list archive at Nabble.com.
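A minimal sketch of the record-hash variant, assuming each database row is available as a dict of field name to value, and that the same function is used both at indexing time (to fill a hypothetical `record_hash` string field) and at audit time. Because the id is one of the hashed fields, comparing the two *sets* of hashes is enough, so there is no need to pair each id with its hash: a row that is missing from Solr, or was indexed from an older version of the row, shows up on the database-only side, and a deleted or outdated Solr document shows up on the Solr-only side. SHA-1 and the separator character are arbitrary choices for illustration.

```python
import hashlib
from typing import Iterable, Mapping

# ASCII unit separator; assumed never to occur inside field values.
SEP = "\x1f"

def record_hash(record: Mapping[str, object]) -> str:
    """Hash of all of a record's fields, independent of dict key order.

    Values are rendered with str(), so dates, decimals, etc. must be
    converted the same way at index time and at audit time.
    """
    canonical = SEP.join(f"{name}={record[name]}" for name in sorted(record))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def diff_hashes(db_hashes: Iterable[str], solr_hashes: Iterable[str]) -> dict:
    """Compare hashes computed from the database with hashes read from Solr."""
    db, solr = set(db_hashes), set(solr_hashes)
    return {
        "missing_or_outdated_in_solr": db - solr,  # DB rows Solr doesn't reflect
        "only_in_solr": solr - db,                 # deleted or stale Solr docs
    }
```

The Solr-side set could be collected by walking the hash field with TermsComponent, since every hash is a unique term.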


Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Regenerating the index is a slow operation due to limitations of the source
systems. We run several complex SQL statements to generate one Solr document.
A full reindex takes about 24 hours.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1602610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to check Solr index for completeness

2010-09-29 Thread dshvadskiy

Actually, retrieving 1000 docs via search isn't that bad. It turned out to
take under 1 sec. I still like the idea of using TermsComponent and will use
it in the future if the number of docs in the index grows. Thanks for all the
suggestions.
Dmitriy
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1603108.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to check Solr index for completeness

2010-09-29 Thread Walter Underwood
Think about what fields you need to return. For this, you probably only need 
the id. That could be a lot faster than the default set of fields.

wunder
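A minimal sketch of paging through the index while returning only the fields the comparison needs. It assumes a single core at a hypothetical select URL and the `id`/`timestamp` field names from the original question; the total would come from `numFound` of a `rows=0` query. It only builds the request URLs, and fetching and parsing the responses is left out. Sorting on `id` (assumed to be an indexed, untokenized string field) keeps page boundaries stable between requests.

```python
from typing import Iterator, Sequence
from urllib.parse import urlencode

def page_urls(select_url: str, total_docs: int, rows: int = 1000,
              fields: Sequence[str] = ("id", "timestamp")) -> Iterator[str]:
    """Yield one select URL per page of `rows` documents."""
    for start in range(0, total_docs, rows):
        params = {
            "q": "*:*",
            "fl": ",".join(fields),  # return only the fields the comparison needs
            "sort": "id asc",        # stable page boundaries (assumes id is sortable)
            "start": start,
            "rows": rows,
            "wt": "json",
        }
        yield f"{select_url}?{urlencode(params)}"
```

For roughly 2 million documents at 1000 rows per page this is 2,000 requests; if each stays under about 1 second, a full pass takes at most around 33 minutes. `start`-based paging tends to slow down for very deep pages, since Solr has to collect `start + rows` hits for every request.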

On Sep 29, 2010, at 9:04 AM, dshvadskiy wrote:

 
 Actually retrieving 1000 docs via search isn't that bad. Turned out it takes
 under 1 sec.

Re: Best way to check Solr index for completeness

2010-09-29 Thread Erick Erickson
Yep, I was thinking of this on a uniqueKey field. I was assuming that there
was a PK in the database that you were mapping to the uniqueKey field, but if
that's not so, then it's more of a problem.

But you'd have problems anyway if you *don't* have a uniqueKey when it comes
time to update any records, so it might be worth going back around and putting
one in...

Erick

On Wed, Sep 29, 2010 at 10:40 AM, dshvadskiy dshvads...@gmail.com wrote:


 Using TermComponent is an interesting suggestion. However my understanding
 it will work only for unique terms.



Re: Best way to check Solr index for completeness

2010-09-28 Thread Luke Crouch
Is there a 1:1 ratio of db records to Solr documents? If so, couldn't you
simply select the most recently updated record from the db and check that the
corresponding Solr doc has the same timestamp?

-L
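A minimal sketch of this spot check, assuming the `id` and `timestamp` field names from the original question, a hypothetical select URL, an id containing no Lucene query-syntax special characters, and a database `updated_at` value already converted to a UTC-aware `datetime`. `solr_docs` is the `response.docs` list from the JSON response. Only the newest row is checked, so this detects a stalled indexer but not gaps among older records.

```python
from datetime import datetime, timezone
from typing import List, Mapping
from urllib.parse import urlencode

def spot_check_url(select_url: str, doc_id: str) -> str:
    """Query for a single document, returning only the fields needed."""
    params = {"q": f"id:{doc_id}", "fl": "id,timestamp", "wt": "json"}
    return f"{select_url}?{urlencode(params)}"

def parse_solr_date(value: str) -> datetime:
    """Parse a Solr date such as 2010-09-28T20:48:00Z (fractional seconds dropped)."""
    whole_seconds = value.rstrip("Z").split(".")[0]
    return datetime.strptime(whole_seconds, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)

def latest_record_in_sync(db_updated_at: datetime,
                          solr_docs: List[Mapping[str, str]]) -> bool:
    """True if Solr returned exactly one doc whose timestamp matches the DB to the second."""
    if len(solr_docs) != 1:
        return False
    solr_ts = parse_solr_date(solr_docs[0]["timestamp"])
    return solr_ts == db_updated_at.replace(microsecond=0)
```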

On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy dshvads...@gmail.com wrote:

 Hello,
 What would be the best way to check a Solr index against the original system
 (database) to make sure the index is up to date? I can use Solr fields like Id
 and timestamp to check against the corresponding fields in the database. Our
 index currently contains over 2 million documents across several cores.
 Pulling all documents from the Solr index via search (1000 docs at a time) is
 very slow. Is there a better way to do it?

 Thanks,
 Dmitriy



Re: Best way to check Solr index for completeness

2010-09-28 Thread dshvadskiy

That will certainly work for the most recent updates, but I need to compare
the entire index.

Dmitriy

Luke Crouch wrote:
 
 Is there a 1:1 ratio of db records to solr documents? If so, couldn't you
 simply select the most recent updated record from the db and check to make
 sure the corresponding solr doc has the same timestamp?
 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Best-way-to-check-Solr-index-for-completeness-tp1598626p1598733.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Best way to check Solr index for completeness

2010-09-28 Thread Erick Erickson
Have you looked at Solr's TermsComponent? Assuming you have a unique key,
I think you could use TermsComponent to walk that field and compare it
against your database, rather than getting all the documents.

HTH
Erick
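A minimal sketch of walking the unique-key field with TermsComponent in index order, paging with `terms.lower` plus `terms.lower.incl=false` so each page resumes after the last term seen. It assumes a request handler with TermsComponent enabled (for example the `/terms` handler in Solr's example solrconfig.xml). The HTTP call and response parsing are left to a hypothetical `fetch_terms` callable that returns one page of terms as a list of strings, because the JSON layout of the `terms` section depends on the Solr version and the `json.nl` setting.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

def terms_params(field: str, lower: Optional[str], limit: int) -> dict:
    """Request parameters for one page of terms from `field`."""
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.sort": "index",  # index order, so pages can be chained by lower bound
        "terms.limit": limit,
        "wt": "json",
    }
    if lower is not None:
        params["terms.lower"] = lower
        params["terms.lower.incl"] = "false"  # skip the last term of the previous page
    return params

def walk_terms(fetch_terms: Callable[[dict], List[str]],
               field: str = "id", limit: int = 10000) -> Iterator[str]:
    """Yield every term of `field`, one page at a time."""
    lower = None
    while True:
        page = fetch_terms(terms_params(field, lower, limit))
        yield from page
        if len(page) < limit:  # a short (or empty) page means we reached the end
            return
        lower = page[-1]

def compare_ids(db_ids: Iterable[str], solr_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (ids missing from Solr, ids present only in Solr)."""
    db, solr = set(db_ids), set(solr_ids)
    return sorted(db - solr), sorted(solr - db)
```

TermsComponent reads terms straight from the index, so ids of documents that were deleted but not yet merged away may still be returned; an optimize removes them, otherwise the Solr-only side of the comparison needs a second look.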

On Tue, Sep 28, 2010 at 5:11 PM, dshvadskiy dshvads...@gmail.com wrote:


 That will certainly work for most recent updates but I need to compare
 entire index.

 Dmitriy