Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 11:40 PM, Alexander Wallace a...@rwmotloc.com wrote:

 Hi all!

 I'm using Solr 1.3 and currently testing reindexing...

 In my client app, I am sending 17494 requests to add documents, in 3
 different scenarios:

 a) not using threads
 b) using 1 thread
 c) using 2 threads

 In scenario a), everything seems to work fine. In my client log, I see
 17494 requests sent to Solr; in Solr's log, I see the same number of 'add'
 requests received; and if I search the index, I can see the same number of
 documents.

 However, if I use 1 thread, I see the right number of requests in the logs,
 but I only find 15k or so documents (this varies a bit every time I run this
 scenario).

 It gets way worse if I use 2 threads... I can see the right number of
 requests in both logs, but I end up with ~600 docs in the index!

 In all scenarios, I don't see any errors in the logs...

 As you can imagine, I need to be able to use multiple threads to speed up
 the process... It is also very concerning that I don't get any errors
 anywhere...

 Looking at Solr's admin stats, I also see 17494 cumulative adds, but only a
 tiny fraction of the actual documents can be found...

 Any clues?


What is the uniqueKey in your schema.xml? Is it possible that those 17494
documents have a common uniqueKey and are therefore getting overwritten?
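
To illustrate the question above, here is a minimal sketch in Python of how an index keyed on a unique field can receive many 'add' requests yet hold far fewer documents. It is a toy model, not Solr code: the `ToyIndex` class and the `id` field name are assumptions made up for the example.

```python
# Toy model of an index keyed by a unique field (hypothetical, not Solr code).
class ToyIndex:
    def __init__(self):
        self.docs = {}            # unique key value -> latest document
        self.cumulative_adds = 0  # counts every add request, like the admin stats

    def add(self, doc, unique_key="id"):
        self.cumulative_adds += 1
        # A document with an already-seen key replaces the earlier one.
        self.docs[doc[unique_key]] = doc


index = ToyIndex()
for i in range(10):
    # Only three distinct ids are ever used: "0", "1" and "2".
    index.add({"id": str(i % 3), "title": f"doc {i}"})

print(index.cumulative_adds, len(index.docs))  # prints "10 3"
```

Ten adds are counted, no error is raised, and only three documents remain, which matches the symptom described in the question.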

-- 
Regards,
Shalin Shekhar Mangar.


Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Alexander Wallace
Right after I sent the email I went on and checked for uniqueness of 
documents...


In theory they were all supposed to be unique... But I've realized that 
the platform I'm using to reindex is delaying sending the requests. 
This, in combination with my reindexers reusing document fields (instead 
of creating new instances, to save on GC), led to the same document being 
sent many times with invalid data...


I am fairly sure now that this is the source of my problem... My 
reindexers originally used LuceneWriter directly, which blocks thread 
execution until the document is added to the index. The new 
framework I'm using uses messaging, which releases control back to the 
thread before the documents are actually sent to be indexed. My threads 
update the document fields in the meantime, so the data written to the 
index is in transition and invalid...
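
The effect described above can be reproduced with a small, deterministic sketch in Python. It is only a simulation under assumptions: the `outbox` queue stands in for the messaging framework, `drain` stands in for the delayed delivery to the index, and the `id` field plays the role of the uniqueKey. None of these names come from the actual reindexer.

```python
from collections import deque


def drain(outbox):
    """Deliver queued messages to a toy index keyed by the 'id' field."""
    index = {}
    while outbox:
        doc = outbox.popleft()
        index[doc["id"]] = dict(doc)  # same id overwrites the earlier entry
    return index


outbox = deque()
shared_doc = {}  # one mutable document reused for every record
for i in range(5):
    shared_doc["id"] = str(i)
    shared_doc["title"] = f"doc {i}"
    outbox.append(shared_doc)  # enqueues a reference, not a snapshot

# Delivery happens only after the producer has finished mutating the document.
index = drain(outbox)
print(len(outbox), len(index), sorted(index))  # prints "0 1 ['4']"
```

Five messages were queued, but each one points at the same object, so by delivery time they all carry the last id. In a real system the delivery interleaves with the producer, so the loss would likely be partial and vary from run to run.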


I've adjusted my reindexing threads to ensure new instances 
of everything are used... I will test it shortly...
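
A sketch of that kind of adjustment, again as a Python simulation rather than the real reindexer code: the `outbox` queue and `drain` function are hypothetical stand-ins for the messaging framework and the delayed delivery, and `id` is assumed to be the uniqueKey.

```python
from collections import deque


def drain(outbox):
    """Deliver queued messages to a toy index keyed by the 'id' field."""
    index = {}
    while outbox:
        doc = outbox.popleft()
        index[doc["id"]] = dict(doc)
    return index


outbox = deque()
for i in range(5):
    # A fresh document per record: later iterations cannot alter queued ones.
    doc = {"id": str(i), "title": f"doc {i}"}
    outbox.append(doc)

index = drain(outbox)
print(len(index), sorted(index))  # prints "5 ['0', '1', '2', '3', '4']"
```

The cost is one extra allocation per record, which trades a little GC pressure for documents that stay intact while they wait in the queue.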


But you pointed out exactly why I have fewer documents than 'add' requests...

Thanks!



Re: Strange missing docs when reindexing with threads.

2009-06-12 Thread Alexander Wallace
That was exactly my issue... I changed my code to not reuse 
documents/fields and it is all good now!


Thanks for your support!
