So we are back to the setup I described initially: multiple parallel workers, each 
doing "parse + indexWriter.addDocument()" for a single document, with no 
synchronization on my side. As I reported, that setup also suffered from high 
memory consumption and blocked threads.

Or did I misunderstand you?
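
For reference, a minimal sketch of the setup described above: a fixed pool of workers, each parsing one input and handing the result to a shared writer with no external lock. To keep it runnable without Lucene on the classpath, `DocumentSink` and `parse` are hypothetical stand-ins (neither appears in this thread); in the real program the sink would be the shared IndexWriter and the call would be indexWriter.addDocument(). It shows only the threading pattern and is not a diagnosis of the memory problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ParallelIndexingSketch {

    /** Hypothetical stand-in for the single writer call used here; must be thread-safe. */
    interface DocumentSink {
        void addDocument(List<String> doc);
    }

    /** Hypothetical parser: splits raw text into whitespace-separated tokens. */
    static List<String> parse(String raw) {
        List<String> tokens = new ArrayList<>();
        for (String t : raw.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Parses and adds every input on a fixed pool; returns how many documents were added. */
    static int indexAll(List<String> rawDocs, DocumentSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger added = new AtomicInteger();
        for (String raw : rawDocs) {
            pool.execute(() -> {
                List<String> doc = parse(raw);
                sink.addDocument(doc); // shared sink, no synchronized block around it
                added.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks, let the queued ones finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return added.get();
    }

    public static void main(String[] args) {
        AtomicInteger tokens = new AtomicInteger();
        DocumentSink countingSink = doc -> tokens.addAndGet(doc.size());
        int n = indexAll(Arrays.asList("a b c", "d e", "f"), countingSink, 4);
        System.out.println(n + " documents, " + tokens.get() + " tokens");
    }
}
```

Each task holds a reference to its parsed document only until addDocument() returns, so nothing in the worker code itself keeps parsed data alive afterwards. Note that Executors.newFixedThreadPool queues submitted tasks without a bound, so the raw inputs captured by waiting tasks stay on the heap until a worker picks them up.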

-- 
Igor

22.11.2013, 23:34, "Uwe Schindler" <u...@thetaphi.de>:
> Hi,
> Don't use addDocuments. This method is meant for so-called block indexing 
> (where all documents need to be in one block for block joins). Call addDocument 
> for each document, possibly from many threads. That way Lucene can better 
> handle multithreading and free memory early. There is really no need to use 
> bulk adds; they exist solely for block joins, where docs need to be sequential 
> and without gaps.
>
> Uwe
>
> Igor Shalyminov <ishalymi...@yandex-team.ru> schrieb:
>
>> - uwe@
>>
>> Thanks Uwe!
>>
>> I changed the logic so that my workers only parse input docs into
>> Documents, and the indexWriter itself calls addDocuments() for chunks
>> of 100 Documents.
>> Unfortunately, the behaviour still reproduces: memory usage slightly
>> increases with the number of processed documents, and at some point the
>> program runs very slowly, and it seems that only a single thread is
>> active.
>> It happens after lots of parse/index cycles.
>>
>> The current instance is now in the "single-thread" phase with ~100% CPU
>> and 8397M RES memory (the limit for the VM is -Xmx8G).
>> My question is: when does addDocuments() release all resources passed
>> in (the Documents themselves)?
>> Are the resources released when the function call finishes, or do I have
>> to call indexWriter.commit() after, say, each chunk?
>>
>> --
>> Igor
>>
>> 21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
>>>  Hi,
>>>
>>>  Why are you doing this? Lucene's IndexWriter can handle addDocuments
>>>  in multiple threads. And, since Lucene 4, it will process them almost
>>>  completely in parallel!
>>>  If you do the addDocuments single-threaded you are adding an
>>>  additional bottleneck to your application. If you are doing a
>>>  synchronization on IndexWriter (which I hope you will not do), things
>>>  will go wrong, too.
>>>  Uwe
>>>
>>>  -----
>>>  Uwe Schindler
>>>  H.-H.-Meier-Allee 63, D-28213 Bremen
>>>  http://www.thetaphi.de
>>>  eMail: u...@thetaphi.de
>>>>   -----Original Message-----
>>>>   From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
>>>>   Sent: Thursday, November 21, 2013 4:45 PM
>>>>   To: java-user@lucene.apache.org
>>>>   Subject: Lucene multithreaded indexing problems
>>>>
>>>>   Hello!
>>>>
>>>>   I tried to run indexing in multiple threads, with a FixedThreadPool of
>>>>   Callable workers.
>>>>   The main operation - parsing a single document and calling addDocument()
>>>>   on the index - is done by a single worker.
>>>>   Parsing a document produces a lot (really a lot) of Strings, and at the
>>>>   end of the worker's call() all of them go to the indexWriter.
>>>>   I use no merging; the resources are flushed to disk when the segment size
>>>>   limit is reached.
>>>>
>>>>   The problem is that after a little while (when most of the heap memory is
>>>>   used) the indexer makes no progress, and CPU load is a constant 100% (no
>>>>   difference whether there are 2 threads or 32). So I think at some point
>>>>   garbage collection takes the whole indexing process down.
>>>>
>>>>   Could you please give some advice on proper concurrent indexing with
>>>>   Lucene?
>>>>   Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>>>>   perform some operations on the writer to release unused resources from
>>>>   time to time?
>>>>
>>>>   --
>>>>   Best Regards,
>>>>   Igor
>>>>   ---------------------------------------------------------------------
>>>>   To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>   For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de

