
Thanks Uwe!

I changed the logic so that my workers only parse input docs into Documents, 
and indexWriter does addDocuments() by itself for chunks of 100 Documents.
Unfortunately, the behaviour still reproduces: memory usage slowly increases 
with the number of processed documents, and at some point the program runs 
very slowly, seemingly with only a single thread active.
It happens after lots of parse/index cycles.

The current instance is now in the "single-thread" phase, at ~100% CPU and 
with 8397M RES memory (the limit for the VM is -Xmx8G).
My question is: when does addDocuments() release all the resources passed in 
(the Documents themselves)?
Are the resources released when the call returns, or do I have to call 
indexWriter.commit() after, say, each chunk?
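For reference, here is a minimal sketch of the fully concurrent pattern, where 
each worker hands its own chunk to the writer. Since Lucene may not be on the 
reader's classpath, an AtomicLong stands in for IndexWriter; with the real 
writer, each worker would parse its raw docs into Documents and call 
indexWriter.addDocuments(chunk) itself (IndexWriter is thread-safe, so no 
external lock is needed). ConcurrentIndexingSketch, indexAll, and CHUNK_SIZE 
are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentIndexingSketch {
    static final int CHUNK_SIZE = 100;

    public static long indexAll(List<String> rawDocs, int threads) throws Exception {
        AtomicLong indexed = new AtomicLong(); // stand-in for IndexWriter
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < rawDocs.size(); i += CHUNK_SIZE) {
            List<String> chunk = rawDocs.subList(i, Math.min(i + CHUNK_SIZE, rawDocs.size()));
            // With Lucene, this lambda would parse the chunk into Documents and
            // call indexWriter.addDocuments(chunk) directly from the worker thread.
            futures.add(pool.submit(() -> indexed.addAndGet(chunk.size())));
        }
        for (Future<?> f : futures) f.get(); // propagate worker exceptions
        pool.shutdown();
        return indexed.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1050; i++) docs.add("doc-" + i);
        System.out.println(indexAll(docs, 4)); // prints 1050
    }
}
```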

-- 
Igor

21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
> Hi,
>
> why are you doing this? Lucene's IndexWriter can handle addDocuments() from 
> multiple threads. And, since Lucene 4, it will process them almost completely 
> in parallel!
> If you do the addDocuments single-threaded, you are adding an additional 
> bottleneck to your application. If you are synchronizing on the 
> IndexWriter (which I hope you are not doing), things will go wrong, too.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>>  -----Original Message-----
>>  From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
>>  Sent: Thursday, November 21, 2013 4:45 PM
>>  To: java-user@lucene.apache.org
>>  Subject: Lucene multithreaded indexing problems
>>
>>  Hello!
>>
>>  I tried to perform indexing in multiple threads, with a FixedThreadPool of
>>  Callable workers.
>>  The main operation - parsing a single document and calling addDocument() on
>>  the index - is done by a single worker.
>>  After parsing a document, a lot (really a lot) of Strings appear, and at the
>>  end of the worker's call() all of them go to the indexWriter.
>>  I use no merging; the resources are flushed to disk when the segment size
>>  limit is reached.
>>
>>  The problem is, after a little while (when most of the heap memory is in
>>  use) the indexer makes no progress, and CPU load is a constant 100% (no
>>  difference whether there are 2 threads or 32). So I think at some point
>>  garbage collection takes the whole indexing process down.
>>
>>  Could you please give some advice on proper concurrent indexing with
>>  Lucene?
>>  Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>>  perform some operations on the writer to release unused resources from
>>  time to time?
>>
>>  --
>>  Best Regards,
>>  Igor
>>
>>  ---------------------------------------------------------------------
>>  To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>  For additional commands, e-mail: java-user-h...@lucene.apache.org
>

