Thanks, Uwe!
I changed the logic so that my workers only parse input docs into Documents, and a single thread calls indexWriter.addDocuments() itself for chunks of 100 Documents. Unfortunately, the behaviour still reproduces: memory usage grows slowly with the number of processed documents, and at some point the program runs very slowly, apparently with only a single thread active. This happens after many parse/index cycles. The current instance is now in the "single-thread" phase, at ~100% CPU and 8397M RES memory (the VM limit is -Xmx8G).

My question is: when does addDocuments() release the resources passed in (the Documents themselves)? Are the resources released when the call returns, or do I have to call indexWriter.commit() after, say, each chunk?

-- Igor

21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
> Hi,
>
> Why are you doing this? Lucene's IndexWriter can handle addDocuments in
> multiple threads, and since Lucene 4 it will process them almost completely
> in parallel!
> If you do the addDocuments single-threaded, you are adding an additional
> bottleneck to your application. And if you synchronize on the IndexWriter
> (which I hope you do not), things will go wrong, too.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>> -----Original Message-----
>> From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
>> Sent: Thursday, November 21, 2013 4:45 PM
>> To: java-user@lucene.apache.org
>> Subject: Lucene multithreaded indexing problems
>>
>> Hello!
>>
>> I tried to perform indexing in multiple threads, with a FixedThreadPool of
>> Callable workers.
>> The main operation, parsing a single document and calling addDocument() on
>> the index, is done by a single worker.
>> Parsing a document produces a lot (really a lot) of Strings, and at the
>> end of the worker's call() all of them go to the indexWriter.
>> I use no merging; the resources are flushed to disk when the segment size
>> limit is reached.
>>
>> The problem is, after a little while (when most of the heap memory is in
>> use) the indexer makes no progress, and CPU load is a constant 100% (no
>> difference whether there are 2 threads or 32). So I think at some point
>> garbage collection takes the whole indexing process down.
>>
>> Could you please give some advice on proper concurrent indexing with
>> Lucene?
>> Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>> perform some operations on the writer from time to time to release unused
>> resources?
>>
>> --
>> Best Regards,
>> Igor
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
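For later readers of this thread: the pattern Uwe recommends, with each worker parsing its input and calling addDocument() directly (IndexWriter has been safe for concurrent adds, and largely lock-free since Lucene 4), can be sketched as below. This is a minimal sketch, not the poster's actual code: since Lucene may not be on your classpath, the writer is stood in for by a hypothetical thread-safe `StubWriter`; in real code you would replace it with a shared `org.apache.lucene.index.IndexWriter` and call `indexWriter.addDocument(doc)` at the marked line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexing {
    // Stand-in for Lucene's IndexWriter (which is itself thread-safe).
    // In real code, replace this with a shared IndexWriter instance.
    static class StubWriter {
        final AtomicInteger count = new AtomicInteger();
        void addDocument(String doc) { count.incrementAndGet(); }
    }

    public static void main(String[] args) throws Exception {
        StubWriter writer = new StubWriter();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            final int id = i;
            // Each worker parses AND indexes its own document: there is no
            // hand-off of parsed Documents to a single indexing thread, so
            // the parsed data becomes garbage as soon as addDocument returns.
            futures.add(pool.submit(() -> {
                String doc = "parsed-doc-" + id;  // parsing happens here
                writer.addDocument(doc);          // indexWriter.addDocument(doc) in real code
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // join, and propagate any worker exception
        }
        pool.shutdown();
        System.out.println(writer.count.get()); // prints 1000
    }
}
```

On the memory question: addDocument()/addDocuments() keeps no reference to the passed-in Documents after the call returns; what IndexWriter buffers internally is bounded by IndexWriterConfig.setRAMBufferSizeMB (16 MB by default) and is flushed to segments automatically, so a commit() per chunk is not needed to free memory. If RES keeps growing past -Xmx and all threads stall at 100% CPU, GC-thrash diagnostics (e.g. running with -verbose:gc) are the more likely place to look.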