So we are back to the setup I described initially: multiple parallel workers, each doing "parse + indexWriter.addDocument()" for a single document, with no synchronization on my side. As I reported, this setup also had problems with memory consumption and thread blocking.
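For concreteness, the "parse + addDocument() per worker" setup looks roughly like the sketch below. It is a simplified illustration under stated assumptions, not the actual indexing code: DocumentSink is a hypothetical stand-in for IndexWriter (the real addDocument() takes a Lucene Document and can throw IOException), and parse() is only a placeholder for the real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelIndexingSketch {

    /** Hypothetical stand-in for IndexWriter; only the addDocument() call matters here. */
    interface DocumentSink {
        void addDocument(List<String> parsedDoc);
    }

    /** Placeholder parser (assumption): splits the raw text into whitespace-separated tokens. */
    static List<String> parse(String raw) {
        List<String> tokens = new ArrayList<>();
        for (String t : raw.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Parses and adds every document on a fixed pool; returns the number of completed tasks. */
    static int indexAll(List<String> rawDocs, int threads, DocumentSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (String raw : rawDocs) {
                // One task per document: parse it, then hand it to the sink right away.
                // No batching and no external lock around the sink.
                pending.add(pool.submit(() -> sink.addDocument(parse(raw))));
            }
            int done = 0;
            for (Future<?> f : pending) {
                f.get(); // rethrows a worker failure as ExecutionException
                done++;
            }
            return done;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Each task keeps its parsed tokens only as a local value, so nothing in the sketch itself holds on to a document after addDocument() returns. Whatever the sink buffers internally is outside what this sketch models.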
Or did I misunderstand you?

--
Igor

22.11.2013, 23:34, "Uwe Schindler" <u...@thetaphi.de>:
> Hi,
>
> Don't use addDocuments. This method is meant more for so-called block indexing
> (where all documents need to be in one block for block joins). Call addDocument
> for each document, possibly from many threads. This way Lucene can better
> handle multithreading and free memory early. There is really no need to use
> bulk adds; this is solely for block joins, where docs need to be sequential
> and without gaps.
>
> Uwe
>
> Igor Shalyminov <ishalymi...@yandex-team.ru> wrote:
>
>> - uwe@
>>
>> Thanks Uwe!
>>
>> I changed the logic so that my workers only parse input docs into
>> Documents, and indexWriter does addDocuments() by itself for the chunks
>> of 100 Documents.
>> Unfortunately, this behaviour reproduces: memory usage slightly
>> increases with the number of processed documents, and at some point the
>> program runs very slowly, and it seems that only a single thread is
>> active.
>> It happens after lots of parse/index cycles.
>>
>> The current instance is now in the "single-thread" phase with ~100% CPU
>> and with 8397M RES memory (limit for the VM is -Xmx8G).
>> My question is, when does addDocuments() release all resources passed
>> in (the Documents themselves)?
>> Are the resources released after finishing the function call, or do I have
>> to do indexWriter.commit() after, say, each chunk?
>>
>> --
>> Igor
>>
>> 21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
>>> Hi,
>>>
>>> why are you doing this? Lucene's IndexWriter can handle addDocuments
>>> in multiple threads. And, since Lucene 4, it will process them almost
>>> completely in parallel!
>>> If you do the addDocuments single-threaded you are adding an
>>> additional bottleneck in your application. If you are doing a
>>> synchronization on IndexWriter (which I hope you will not do), things
>>> will go wrong, too.
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: u...@thetaphi.de
>>>
>>>> -----Original Message-----
>>>> From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
>>>> Sent: Thursday, November 21, 2013 4:45 PM
>>>> To: java-user@lucene.apache.org
>>>> Subject: Lucene multithreaded indexing problems
>>>>
>>>> Hello!
>>>>
>>>> I tried to perform indexing in multiple threads, with a FixedThreadPool
>>>> of Callable workers.
>>>> The main operation - parsing a single document and addDocument() to the
>>>> index - is done by a single worker.
>>>> After parsing a document, a lot (really a lot) of Strings appear, and at
>>>> the end of the worker's call() all of them go to the indexWriter.
>>>> I use no merging; the resources are flushed to disk when the segment
>>>> size limit is reached.
>>>>
>>>> The problem is, after a little while (when most of the heap memory is
>>>> used) the indexer makes no progress, and CPU load is a constant 100% (no
>>>> difference if there are 2 threads or 32). So I think at some point
>>>> garbage collection takes the whole indexing process down.
>>>>
>>>> Could you please give some advice on proper concurrent indexing with
>>>> Lucene?
>>>> Can there be "memory leaks" somewhere in the indexWriter? Maybe I must
>>>> perform some operations with the writer to release unused resources
>>>> from time to time?
>>>>
>>>> --
>>>> Best Regards,
>>>> Igor
>
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org