[jira] Updated: (LUCENE-528) Optimization for IndexWriter.addIndexes()
[ http://issues.apache.org/jira/browse/LUCENE-528?page=all ]

Ning Li updated LUCENE-528:
---------------------------

    Lucene Fields: [Patch Available]

> Optimization for IndexWriter.addIndexes()
> -----------------------------------------
>
>                 Key: LUCENE-528
>                 URL: http://issues.apache.org/jira/browse/LUCENE-528
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Tamm
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>         Attachments: AddIndexes.patch, AddIndexesNoOptimize.patch
>
>
> One big performance problem with IndexWriter.addIndexes() is that it has to
> optimize the index both before and after adding the segments. When you have
> a very large index, to which you are adding batches of small updates, these
> calls to optimize make using addIndexes() impossible. It makes parallel
> updates very frustrating.
> Here is an optimized function that helps out by calling mergeSegments only on
> the newly added documents. It will try to avoid calling mergeSegments until
> the end, unless you're adding a lot of documents at once.
> I also have an extensive unit test that verifies that this function works
> correctly if people are interested. I gave it a different name because it
> has very different performance characteristics which can make querying take
> longer.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
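
To make the cost argument in the issue description concrete, here is a toy model, not Lucene code. It assumes an index can be represented as a list of segment sizes (document counts) and that merging segments rewrites every document in them. The class and method names (AddIndexesCostSketch, addWithOptimize, addMergingOnlyNew) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddIndexesCostSketch {

    // Toy optimize(): merge all segments into one and return the number of
    // documents rewritten. Assumption: a single segment counts as already
    // optimized and costs nothing.
    static long optimize(List<Long> segments) {
        if (segments.size() <= 1) {
            return 0;
        }
        long total = 0;
        for (long size : segments) {
            total += size;
        }
        segments.clear();
        segments.add(total);
        return total;
    }

    // Models the behaviour described in the issue: optimize before and after
    // adding the incoming segments.
    static long addWithOptimize(List<Long> index, List<Long> incoming) {
        long cost = optimize(index);
        index.addAll(incoming);
        cost += optimize(index);
        return cost;
    }

    // Models the proposed alternative: merge only the newly added documents
    // and leave the existing segments untouched.
    static long addMergingOnlyNew(List<Long> index, List<Long> incoming) {
        long added = 0;
        for (long size : incoming) {
            added += size;
        }
        if (added > 0) {
            index.add(added);
        }
        return added;
    }

    public static void main(String[] args) {
        List<Long> batch = Arrays.asList(1_000L, 1_000L, 1_000L);

        List<Long> a = new ArrayList<>(Arrays.asList(10_000_000L));
        System.out.println("optimize before and after: "
                + addWithOptimize(a, batch) + " docs rewritten, segments " + a);

        List<Long> b = new ArrayList<>(Arrays.asList(10_000_000L));
        System.out.println("merge only new documents:  "
                + addMergingOnlyNew(b, batch) + " docs rewritten, segments " + b);
    }
}
```

In this model the first strategy rewrites the whole large index for every small batch, while the second rewrites only the batch but leaves more segments behind, which matches the description's remark that querying can take longer.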
[jira] Updated: (LUCENE-528) Optimization for IndexWriter.addIndexes()
[ http://issues.apache.org/jira/browse/LUCENE-528?page=all ]

Ning Li updated LUCENE-528:
---------------------------

    Attachment: AddIndexesNoOptimize.patch

This patch implements addIndexesNoOptimize() following the algorithm described earlier.

- The patch is based on the latest version from trunk.
- addIndexesNoOptimize() is implemented. The algorithm description is included as a comment and the code is commented.
- The patch includes a test called TestAddIndexesNoOptimize which covers all the code in addIndexesNoOptimize().
- maybeMergeSegments() was conservative and checked for more merges only when "upperBound * mergeFactor <= maxMergeDocs". It now checks for more merges when "upperBound < maxMergeDocs".
- Minor changes in TestIndexWriterMergePolicy to better verify merge invariants.
- The patch passes all unit tests.

One more comment on the implementation:

- When we copy un-merged segments from S in step 4, ideally we would simply copy those segments. However, Directory does not support copy yet. In addition, the source may or may not use the compound file format, and likewise the target. So we use mergeSegments() to copy each segment, which may cause the doc count to change because deleted docs are garbage collected. That case is handled properly.
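
The difference between the two merge checks quoted in the comment above can be seen in isolation with a small sketch. It is not the code from the patch: only the names upperBound, mergeFactor and maxMergeDocs come from the comment, while the class MergeCheckSketch, the two predicate methods and the sample values are assumptions made for illustration. The arithmetic uses long so that the product cannot overflow for the sample values.

```java
public class MergeCheckSketch {

    // Earlier, conservative condition: keep checking for more merges only
    // when upperBound * mergeFactor <= maxMergeDocs.
    static boolean oldCheck(long upperBound, long mergeFactor, long maxMergeDocs) {
        return upperBound * mergeFactor <= maxMergeDocs;
    }

    // Condition after the change: keep checking for more merges whenever
    // upperBound < maxMergeDocs.
    static boolean newCheck(long upperBound, long maxMergeDocs) {
        return upperBound < maxMergeDocs;
    }

    public static void main(String[] args) {
        long mergeFactor = 10;     // sample value, an assumption
        long maxMergeDocs = 5000;  // sample value, an assumption

        for (long upperBound = 10; upperBound <= 10000; upperBound *= 10) {
            System.out.println("upperBound=" + upperBound
                    + " old=" + oldCheck(upperBound, mergeFactor, maxMergeDocs)
                    + " new=" + newCheck(upperBound, maxMergeDocs));
        }
    }
}
```

With these sample values the two conditions disagree only at upperBound = 1000: the old check stops looking for merges there because 1000 * 10 exceeds 5000, while the new check continues because 1000 is still below maxMergeDocs.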
[jira] Updated: (LUCENE-528) Optimization for IndexWriter.addIndexes()
[ http://issues.apache.org/jira/browse/LUCENE-528?page=all ]

Steven Tamm updated LUCENE-528:
-------------------------------

    Attachment: AddIndexes.patch