The addIndexes(Directory[] dirs) method of IndexWriter optimizes the index both before and after the operation.

Some notes about this:
1). Adding sub-indexes to a large index can take a long time because of the double optimization.
2). It breaks the IndexWriter.maxMergeDocs logic, because optimize() merges all data into a single-segment index.

I suggest adding a new method with a boolean parameter that specifies whether the index should be optimized.
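To make the second note concrete, here is a toy model of the idea. It is only a sketch: ToyIndexWriter, addSegment and segmentSizes are hypothetical names, segments are reduced to plain document counts, and the private maybeMergeSegments() below is a simplified stand-in that merges adjacent segments while they stay under the cap. It is not Lucene's real merge logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a segmented index. NOT Lucene's actual IndexWriter. */
public class ToyIndexWriter {
    // One entry per segment; the value is that segment's document count.
    private final List<Integer> segments = new ArrayList<>();
    // Assumed meaning: no merge may produce a segment larger than this.
    private final int maxMergeDocs;

    public ToyIndexWriter(int maxMergeDocs) {
        this.maxMergeDocs = maxMergeDocs;
    }

    public void addSegment(int docCount) {
        segments.add(docCount);
    }

    /** Collapses everything into a single segment, ignoring maxMergeDocs. */
    public void optimize() {
        int total = 0;
        for (int s : segments) {
            total += s;
        }
        segments.clear();
        if (total > 0) {
            segments.add(total);
        }
    }

    /** Simplified stand-in: merge adjacent segments only while under the cap. */
    private void maybeMergeSegments() {
        List<Integer> merged = new ArrayList<>();
        for (int s : segments) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last) + s <= maxMergeDocs) {
                merged.set(last, merged.get(last) + s);
            } else {
                merged.add(s);
            }
        }
        segments.clear();
        segments.addAll(merged);
    }

    /** Mirrors the shape of the proposed addIndexes(dirs, optimize). */
    public void addIndexes(List<Integer> incoming, boolean optimize) {
        if (optimize) {
            optimize();
        }
        segments.addAll(incoming);
        if (optimize) {
            optimize();
        } else {
            maybeMergeSegments();
        }
    }

    public List<Integer> segmentSizes() {
        return new ArrayList<>(segments);
    }

    public static void main(String[] args) {
        ToyIndexWriter a = new ToyIndexWriter(1000);
        a.addSegment(900);
        a.addIndexes(Arrays.asList(600, 300), true);
        System.out.println(a.segmentSizes()); // [1800]: one segment over the cap

        ToyIndexWriter b = new ToyIndexWriter(1000);
        b.addSegment(900);
        b.addIndexes(Arrays.asList(600, 300), false);
        System.out.println(b.segmentSizes()); // [900, 900]: cap respected
    }
}
```

With optimize set to true the toy index ends up as a single 1800-document segment even though the cap is 1000; with false, every segment stays within the cap.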

IndexWriter has a similar method, addIndexes(IndexReader[] readers), that takes an array of IndexReaders, but I don't know how it could be modified to provide the same optional behavior.

The patch is attached here so we can discuss it first
(should I post it directly to JIRA?)


--
regards,
Volodymyr Bychkoviak

Index: D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java
===================================================================
--- D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java     (revision 327185)
+++ D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java     (working copy)
@@ -519,17 +519,21 @@
   }
 
   /** Merges all segments from an array of indexes into this index.
-   *
-   * <p>This may be used to parallelize batch indexing.  A large document
-   * collection can be broken into sub-collections.  Each sub-collection can be
-   * indexed in parallel, on a different thread, process or machine.  The
-   * complete index can then be created by merging sub-collection indexes
-   * with this method.
-   *
-   * <p>After this completes, the index is optimized. */
-  public synchronized void addIndexes(Directory[] dirs)
+  *
+  * <p>This may be used to parallelize batch indexing.  A large document
+  * collection can be broken into sub-collections.  Each sub-collection can be
+  * indexed in parallel, on a different thread, process or machine.  The
+  * complete index can then be created by merging sub-collection indexes
+  * with this method. 
+  * 
+  * <p>If <code>optimize</code> is true, the index is optimized both before
+  * and after the new data is added.
+  */
+  public synchronized void addIndexes(Directory[] dirs,boolean optimize)
       throws IOException {
-    optimize();                                          // start with zero or 1 seg
+    if (optimize) {
+      optimize();                                        // start with zero or 1 seg
+    }
 
     int start = segmentInfos.size();
 
@@ -550,7 +554,25 @@
       }
     }
 
-    optimize();                                          // final cleanup
+    if (optimize) {
+      optimize();                                        // final cleanup
+    } else {
+      maybeMergeSegments();
+    }
+  }
+  
+  /** Merges all segments from an array of indexes into this index.
+  *
+  * <p>This may be used to parallelize batch indexing.  A large document
+  * collection can be broken into sub-collections.  Each sub-collection can be
+  * indexed in parallel, on a different thread, process or machine.  The
+  * complete index can then be created by merging sub-collection indexes
+  * with this method.
+  *
+  * <p>After this completes, the index is optimized. */
+  public synchronized void addIndexes(Directory[] dirs)
+  throws IOException {
+    addIndexes(dirs, true);  // keep the documented behavior: optimize before and after
   }
 
   /** Merges the provided indexes into this index.