The addIndexes(Directory[] dirs) method of IndexWriter optimizes the
index both before and after the operation.
Some notes about this:
1). Adding sub-indexes to a large index can take a long time because of
the double optimization.
2). This breaks the IndexWriter.maxMergeDocs logic, because optimize()
merges all data into a single-segment index.
I suggest adding a new method with a boolean parameter that optionally
specifies whether the index should be optimized.
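
To make the two notes and the proposed flag concrete, here is a toy
model in plain Java. It is only a sketch: ToySegmentIndex, its methods,
and the greedy mergeWithinCap() step are hypothetical stand-ins and not
Lucene code. A segment is reduced to a document count, and "cost" is
counted as the number of documents rewritten by merges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only (hypothetical, not Lucene): a segment is just a doc count. */
final class ToySegmentIndex {
    private final List<Integer> segments = new ArrayList<>();
    private final int maxMergeDocs;   // cap that ordinary merges respect
    private long docsRewritten = 0;   // rough stand-in for merge cost

    ToySegmentIndex(int maxMergeDocs) { this.maxMergeDocs = maxMergeDocs; }

    void addSegment(int docCount) { segments.add(docCount); }

    List<Integer> segmentSizes() { return new ArrayList<>(segments); }

    long docsRewritten() { return docsRewritten; }

    /** Merges everything into one segment, ignoring maxMergeDocs. */
    void optimize() {
        if (segments.size() <= 1) return;   // nothing to merge
        int total = 0;
        for (int s : segments) total += s;
        docsRewritten += total;
        segments.clear();
        segments.add(total);
    }

    /** Assumed stand-in for a capped merge: greedily joins neighbours up to the cap. */
    void mergeWithinCap() {
        List<Integer> merged = new ArrayList<>();
        int docs = 0;      // docs in the group being built
        int members = 0;   // segments in that group
        for (int s : segments) {
            if (members > 0 && docs + s > maxMergeDocs) {
                flush(merged, docs, members);
                docs = 0;
                members = 0;
            }
            docs += s;
            members++;
        }
        if (members > 0) flush(merged, docs, members);
        segments.clear();
        segments.addAll(merged);
    }

    private void flush(List<Integer> out, int docs, int members) {
        out.add(docs);
        if (members > 1) docsRewritten += docs;   // only real merges cost anything
    }

    /** Mirrors the shape of the proposed method: optimizing is optional. */
    void addIndexes(List<ToySegmentIndex> others, boolean optimize) {
        if (optimize) optimize();                 // first full merge
        for (ToySegmentIndex other : others) segments.addAll(other.segments);
        if (optimize) optimize(); else mergeWithinCap();
    }

    public static void main(String[] args) {
        for (boolean optimize : new boolean[] {true, false}) {
            ToySegmentIndex sub = new ToySegmentIndex(1000);
            sub.addSegment(300);
            sub.addSegment(300);
            ToySegmentIndex main = new ToySegmentIndex(1000);
            main.addSegment(600);
            main.addSegment(600);
            main.addIndexes(Arrays.asList(sub), optimize);
            System.out.println("optimize=" + optimize
                + " segments=" + main.segmentSizes()
                + " docsRewritten=" + main.docsRewritten());
        }
    }
}
```

In this toy run with a cap of 1000 documents, optimize=true ends with a
single 1800-document segment and rewrites 3000 documents in total, while
optimize=false leaves segments of 600, 900 and 300 documents and rewrites
only 900. The real numbers depend on Lucene's actual merge logic, which
this sketch does not reproduce.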
There is a similar method in IndexWriter, addIndexes(IndexReader[] readers),
that takes an array of IndexReaders, but I don't know how it could be
modified to provide the same optional functionality.
A patch is included below so we can discuss it first
(or should I post it directly to JIRA?).
--
regards,
Volodymyr Bychkoviak
Index: D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java
===================================================================
--- D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java (revision 327185)
+++ D:/programming/projects/componence/lucene-dev/src/java/org/apache/lucene/index/IndexWriter.java (working copy)
@@ -519,17 +519,21 @@
}
/** Merges all segments from an array of indexes into this index.
- *
- * <p>This may be used to parallelize batch indexing. A large document
- * collection can be broken into sub-collections. Each sub-collection can be
- * indexed in parallel, on a different thread, process or machine. The
- * complete index can then be created by merging sub-collection indexes
- * with this method.
- *
- * <p>After this completes, the index is optimized. */
- public synchronized void addIndexes(Directory[] dirs)
+ *
+ * <p>This may be used to parallelize batch indexing. A large document
+ * collection can be broken into sub-collections. Each sub-collection can be
+ * indexed in parallel, on a different thread, process or machine. The
+ * complete index can then be created by merging sub-collection indexes
+ * with this method.
+ *
+ * <p>Optionally, the index can be optimized before and
+ * after the new data is added.
+ */
+ public synchronized void addIndexes(Directory[] dirs, boolean optimize)
throws IOException {
- optimize(); // start with zero or 1 seg
+ if (optimize) {
+ optimize(); // start with zero or 1 seg
+ }
int start = segmentInfos.size();
@@ -550,7 +554,25 @@
}
}
- optimize(); // final cleanup
+ if (optimize) {
+ optimize(); // final cleanup
+ } else {
+ maybeMergeSegments();
+ }
+ }
+
+ /** Merges all segments from an array of indexes into this index.
+ *
+ * <p>This may be used to parallelize batch indexing. A large document
+ * collection can be broken into sub-collections. Each sub-collection can be
+ * indexed in parallel, on a different thread, process or machine. The
+ * complete index can then be created by merging sub-collection indexes
+ * with this method.
+ *
+ * <p>After this completes, the index is optimized. */
+ public synchronized void addIndexes(Directory[] dirs)
+ throws IOException {
+ addIndexes(dirs, true); // optimize, as the Javadoc above promises
}
/** Merges the provided indexes into this index.