One of our indexes is rebuilt completely quite frequently, i.e. a "batch update" or
full "re-index".
Each run adds or updates more than 2 million documents in that index, which creates
an immense IO load on our system. Does it make sense to set the merge scheduler to
NoMergeScheduler (and/or the merge policy to NoMergePolicy)? Or is merging not
relevant here, since the commit is done only at the very end?
Context information:
At the moment the writer's configuration looks like this (essentially just
setRAMBufferSizeMB plus the merge policy):
IndexWriterConfig config = new IndexWriterConfig(
IndexManager.CURRENT_LUCENE_VERSION, analyzer );
config.setMergePolicy( NoMergePolicy.NO_COMPOUND_FILES );
//config.setMergeScheduler( NoMergeScheduler.INSTANCE );
config.setRAMBufferSizeMB( 20 );
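For reference, the variant I am considering (merging switched off completely for the
batch run) would look roughly like this. This is only a sketch, not our actual code:
CURRENT_LUCENE_VERSION and analyzer are the same as in the snippet above, and whether
this is a good idea is exactly what I am asking.

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.index.NoMergeScheduler;

static IndexWriterConfig bulkLoadConfig( Analyzer analyzer ) {
    IndexWriterConfig config = new IndexWriterConfig(
            IndexManager.CURRENT_LUCENE_VERSION, analyzer );
    // no merges are ever selected, so every flushed segment stays as written
    config.setMergePolicy( NoMergePolicy.NO_COMPOUND_FILES );
    // no merges are ever executed, even if one were requested
    config.setMergeScheduler( NoMergeScheduler.INSTANCE );
    // flush a new segment roughly every 20 MB of buffered documents
    config.setRAMBufferSizeMB( 20 );
    return config;
}

My understanding is that with this configuration the index ends up with one segment
per flush, and those segments are never merged afterwards unless the index is later
opened with a regular merge policy.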
The update logic is as follows:
indexWriter.deleteAll();
...
for all elements do {
    ...
    indexWriter.updateDocument( term, doc ); // update by term so that no duplicate entries are created
    ...
}
indexWriter.commit();
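For completeness, here is a self-contained sketch of the whole batch run. It is not
our actual code: the "id" and "content" field names, the Element type, loadAllElements(),
directory and analyzer are placeholders, and the Lucene 4.x field classes are assumed.

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

void rebuildIndex( Directory directory, Analyzer analyzer ) throws IOException {
    IndexWriterConfig config = new IndexWriterConfig(
            IndexManager.CURRENT_LUCENE_VERSION, analyzer );
    config.setRAMBufferSizeMB( 20 );

    IndexWriter indexWriter = new IndexWriter( directory, config );
    try {
        // wipe the old contents; readers keep seeing the old index until commit()
        indexWriter.deleteAll();

        // Element and loadAllElements() stand in for our own domain code
        for ( Element element : loadAllElements() ) {
            Document doc = new Document();
            doc.add( new StringField( "id", element.getId(), Field.Store.YES ) );
            doc.add( new TextField( "content", element.getContent(), Field.Store.NO ) );

            // updateDocument = delete-by-term + add, so the same id can never
            // end up in the index twice
            indexWriter.updateDocument( new Term( "id", element.getId() ), doc );
        }

        // a single commit at the very end makes the rebuilt index visible
        indexWriter.commit();
    } finally {
        indexWriter.close();
    }
}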
What is the recommended way to perform such a batch update?