Hi

I can't give an exact answer to your question, but my experience has
been that it's best to leave all the merge/buffer/etc. settings alone.
If you are doing a bulk update of a large number of docs then it's no
surprise that you are seeing a heavy IO load.  If you can, it's likely
to be worth giving Lucene a dedicated disk, or at least making sure
there's as little contention as possible - that's just general advice
for any workload.  There is always going to be a limiting factor
somewhere.

You could also experiment with multiple threads, or multiple jobs
writing to separate indexes with a standalone merge at the end.  In my
experience these have generally been more trouble than they're worth,
but the occasions when I do bulk loads of a large number of docs are
sufficiently rare that I'm not too bothered how long it takes.
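
For what it's worth, here is a minimal sketch of the multi-threaded
variant.  It is only an illustration under stated assumptions, not a
tested recipe for your setup: ParallelBulkLoad and indexAll are
hypothetical names, and the "sink" is a stand-in for whatever adds one
element to the index.  IndexWriter is thread-safe, so in a real job
the sink would typically wrap indexWriter.updateDocument( term, doc )
and deal with its IOException; that wrapper is left out so the example
runs on the plain JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelBulkLoad {

    /**
     * Splits the items into roughly equal slices and feeds each slice to the
     * sink from its own worker thread. The sink must be thread-safe
     * (assumption: in a Lucene job it would delegate to a shared IndexWriter).
     */
    public static <T> void indexAll(List<T> items, int threads, Consumer<? super T> sink)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int sliceSize = (items.size() + threads - 1) / threads; // ceiling division
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < items.size(); start += sliceSize) {
                List<T> slice = items.subList(start, Math.min(start + sliceSize, items.size()));
                tasks.add(() -> {
                    slice.forEach(sink);
                    return null;
                });
            }
            // invokeAll blocks until every slice is done; get() rethrows any
            // failure from a worker instead of silently dropping it.
            for (Future<Void> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The single commit would still happen once, after indexAll returns.
For the separate-indexes variant, each job would write to its own
directory and a final step would combine them, for example with
IndexWriter.addIndexes; I have not sketched that here.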


--
Ian.





On Mon, Dec 22, 2014 at 9:45 AM, Clemens Wyss DEV <clemens...@mysign.ch> wrote:
> One of our indexes is updated completely quite frequently -> "batch update" 
> or "re-index".
> If so more than 2million documents are added/updated to/in the very index. 
> This creates an immense IO load on our system. Does it make sense to set 
> merge scheduler to NoMergeScheduler (and/or MergePolicy to NoMergePolicy). Or 
> is merging "not relevant" as the commit is done at the very end only?
>
> Context information:
> At the moment the writer's config consists only of setRAMBufferSizeMB:
> IndexWriterConfig config = new IndexWriterConfig( IndexManager.CURRENT_LUCENE_VERSION, analyzer );
> config.setMergePolicy( NoMergePolicy.NO_COMPOUND_FILES );
> //config.setMergeScheduler( NoMergeScheduler.INSTANCE );
> config.setRAMBufferSizeMB( 20 );
>
> The update logic is as follows:
> indexWriter.deleteAll()
> ...
> for all elements do {
> ...
> indexWriter.updateDocument( term, doc ); // in order to omit "duplicate entries"
> ...
> }
> indexWriter.commit
>
> What is the proposed way to perform such a batch update?

