[jira] [Commented] (LUCENE-7868) Use multiple threads to apply deletes and DV updates

Michael McCandless (JIRA) Wed, 07 Jun 2017 07:40:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040984#comment-16040984
 ]


Michael McCandless commented on LUCENE-7868:
--------------------------------------------

I ran a quick indexing performance test on an internal corpus using an older 
version of the patch, comparing CPU usage before:

!cpu-before.png|width=800!

to CPU usage with the patch:

!cpu-after.png|width=800!

I don't have the exact numbers, and I need to re-run on the latest patch, but I 
think it was a ~50% indexing throughput improvement overall.  This is on a 64-core 
box with 480 GB RAM (an i3.16xlarge EC2 instance).

The before chart doesn't drop to 100% (one CPU) while applying deletes because 
there are concurrent merges running.

(Those little spiky drops down to near 0 CPU usage are from GC; I was using the 
default parallel collector, I think.)

> Use multiple threads to apply deletes and DV updates
> ----------------------------------------------------
>
>                 Key: LUCENE-7868
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7868
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master (7.0)
>
>         Attachments: cpu-after.png, cpu-before.png, LUCENE-7868.patch
>
>
> Today, when users delete documents or apply doc values updates, IndexWriter 
> buffers them up into frozen packets and then eventually uses a single thread 
> (BufferedUpdatesStream.applyDeletesAndUpdates) to resolve delete/update terms 
> to docids.  This thread also holds IW's monitor lock, so it also blocks 
> refresh, merges starting/finishing, commits, etc.
> We have heavily optimized this part of Lucene over time, e.g. LUCENE-6161, 
> LUCENE-2897, LUCENE-2680, LUCENE-3342, but still, it's a single thread so it 
> can't use multiple CPU cores commonly available now.
> This doesn't affect append-only usage, but for update-heavy users (me!) this 
> can be a big bottleneck, and causes long stop-the-world hangs during indexing.
> I have an initial exploratory patch to make these lookups concurrent, without 
> holding IW's lock, so that when a new packet of deletes is pushed, which 
> happens when we flush a new segment, we immediately use that same indexing 
> thread to resolve the deletions.
> This is analogous to when we made segment flushing concurrent (LUCENE-3023), 
> just for deletes and DV updates as well.
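
To make the idea in the description concrete, here is a minimal toy sketch of resolving delete packets concurrently without a writer-wide lock. It is not the patch and does not use Lucene's real classes: Segment, DeletePacket, resolve and applyConcurrently are hypothetical names invented for illustration, and the sketch ignores details the real code must handle (for example, that a packet only applies to segments older than itself, and DV updates).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: none of these classes are Lucene's real API.
public class ConcurrentDeleteSketch {

    // Hypothetical "segment": maps each term to the doc IDs containing it.
    static final class Segment {
        final Map<String, int[]> postings;
        // Thread-safe set of deleted doc IDs, so no global lock is needed.
        final Set<Integer> deleted = ConcurrentHashMap.newKeySet();
        Segment(Map<String, int[]> postings) { this.postings = postings; }
    }

    // Hypothetical frozen packet of buffered delete terms.
    static final class DeletePacket {
        final List<String> terms;
        DeletePacket(List<String> terms) { this.terms = terms; }
    }

    // Resolve delete terms to doc IDs. This takes no writer-wide lock,
    // so many threads can run it at the same time.
    static int resolve(DeletePacket packet, List<Segment> segments) {
        int newlyDeleted = 0;
        for (Segment seg : segments) {
            for (String term : packet.terms) {
                int[] docs = seg.postings.get(term);
                if (docs == null) continue;
                for (int doc : docs) {
                    if (seg.deleted.add(doc)) newlyDeleted++;
                }
            }
        }
        return newlyDeleted;
    }

    // Each task stands in for an indexing thread that has just flushed a
    // segment and immediately resolves the packet it pushed.
    static int applyConcurrently(List<DeletePacket> packets,
                                 List<Segment> segments,
                                 int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (DeletePacket p : packets) {
                futures.add(pool.submit(() -> resolve(p, segments)));
            }
            int total = 0;
            for (Future<Integer> f : futures) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Segment s0 = new Segment(Map.of("id:1", new int[] {0}, "id:2", new int[] {1}));
        Segment s1 = new Segment(Map.of("id:2", new int[] {0}, "id:3", new int[] {1}));
        List<DeletePacket> packets = List.of(
            new DeletePacket(List.of("id:1")),
            new DeletePacket(List.of("id:2")));
        int n = applyConcurrently(packets, List.of(s0, s1), 2);
        System.out.println("newly deleted docs: " + n);
    }
}
```

The point of the sketch is only the shape of the change: the term-to-docid lookups run on several threads at once and touch per-segment state, instead of running on one thread that holds a single lock for the whole pass.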



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
