[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526305
 ] 

Michael McCandless commented on LUCENE-847:
-------------------------------------------


> In the while loop of optimize(), LogMergePolicy.findMergesForOptimize
> returns a merge spec with one merge. If ConcurrentMergeScheduler is
> used, the one merge will be started in MergeScheduler.merge() and
> findMergesForOptimize will be called again. Before the one merge
> finishes, findMergesForOptimize will return the same spec but the
> one merge is already started. So only one concurrent merge is
> possible and the main thread will spin on calling
> findMergesForOptimize and attempting to merge.

Yes.  The new patch has cleaned this up nicely, I think.

> One possible solution is to make LogMergePolicy.findMergesForOptimize
> return multiple merge candidates. It allows higher level of
> concurrency.

Good idea!  I took exactly this approach in the patch I just attached.
I made a simple change: LogMergePolicy.findMergesForOptimize first
checks if "normal merging" would want to do merges and returns them if
so.  Since "normal merging" exposes concurrent merges, this gains us
concurrency for optimize in cases where the index has too many
segments.  I wasn't sure how otherwise to expose concurrency...
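
Roughly, the idea looks like the sketch below.  This is not the code
from the patch: the class name, the String segment names and the toy
findNormalMerges rule (group each full run of mergeFactor adjacent
segments) are all made up for illustration, and the real
LogMergePolicy picks merges by segment level rather than like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual LogMergePolicy code.
class OptimizeSelectionSketch {

    // Toy stand-in for "normal merging": every full run of mergeFactor
    // adjacent segments becomes one independent merge.
    static List<List<String>> findNormalMerges(List<String> segments, int mergeFactor) {
        List<List<String>> merges = new ArrayList<>();
        for (int start = 0; start + mergeFactor <= segments.size(); start += mergeFactor) {
            merges.add(new ArrayList<>(segments.subList(start, start + mergeFactor)));
        }
        return merges;
    }

    // Optimize first asks whether normal merging wants to do anything.
    // If so, those (possibly several, hence concurrent) merges are returned.
    // Otherwise fall back to a single merge of all remaining segments.
    static List<List<String>> findMergesForOptimize(List<String> segments, int mergeFactor) {
        List<List<String>> normal = findNormalMerges(segments, mergeFactor);
        if (!normal.isEmpty()) {
            return normal;
        }
        List<List<String>> spec = new ArrayList<>();
        if (segments.size() > 1) {
            spec.add(new ArrayList<>(segments));
        }
        return spec;
    }
}
```

With five segments and a mergeFactor of 2 this returns two independent
merges that a concurrent scheduler could run at the same time; with
three segments and a mergeFactor of 10 it returns the single final
merge.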

> It also alleviates a bit the problem of main thread spinning. To
> solve this problem, maybe we can check if a merge is actually
> started, then sleep briefly if not (which means all merges
> candidates are in conflict)?

This is much cleaner in the new patch: there is no more spinning.  In
the new patch, if multiple threads are merging (either spawned by
ConcurrentMergeScheduler or provided by the application or both) then
they all pull from a shared queue of "merges needing to run" and then
return when that queue is empty.  So no more spinning.
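
As a minimal sketch of that pattern (again, not the patch itself: the
class and method names here are hypothetical, and merges are modelled
as plain Runnables):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the code from the LUCENE-847 patch.
class SharedMergeQueueSketch {

    // Starts threadCount workers that all pull from the same queue of
    // "merges needing to run".  A worker returns as soon as poll() finds
    // the queue empty, so nobody spins waiting for work.
    // pendingMerges must be a thread-safe queue, e.g. ConcurrentLinkedQueue.
    static int runAll(Queue<Runnable> pendingMerges, int threadCount) {
        AtomicInteger completed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> {
                Runnable merge;
                while ((merge = pendingMerges.poll()) != null) {
                    merge.run();
                    completed.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return completed.get();
    }
}
```

Each queued merge is handed to exactly one worker because poll()
removes it from the queue, so the same merge is never started twice.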

> One difference between the current approach on concurrent merge and
> the patch I posted a while back is that, in the current approach, a
> MergeThread object is created and started for every concurrent
> merge. In my old patch, maxThreadCount of threads are created and
> started at the beginning and are used throughout. Both have pros and
> cons.

Yeah I thought I would keep it simple (launch thread when needed then
let it finish when it's done) rather than use a pool.  This way
threads are only created (and are only alive) while concurrency is
actually needed (i.e. > N merges necessary at once).  But yes there are
pros/cons either way.
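
A sketch of the launch-when-needed approach, with made-up names (this
is not the actual MergeThread or ConcurrentMergeScheduler code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual ConcurrentMergeScheduler.
class OnDemandMergeThreadsSketch {
    private final List<Thread> started = new ArrayList<>();

    // One new thread per merge.  The thread exits when its merge is done,
    // so no threads sit idle when nothing needs merging.  A pool would
    // instead create maxThreadCount threads up front and reuse them.
    synchronized void merge(Runnable merge) {
        Thread t = new Thread(merge);
        started.add(t);
        t.start();
    }

    // Blocks until every merge thread started so far has finished.
    void waitForAll() {
        List<Thread> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(started);
        }
        for (Thread t : snapshot) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
    }

    synchronized int threadsStarted() {
        return started.size();
    }
}
```

The tradeoff is thread creation cost per merge versus idle threads
held for the life of the writer.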


> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>             Fix For: 2.3
>
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
> LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, 
> LUCENE-847.take5.patch, LUCENE-847.take6.patch, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.

