[jira] Commented: (LUCENE-2755) Some improvements to CMS

Earwin Burrfoot (JIRA) Mon, 15 Nov 2010 13:10:53 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932200#action_12932200 ]


Earwin Burrfoot commented on LUCENE-2755:
-----------------------------------------

bq. There was some reason why this needed to be called by CMS but not SMS but I 
can't remember why.
That has something to do with assigning new segment names, if you believe the 
comments.
But IW.mergeInit does a freakload of other stuff! I think assigning names can 
happen in a separate place, before OneMerge is submitted to MS.
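To make the "assign names in a separate place" idea concrete, here is a minimal sketch in which new segment names are handed out by one small component at the moment a merge is created, before anything is submitted to a scheduler. MergeSpec and SegmentNamer are hypothetical stand-ins, not Lucene's OneMerge or IndexWriter code, and the base-36 "_N" naming is assumed for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical merge descriptor (not Lucene's OneMerge): the target segment
// name is fixed when the merge is created, before any scheduler sees it.
final class MergeSpec {
    final List<String> sourceSegments;
    final String targetName;

    MergeSpec(List<String> sourceSegments, String targetName) {
        this.sourceSegments = List.copyOf(sourceSegments);
        this.targetName = targetName;
    }
}

// Hypothetical single place that hands out new segment names, kept apart
// from the rest of the merge setup work.
final class SegmentNamer {
    private final AtomicInteger counter;

    SegmentNamer(int start) {
        this.counter = new AtomicInteger(start);
    }

    // Base-36 names such as "_0", "_z", "_10" (style assumed for illustration).
    String next() {
        return "_" + Integer.toString(counter.getAndIncrement(), Character.MAX_RADIX);
    }

    // Name the merge up front; the result can then go to any scheduler queue.
    MergeSpec register(List<String> sources) {
        return new MergeSpec(sources, next());
    }
}
```

Because the counter is atomic, several threads can register merges concurrently without colliding on names, and the scheduler never needs to call back into the writer just to get one.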

bq. Otherwise, when a laaarge merge is taking place, it causes you to fully stop 
your indexing threads unnecessarily
I still think this can be mitigated in more appropriate ways, for example by 
allocating a pending-merges queue big enough to hold new merges until the long 
one finishes.
Indexing threads push merges into the queue (with CMS) and don't block.
On top of that, you can use policies like BalancedSegmentMergePolicy that 
prevent UBER-merges from occurring at all.
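A minimal sketch of the non-blocking hand-off described above, assuming a plain bounded java.util.concurrent queue sits between indexing threads and merge threads. PendingMerges is a hypothetical name, not a CMS class; the point is only that producers use offer(), which returns immediately, while merge threads use take(), which waits.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical pending-merges queue: indexing threads enqueue merges and
// never block; merge threads drain it at their own pace.
final class PendingMerges<M> {
    private final BlockingQueue<M> queue;

    PendingMerges(int capacity) {
        // Sized "big enough" so that a long-running merge does not fill it.
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by indexing threads: returns false instead of blocking when full.
    boolean submit(M merge) {
        return queue.offer(merge);
    }

    // Called by merge threads: waits until a merge is available.
    M take() throws InterruptedException {
        return queue.take();
    }

    int pending() {
        return queue.size();
    }
}
```

What a producer should do when submit() returns false (drop the merge and let the policy re-propose it later, or fall back to something else) is a separate design choice this sketch leaves open.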

bq. That's tempting... but people use MSs eg to schedule big merges at 
different times. I don't think we should outright drop MS.
That exact use case is totally wrong. MergePolicy decides which merges should 
run NOW, MergeScheduler executes them.
If a certain big merge should run only within some specific timeframe, 
MergePolicy should not return it when asked for eligible merges.

In your example, where decision-making is smeared across classes, the merges 
created by MP and deferred by MS are stale by the time they actually run. If 
asked at that point, MP would include additional segments in the merge that MS 
has been holding back for ages.
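A minimal sketch of keeping the time-window decision inside the policy, so eligibility is recomputed from the current candidates on every call instead of a scheduler sitting on a merge chosen long ago. Candidate and WindowedPolicy are hypothetical types, not Lucene's MergePolicy API, and the window is assumed not to wrap past midnight.

```java
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical candidate merge with its estimated size; not Lucene's OneMerge.
final class Candidate {
    final String name;
    final long sizeInBytes;

    Candidate(String name, long sizeInBytes) {
        this.name = name;
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical policy: big merges are only returned inside the window.
final class WindowedPolicy {
    private final long bigMergeBytes;
    private final LocalTime windowStart;
    private final LocalTime windowEnd; // assumed later than windowStart

    WindowedPolicy(long bigMergeBytes, LocalTime windowStart, LocalTime windowEnd) {
        this.bigMergeBytes = bigMergeBytes;
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
    }

    List<Candidate> eligibleMerges(List<Candidate> candidates, LocalTime now) {
        boolean inWindow = !now.isBefore(windowStart) && now.isBefore(windowEnd);
        List<Candidate> eligible = new ArrayList<>();
        for (Candidate c : candidates) {
            // Small merges always run; big ones only inside the window.
            if (inWindow || c.sizeInBytes < bigMergeBytes) {
                eligible.add(c);
            }
        }
        return eligible;
    }
}
```

The scheduler in this arrangement simply executes whatever it is handed; it never defers anything, so nothing it runs can go stale.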

For an example of this done relatively right, take a look at BSMP: it has a 
setPartialExpunge method that alters its
behaviour to include some expensive housecleaning. The idea is that you call 
setPartialExpunge(true) at the beginning of your quiet period, and
setPartialExpunge(false) when it ends.
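The BSMP pattern boils down to a flag on the policy that changes which merges it proposes. The sketch below mimics that shape with made-up names (QuietPeriodPolicy, setExpensiveCleanup, findMerges over plain strings); it is an illustration of the idea, not BalancedSegmentMergePolicy's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy illustrating a BSMP-style quiet-period switch.
final class QuietPeriodPolicy {
    // volatile so a flip from a cron or admin thread is seen by merge threads
    private volatile boolean expensiveCleanup = false;

    void setExpensiveCleanup(boolean on) {
        expensiveCleanup = on;
    }

    // Routine merges are always proposed; costly housecleaning merges only
    // while the quiet-period flag is on.
    List<String> findMerges(List<String> routine, List<String> housecleaning) {
        List<String> merges = new ArrayList<>(routine);
        if (expensiveCleanup) {
            merges.addAll(housecleaning);
        }
        return merges;
    }
}
```

Usage mirrors the comment: call setExpensiveCleanup(true) when the quiet period starts and setExpensiveCleanup(false) when it ends; the next time the policy is asked, its answer reflects the current state of the index.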

bq. A good feature for Solr could be the ability to kick off pending large 
merges via an HTTP call.  They could then be scheduled via a cron job and based 
on other factors, such as whether or not other indexing tasks are running.
Same argument here. The place for such decisions is MergePolicy.

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got 
> me to read CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the 
> MergeThreads taking merges from the IndexWriter until they are exhausted, and 
> only then does the blocked merge run. I don't think that merge needs to be 
> blocked.
> * CMS sorts merges by segment size, measured in docs rather than bytes. Since 
> the default MP is LogByteSizeMP, and I doubt people care about doc-based 
> segment sizes anymore, I think we should switch the default impl. There are 
> two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override 
> - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by 
> bytes, docs, calibrate deletes etc.). Better, but will need to tap into 
> several places in the code, so more risky and complicated.
> Along the way, I'd like to add some documentation to CMS; it's not very easy 
> to read and follow.
> I'll work on a patch.
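The first, "easy" option from the issue (an overridable method in the scheduler) can be sketched as a Comparator-returning hook that defaults to byte size, with a subclass restoring doc-count ordering. MergeInfo, SortingScheduler and DocCountScheduler are hypothetical names, not CMS's real members, and the smallest-first direction is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical merge descriptor; not Lucene's OneMerge.
final class MergeInfo {
    final String name;
    final int docCount;
    final long sizeInBytes;

    MergeInfo(String name, int docCount, long sizeInBytes) {
        this.name = name;
        this.docCount = docCount;
        this.sizeInBytes = sizeInBytes;
    }
}

// Hypothetical scheduler fragment: ordering lives in one overridable method.
class SortingScheduler {
    // Default: order by size in bytes, smallest first.
    protected Comparator<MergeInfo> mergeOrder() {
        return Comparator.comparingLong((MergeInfo m) -> m.sizeInBytes);
    }

    final List<MergeInfo> order(List<MergeInfo> merges) {
        List<MergeInfo> sorted = new ArrayList<>(merges);
        sorted.sort(mergeOrder());
        return sorted;
    }
}

// Subclass restoring doc-count ordering for anyone who still wants it.
class DocCountScheduler extends SortingScheduler {
    @Override
    protected Comparator<MergeInfo> mergeOrder() {
        return Comparator.comparingInt((MergeInfo m) -> m.docCount);
    }
}
```

The second option (making OneMerge comparable and letting the MP decide) would move mergeOrder() onto the policy instead, which is the more invasive change the issue describes.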

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


