[ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932982#action_12932982 ]

Shai Erera commented on LUCENE-2755:
------------------------------------

Earwin, the way CMS currently handles the writer instance makes it not 
thread-safe at all. If you, e.g., pass different writers to merge(), the class 
member changes, and the MergeThreads will start merging other segments, or in 
the worst case attempt to merge segments of a different writer.

I myself think it's OK to have an MP and an MS (MergePolicy, MergeScheduler) 
per writer, but I don't feel strongly either way, so if we want to allow this, 
we should fix CMS.
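To illustrate the hazard described above, here is a minimal, deterministic Java sketch. `FakeWriter`, `SharedFieldScheduler`, `PerTaskScheduler` and `WriterBindingDemo` are hypothetical stand-ins, not Lucene classes, and this is not the actual CMS code. Instead of racing real threads, the sketch defers `run()` to force the bad interleaving: a merge thread that reads the scheduler's writer field when it runs picks up whichever writer called `merge()` last, while a task that captures its writer at creation stays bound to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer (NOT Lucene's IndexWriter API).
final class FakeWriter {
    final String name;
    FakeWriter(String name) { this.name = name; }
}

// Problematic pattern: one writer field, overwritten by every merge() call.
final class SharedFieldScheduler {
    private FakeWriter writer;

    void merge(FakeWriter w) { this.writer = w; }

    // The task reads this.writer when it runs, not when it was created.
    Runnable newMergeThread(List<String> log) {
        return () -> log.add("merging segments of " + writer.name);
    }
}

// Safer pattern: each task captures the writer it was created for.
final class PerTaskScheduler {
    Runnable newMergeThread(FakeWriter w, List<String> log) {
        return () -> log.add("merging segments of " + w.name);
    }
}

final class WriterBindingDemo {
    static List<String> sharedField() {
        SharedFieldScheduler s = new SharedFieldScheduler();
        List<String> log = new ArrayList<>();
        s.merge(new FakeWriter("A"));
        Runnable t = s.newMergeThread(log); // started on behalf of writer A
        s.merge(new FakeWriter("B"));       // a second writer calls merge() first
        t.run();                            // ends up working for writer B
        return log;
    }

    static List<String> perTask() {
        PerTaskScheduler s = new PerTaskScheduler();
        List<String> log = new ArrayList<>();
        Runnable t = s.newMergeThread(new FakeWriter("A"), log);
        s.newMergeThread(new FakeWriter("B"), log); // does not affect t
        t.run();                                    // still works for writer A
        return log;
    }
}
```

Binding the writer per task (or keeping one scheduler instance per writer) removes this particular hazard; it does not by itself make any other shared scheduler state thread-safe.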

As for the other comments, I'll need to look more closely at what IW 
(IndexWriter) does with those merges, since it checks all sorts of things (e.g. 
whether it's an optimize merge or not; see one of the latest bugs Mike 
resolved). So moving this entirely out of IndexWriter and into MP/MS is risky. 
At least, I don't understand the code well enough (yet) to say whether it's 
doable at all, or whether we'd be missing something.

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I noticed several things that led 
> me to read the CMS code more carefully, and I found these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. As a result, the 
> MergeThreads keep taking merges from the IndexWriter until they are exhausted, 
> and only then does that blocked merge run. I think blocking that merge is 
> unnecessary.
> * CMS sorts merges by segment size measured in docs, not bytes. Since the 
> default MP is LogByteSizeMP, and I doubt people care about doc-count-based 
> segment sizes anymore, I think we should switch the default impl. If we want 
> to make it extensible, there are two ways:
> ** Have an overridable member/method in CMS that you can extend and 
> override; easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by 
> bytes, docs, calibrating for deletes, etc.). Better, but it will need to touch 
> several places in the code, so it is riskier and more complicated.
> Along the way, I'd like to add some documentation to CMS; it's not very easy 
> to read and follow.
> I'll work on a patch.
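The first extensibility option in the description (an overridable member/method in CMS) might look roughly like the Java sketch below. All names here (`PendingMerge`, `SizeOrderingScheduler`, `DocCountOrderingScheduler`, `mergeOrder()`, `MergeOrderDemo`) are hypothetical and are not Lucene's API; ordering smallest-first is likewise an assumption made only for the example. The sample data shows why doc-based and byte-based ordering can disagree.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pending merge (NOT Lucene's OneMerge fields).
final class PendingMerge {
    final String id;
    final int totalDocCount;
    final long totalBytes;
    PendingMerge(String id, int totalDocCount, long totalBytes) {
        this.id = id;
        this.totalDocCount = totalDocCount;
        this.totalBytes = totalBytes;
    }
}

// Scheduler with an overridable ordering hook; default orders by bytes.
class SizeOrderingScheduler {
    // Smallest merge first; the direction is an assumption of this sketch.
    protected Comparator<PendingMerge> mergeOrder() {
        return Comparator.comparingLong(m -> m.totalBytes);
    }

    // Returns merge ids in the order the scheduler would consider them.
    final List<String> order(List<PendingMerge> merges) {
        List<PendingMerge> sorted = new ArrayList<>(merges);
        sorted.sort(mergeOrder());
        List<String> ids = new ArrayList<>();
        for (PendingMerge m : sorted) {
            ids.add(m.id);
        }
        return ids;
    }
}

// A subclass that overrides the hook to order by document count instead.
class DocCountOrderingScheduler extends SizeOrderingScheduler {
    @Override
    protected Comparator<PendingMerge> mergeOrder() {
        return Comparator.comparingInt(m -> m.totalDocCount);
    }
}

final class MergeOrderDemo {
    static List<PendingMerge> sample() {
        List<PendingMerge> merges = new ArrayList<>();
        merges.add(new PendingMerge("manySmallDocs", 1_000, 1L << 20)); // 1 MiB
        merges.add(new PendingMerge("fewLargeDocs", 10, 50L << 20));    // 50 MiB
        return merges;
    }
}
```

With this sample, byte-based ordering puts the 1 MiB merge first, while doc-count ordering puts the 10-doc, 50 MiB merge first. The second option (a comparable OneMerge with the order chosen by the MP) would move this decision into the policy rather than the scheduler.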

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
