[
https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931468#action_12931468
]
Shai Erera commented on LUCENE-2755:
------------------------------------
Ok, so deferring the call to IndexWriter.getNextMerge() until we know the
merge can be registered is problematic. The reason is that we want to know
whether there is a next merge before we check whether it can be registered:
if there is none, the method returns immediately. Otherwise, we would wait
until a merge can be registered, only to discover there are no more merges.
So one solution is to add a hasMerges() method to IW and have CMS wait for
room to become available only if there are merges.
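
To make the first option concrete, here is a minimal sketch of the idea. It
is not the CMS code: MergeSource is a hypothetical stand-in for the relevant
part of IndexWriter, hasMerges() is the method proposed above (it does not
exist today), and merges are plain Strings rather than OneMerge instances.

```java
public class HasMergesSketch {

    /** Hypothetical stand-in for the part of IndexWriter discussed here. */
    public interface MergeSource {
        boolean hasMerges();    // the proposed method, an assumption of this sketch
        String getNextMerge();  // simplified: the real method returns a OneMerge
    }

    /** Simplified scheduler that allows at most maxMergeCount running merges. */
    public static class Scheduler {
        private final int maxMergeCount;
        private int running = 0;

        public Scheduler(int maxMergeCount) {
            this.maxMergeCount = maxMergeCount;
        }

        /** Returns the next merge to run, or null if there is nothing to do. */
        public synchronized String takeNextMerge(MergeSource writer) {
            if (!writer.hasMerges()) {
                return null; // nothing pending: return without waiting for room
            }
            try {
                while (running >= maxMergeCount) {
                    wait(); // wait for room only because a merge is pending
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
            // Another thread may have taken the merge while we waited.
            String merge = writer.getNextMerge();
            if (merge != null) {
                running++;
            }
            return merge;
        }

        /** Called when a merge completes; frees a slot and wakes up waiters. */
        public synchronized void mergeFinished() {
            running--;
            notifyAll();
        }

        public synchronized int runningCount() {
            return running;
        }
    }
}
```

The point of the sketch is the ordering inside takeNextMerge(): the merge is
pulled from the writer only after room is available, and the wait is skipped
entirely when nothing is pending.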
Another solution is to do a larger change to CMS and introduce an
ExecutorService - this has been raised in the past, so perhaps it's time to
finally do it? By using a blocking queue, we don't need to implement any
waiting logic - Java will do it for us.
The downside is that I'm not sure we can control which of the merges runs
and which doesn't. Perhaps we can hack this through - I'll need to start the
process to tell for sure. This feature is important - today CMS guarantees
that smaller merges run first - so it might be that a larger merge was
registered before a smaller merge, and we'd still want to execute the smaller
one before the larger.
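
One way the ordering might be kept with an executor is to back a
ThreadPoolExecutor with a PriorityBlockingQueue, so that idle workers always
pick the smallest queued merge. The sketch below only illustrates that idea
under stated assumptions: MergeTask and runSmallestFirst() are hypothetical
names, a merge is reduced to a byte size, and a single worker is held back
with a latch so that the queue order is observable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PriorityMergeSketch {

    /** Hypothetical merge task; compareTo orders smaller merges first. */
    public static final class MergeTask implements Runnable, Comparable<MergeTask> {
        final long sizeBytes;
        final Runnable body;

        MergeTask(long sizeBytes, Runnable body) {
            this.sizeBytes = sizeBytes;
            this.body = body;
        }

        @Override public void run() { body.run(); }

        @Override public int compareTo(MergeTask other) {
            return Long.compare(sizeBytes, other.sizeBytes);
        }
    }

    /** Registers merges in the given order; returns the order they ran in. */
    public static List<Long> runSmallestFirst(long... sizesInRegistrationOrder) {
        List<Long> executed = Collections.synchronizedList(new ArrayList<Long>());
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        try {
            // Occupy the single worker so the merges below pile up in the queue.
            pool.execute(new MergeTask(0L, () -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            for (long size : sizesInRegistrationOrder) {
                // execute(), not submit(): submit() wraps the task in a
                // FutureTask, which is not Comparable and would fail here.
                pool.execute(new MergeTask(size, () -> executed.add(size)));
            }
            gate.countDown(); // release the worker; it now drains the queue
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("merges did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return new ArrayList<Long>(executed);
    }
}
```

Two caveats with this approach: the priority applies only to merges that are
still queued, so a large merge that is already running is not preempted, and
PriorityBlockingQueue is unbounded, so a maxMergeCount-style limit would
still need its own logic.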
A third solution would be to not do anything and keep things as they are -
namely let some merge be held by CMS until it can be executed.
Just summarizing my thoughts for now.
> Some improvements to CMS
> ------------------------
>
> Key: LUCENE-2755
> URL: https://issues.apache.org/jira/browse/LUCENE-2755
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Minor
> Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got
> me to read CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the
> MergeThreads taking merges from the IndexWriter until they are exhausted, and
> only then that blocked merge will run. I think it's unnecessary that that
> merge will be blocked.
> * CMS sorts merges by segments size, doc-based and not bytes-based. Since the
> default MP is LogByteSizeMP, and I hardly believe people care about doc-based
> size segments anymore, I think we should switch the default impl. There are
> two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override
> - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by
> bytes, docs, calibrate deletes etc.). Better, but will need to tap into
> several places in the code, so more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to
> read and follow.
> I'll work on a patch.