[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

Ning Li (JIRA) Thu, 30 Aug 2007 15:34:51 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523957
 ]


Ning Li commented on LUCENE-847:
--------------------------------

> True, but I was thinking CMPW could be an exception to this rule.  I
> guess I would change the rule to "simple merge policies don't have to
> run their own merges"?

:) Let's see if we have to make that exception.

> Good point...  I think I could refactor this so that cascading logic
> lives entirely in one place IndexWriter.

Another problem of the current cascading in CMPW.MergeThread is, if multiple 
candidate merges are found, all of them are added to 
IndexWriter.mergingSegments. But all but the first should be removed because 
only the first merge is carried out (thus removed from mergeSegments after the 
merge is done).

How do you make cascading live entirely in IndexWriter? Just removing cascading 
from CMPW.MergeThread has one drawback. For example, segment sizes of an index 
are: 40, 20, 10, buffer size is 10 and merge factor is 2. A buffer full flush 
of 10 will trigger merge of 10 & 10, then cascade to 20 & 20, then cascade to 
40 & 40. CMPW without cascading will stop after 10 & 10 since 
IndexWriter.maybeMerge has already returned. Then we have to wait for the next 
flush to merge 20 & 20.

> How would this be used?  Ie, how would one make an IndexWriter that
> uses the ConcurrentMerger?  Would we add expert methods
> IndexWriter.set/getIndexMerger(...)?  (And presumably the mergePolicy
> is now owned by IndexMerger so it would have the
> set/getMergePolicy(...)?)
> 
> Also, how would you separate what remains in IW vs what would be in
> IndexMerger?

Maybe Merger does and only does merge (so IndexWriter still owns MergePolicy)? 
Say, base class Merger.merge simply calls IndexWriter.merge. 
ConcurrentMerger.merge creates a merge thread if possible. Otherwise it calls 
super.merge, which does non-concurrent merge. IndexWriter simply calls its 
merger's merge instead of its own merge. Everything else remains in IndexWriter.


1
> Hmm ... you're right.  This is a separate issue from merge policy,
> right?  Are you proposing buffering deletes in DocumentsWriter
> instead?

Yes, this is a separate issue. And yes if we consider DocumentsWriter as 
staging area.

2
> Good catch!  How to fix?  One thing we could do is always use
> SegmentInfo.reset(...) and never swap in clones at the SegmentInfo
> level.  This way using the default 'equals' (same instance) would
> work.  Or we could establish identity (equals) of a SegmentInfo as
> checking if the directory plus segment name are equal?  I think I'd
> lean to the 2nd option....

I think the 2nd option is better.

3
> Hmmm yes.  In fact I think we can remove synchronized from optimize
> altogether since within it we are synchronizing(this) at the right
> places?  If more than one thread calls optimize at once, externally,
> it is actually OK: they will each pick a merge that's viable
> (concurrently) and will run the merge, else return once there is no
> more concurrency left.  I'll add a unit test that confirms this.

That seems to be the case. The fact that "the same merge spec will be returned 
without changes to segmentInfos" reminds me: MergePolicy.findCandidateMerges 
finds merges which may not be eligible; but CMPW checks for eligibility when 
looking for candidate merges. Maybe we should unify the behaviour? BTW, 
MergePolicy.optimize (a rename?) doesn't check for eligibility either.

4
> Well, useCompoundFile(...) is given a single newly flushed segment and
> should decide whether it should be CFS.  Whereas
> useCompoundDocStore(...) is called when doc stores are flushed.  When
> autoCommit=false, segments can share a single set of doc stores, so
> there's no single SegmentInfo to pass down.

The reason I asked is because none of them are used right now. So they might be 
used in the future?

> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>             Fix For: 2.3
>
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
> LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, 
> LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

Reply via email to