[ 
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523621
 ] 

Ning Li commented on LUCENE-847:
--------------------------------

I include comments for both LUCENE-847 and LUCENE-870 here since they are 
closely related.

I like the stateless approach used for refactoring merge policy. But modeling 
concurrent merge (ConcurrentMergePolicyWrapper) as a MergePolicy seems to be 
inconsistent with the MergePolicy interface:
  1 As you pointed out, "the merge policy is no longer responsible for running 
the merges itself". MergePolicy.maybeMerge simply returns a merge 
specification. But ConcurrentMergePolicyWrapper.maybeMerge actually starts 
concurrent merge threads thus doing the merges.
  2 Related to 1, cascading is done in IndexWriter in non-concurrent case. But 
in concurrent case, cascading is also done in merge threads which are started 
by ConcurrentMergePolicyWrapper.maybeMerge.

MergePolicy.maybeMerge should continue to simply return a merge specification. 
(BTW, should we rename this maybeMerge to, say, findCandidateMerges?) Can we 
carve the merge process out of IndexWriter into a Merger? IndexWriter still 
provides the building blocks - merge(OneMerge), mergeInit(OneMerge), etc. 
Merger uses these building blocks. A ConcurrentMerger extends Merger but starts 
concurrent merge threads as ConcurrentMergePolicyWrapper does.


Other comments:
1 UpdateDocument's and deleteDocument's bufferDeleteTerm are synchronized on 
different variables in this patch. However, the semantics of updateDocument 
changed since LUCENE-843. Before LUCENE-843, updateDocument, which is a delete 
and an insert, guaranteed the delete and the insert are committed together 
(thus an update). Now it's possible that they are committed in different 
transactions. If we consider DocumentsWriter as the RAM staging area for 
IndexWriter, then deletes are also buffered in RAM staging area and we can 
restore our previous semantics, right?

2 OneMerge.segments seems to rely on its segment infos' reference to segment 
infos of IndexWriter.segmentInfos. The use in commitMerge, which calls 
ensureContiguousMerge, is an example. However, segmentInfos can be a cloned 
copy because of exceptions, thus the reference broken.

3 Calling optimize of an IndexWriter with the current 
ConcurrentMergePolicyWrapper may cause deadlock: the one merge spec returned by 
MergePolicy.optimize may be in conflict with a concurrent merge (the same merge 
spec will be returned without changes to segmentInfos), but a concurrent merge 
cannot finish because optimize is holding the lock.

4 Finally, a couple of minor things:
  1 LogMergePolicy.useCompoundFile(SegmentInfos infos, SegmentInfo info) and 
useCompoundDocStore(SegmentInfos infos): why the parameters?
  2 Do we need doMergeClose in IndexWriter? Can we simply close a MergePolicy 
if not null?

> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>             Fix For: 2.3
>
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, 
> LUCENE-847.patch.txt, LUCENE-847.take3.patch, LUCENE-847.take4.patch, 
> LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, 
> making it possible for apps to choose a custom merge policy and for easier 
> experimenting with merge policy variants.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to