[
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520293
]
Steven Parkes commented on LUCENE-847:
--------------------------------------
One new small item: you've added a "public void merge()" to
IndexWriter so that people can externally kick off a merge request,
which is good I think.
But, is it really necessary to flush here? It would be better to not
flush so that users then have two separate methods (flush() and
merge()) to do each function independently.
That makes sense.
Note that merge() was added not for users (which I have no strong opinion
about) but so that, potentially, CMP can check again for merges when a set of
merge threads completes, i.e., cascade.
I think instead we should leave the methods, not deprecated, as
convenience (sugar) methods. Simple things should be simple; complex
things should be possible.
I think this argues for a LegacyMergePolicy interface again, then? If we change
the default merge policy and someone changes their code to use LogDoc for their
own purposes, in both cases the getters/setters should work? So cast to the
interface and as long as the merge policy supports this, the getters/setters
work (unless the merge policy decides to throw within), otherwise the
getters/setters throw?
Uh, no: when someone calls optimize that means it really must be done,
right? So "optimize" is the right name I think.
Yeah, but it might do nothing. Just as merge might do nothing.
Can you factor this out, eg add a private method
"getLogDocMergePolicy(String reason)"
Sure.
Looks good, thanks. Can you add javadocs (w/ params) for both of
these new methods?
Sure.
Though, this raises the tricky question of index
consistency ...
Definitely. I'm still trying to understand all the subtleties here.
IndexWriter commits the new segments file right after
mergePolicy.merge returns ... so for CMP we suddenly have an unusable
index (as seen by an IndexReader).
How so? I figured that after mergePolicy.merge returns, in the case of CMP with
an ongoing merge, segmentInfos won't have changed at all. Is that a problem?
I thought the issue would be on the other end, where the concurrent merge
finishes and needs to update segmentInfos.
Maybe it's too ambitious to allow merges of segments from other
directories to run concurrently?
Yeah, that might be the case. At least as a default?
I would consider it a hard error in IndexWriter if after calling
mergePolicy.merge from any of the addIndexes*, there remain segments
in other directories. I think we should catch this & throw an
exception?
It would be easy enough for CMP to block in this case, rather than returning
immediately. Wouldn't that be better? And I suppose it's possible to imagine an
API on CMP for specifying this behavior?
I opened a separate issue LUCENE-982 to track this.
I think this is good. I think it's an interesting issue but not directly
related to the refactor?
Although ... do you think we need need some way for merge policy to
state where the new segment should be inserted into SegmentInfos?
Right now I assumed it would replace the left most-segment.
Since I don't really know the details of what such a merge policy would like, I
don't really know what it needs.
If you've thought about this more, do you have a suggestion? I suppose we could
just add an int. But, then again, I'd do that as a separate function, leaving
the original available, so we can do this later, completely compatibly?
Hmmm, does CMP block on close while it joins to any running merge
threads?
Yeah, at least in my sandbox.
How can the user close IndexWriter and abort the running
merges? I guess CMP would provide a method to abort any running
merges, and user would first call that before calling
IndexWriter.close?
I hadn't really thought about this but I can see that should be made possible.
It's always safe to abandon a merge so it should be available, for fast, safe,
and clean shutdown.
True, LogDoc as it now stands would never exploit concurrency (it will
always return the highest level that needs merging). But, we could
relax that such that if ever the lowest level has > 2*mergeFactor
pending segments to merge then we select the 2nd set.
Okay. But it will always return that? Still doesn't sound concurrent?
The thing is, the serial merge policy has no concept of concurrent merges, so
if the API is always to select the best merge, until a pervious merge finishes,
it will always return that as the best merge.
Concurrent is going to require, by hook or by crook, that a merge policy be
able to generate a set of non-conflicting merges, is it not?
I think the LUCENE-845 merge policy does this now, given that CMP gathers up
the merge calls. I'm not sure the current LUCENE-847 merge policy does (I'd
have to double check) because it sometimes will try to use the result of the
current merge in the next merge. The LUCENE-845 merge doesn't try to do this
which is a(n) (inconsequential) change?
This is another benefit of the simplified API: MergePolicy.maybeMerge
would only be called with a lock already acquired (by IndexWriter) on
the segmentInfos.
Do you really mean a lock on segmentInfos or just the lock on IndexWriter? I'm
assuming the latter and I think this is the case for both API models.
I don't think it's feasible to have a lock on segmentInfos separately. Only
IndexWriter should change segmentInfos and no code should try to look at
segmentInfos w/o being called via an IW synch method.
This does imply that CMP has to copy any segmentInfos data it plans to use
during concurrent merging, since the IW lock is not held during these periods.
Then, when the merge is done, segmentInfos is updated in IndexWriter via a
synch call to IW#replace.
This means IW#segmentInfos can change while a merge is in progress and this has
to be accounted for. That's what I'm walking through now.
> Factor merge policy out of IndexWriter
> --------------------------------------
>
> Key: LUCENE-847
> URL: https://issues.apache.org/jira/browse/LUCENE-847
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Steven Parkes
> Assignee: Steven Parkes
> Attachments: concurrentMerge.patch, LUCENE-847.patch.txt,
> LUCENE-847.patch.txt, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable,
> making it possible for apps to choose a custom merge policy and for easier
> experimenting with merge policy variants.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]