[
https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518184
]
Michael McCandless commented on LUCENE-847:
-------------------------------------------
Some more feedback:
- Is the separate IndexMerger interface really necessary? Can't we
just use IndexWriter directly? It's somewhat awkward/forced now,
because the interface has getters for ramBufferSizeMB and
maxBufferedDocs that are really a "writer" (flushing) thing not a
"merging" thing.
While LogDocMergePolicy does need "maxBufferedDocs", I think,
instead, in IndexWriter's "setMaxBufferedDocs()" it should "write
through" to the LogDocMergePolicy if that is the merge policy in
use (and LogDocMergePolicy should store its own private
"minMergeDocs").
I think the three getters may not even be needed (based on
comments below), in which case it seems like we shouldn't be
creating a new interface.
- In LogDocMergePolicy, it seems like the checking that's done for
whether a SegmentInfo is in a different directory, as well as the
subsequent copy to move it over to the IndexWriter's directory,
should not live in the MergePolicy?
Otherwise, every MergePolicy is going to have to duplicate this
code. Not to mention, we may someday create a more efficient way
to copy than running a single-segment merge (which is a very
inefficient, but, we only do it in addIndexes* I think). I'd like
to capture this in one place (IndexWriter).
EG, the writer could have its own "copyExternalSegments" method
which is called in addIndexes* and also optimize(), after the
merge policy has done its merge, which does the check for wrong
directory and subsequent copy.
I think this would mean IndexMerger.getDirectory() is not really
needed.
- The "checkOptimize" method in LogDocMergePolicy seems like it
belongs back in IndexWriter: I think we don't want every
MergePolicy having to replicate that tricky while condition.
Maybe we could change MergePolicy.merge to take a boolean "forced"
which, if true, means that the MergePolicy must now pick a merge
even if it didn't think it was time to. This would let us move
that tricky while condition logic back into IndexWriter.
Also, I think at some point we may want to have an optimize()
method that takes an int parameter stating the max # segments in
the resulting index. This would allow you to optimize down to <=
N segments w/o paying full cost of a complete "only one segment"
optimize. If we had a "forced" boolean then we could build such
an optimize method in the future, whereas as it stands now it
would not be so easy to retrofit previously created MergePolicy
classes to do this.
And some more minor feedback:
- I love the simplification of addIndexesNoOptimize :) Though (same
comment as above) I think we should move that final "copy if
different directory" step back in IndexWriter.
- There are some minor things that should not be committed eg the
added "infoStream = null" in IndexFileDeleter. I typically try to
put a comment "// nocommit" above such changes as I make them to
remind myself and keep them from slipping in.
- In the deprecated IndexWriter methods you're doing a hard cast to
LogDocMergePolicy which gives a bad result if you're using a
different merge policy; maybe catch the class cast exception (or,
better, check upfront if it's an instanceof) and raise a more
reasonable exception if not?
- IndexWriter around line 1908 has for loop that has commented out
"System.err.println"; we should just comment out/remove for loop
- These commented out synchronized spook me a bit:
/* synchronized(segmentInfos) */ {
Are they needed in these contexts? Is this only once we have
concurrent merging? (This ties back to the first feedback to
simplify MergePolicy API so that this kind of locking is only
needed inside IndexWriter).
- Can we support non-contiguous merges? If I understand it
correctly, the MergeSpecification can express such a merge, it's
just that the current IndexMerger (IndexWriter) cannot execute it,
right? So we are at least not precluding fixing this in the
future.
- It confuses me that MergePolicy has a method "merge(...)" -- can
we rename it to "maybeMerge(..)" or "checkForMerge(...)"? Ie,
this method determines whether a merge is necessary and, if so, it
then asks IndexMerger to in fact execute the merge (or, returns
the spec)?
- Instead of IndexWriter.releaseMergePolicy() can we have
IndexWriter only close the merge policy if it was the one that had
created it? (Similar to how IndexWriter closes the dir if it has
opened it from a String or File, but does not close it if it was
passed in).
> Factor merge policy out of IndexWriter
> --------------------------------------
>
> Key: LUCENE-847
> URL: https://issues.apache.org/jira/browse/LUCENE-847
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Steven Parkes
> Assignee: Steven Parkes
> Attachments: concurrentMerge.patch, LUCENE-847.patch.txt,
> LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable,
> making it possible for apps to choose a custom merge policy and for easier
> experimenting with merge policy variants.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]