[
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607282#comment-13607282
]
Shai Erera commented on LUCENE-4752:
------------------------------------
What in the patch guarantees that any segment with more than maxBufferedDocs
documents is sorted? Perhaps I've missed it, but I looked for code which ensures
every such segment gets picked up by SortingMP and didn't find it.
I don't think that in general we should make assumptions based on the
maxBufferedDocs setting, because by default IWC flushes by RAM consumption,
and also because the setting seems slightly unrelated. E.g. if a segment is
sorted but has deletions such that numDocs < maxBufferedDocs, we do full
collection, while we could early terminate as usual?
EarlyTerminatingCollector, I think, need not have getFullCollector. Rather, it
should wrap any other Collector (not limited to top docs). If it detects a
sorted segment in setNextReader (we still need to figure out how to detect
that), it early terminates after enough docs were seen; otherwise it keeps on
calling in.collect(). It's the app's responsibility to wrap its collector (which
could be ChainingCollector too) with this collector, and to make sure that the
early termination logic fits with its collectors. And so I don't think we need
EarlyTerminationTopDocsCollector, but rather a concrete
EarlyTerminatingCollector. BTW, EarlyTerminationTopDocsCollector has an
uninitialized and unused maxUnsortedSize?
And hopefully we can stuff the early termination logic down to IndexSearcher
eventually. There are other scenarios for early termination, such as time
limit, and therefore I think it's ok if we have an EarlyTerminationException
which IndexSearcher responds to.
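
To make the two suggestions above concrete, here is a rough, self-contained
sketch. It does not use the real Lucene classes: SketchCollector,
RecordingCollector and SearchLoop are hypothetical stand-ins, the "sorted" flag
is simply passed in because how to detect a sorted segment is still open, and
the class names EarlyTerminatingCollector and EarlyTerminationException are
only borrowed from the proposal, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene Collector; NOT the real API.
interface SketchCollector {
    // Stand-in for setNextReader; "sorted" is an assumption, since how to
    // detect a sorted segment is not settled.
    void setNextSegment(boolean sorted);
    void collect(int doc);
}

// Thrown to signal that the current segment needs no further collection.
class EarlyTerminationException extends RuntimeException {
    EarlyTerminationException() { super("segment terminated early"); }
}

// Wraps any collector; terminates a sorted segment after enough docs were seen.
class EarlyTerminatingCollector implements SketchCollector {
    private final SketchCollector in;
    private final int numDocsToCollect;
    private boolean segmentSorted;
    private int collectedInSegment;

    EarlyTerminatingCollector(SketchCollector in, int numDocsToCollect) {
        this.in = in;
        this.numDocsToCollect = numDocsToCollect;
    }

    @Override
    public void setNextSegment(boolean sorted) {
        in.setNextSegment(sorted);
        segmentSorted = sorted;
        collectedInSegment = 0;
    }

    @Override
    public void collect(int doc) {
        in.collect(doc);
        // On an unsorted segment we just keep delegating (full collection).
        if (segmentSorted && ++collectedInSegment >= numDocsToCollect) {
            throw new EarlyTerminationException();
        }
    }
}

// Trivial wrapped collector that records every doc it receives.
class RecordingCollector implements SketchCollector {
    final List<Integer> docs = new ArrayList<>();
    @Override public void setNextSegment(boolean sorted) {}
    @Override public void collect(int doc) { docs.add(doc); }
}

// Stand-in for the per-segment loop of a searcher that responds to the
// exception by moving on to the next segment.
final class SearchLoop {
    static void search(List<int[]> segments, List<Boolean> sorted, SketchCollector c) {
        for (int i = 0; i < segments.size(); i++) {
            c.setNextSegment(sorted.get(i));
            try {
                for (int doc : segments.get(i)) {
                    c.collect(doc);
                }
            } catch (EarlyTerminationException e) {
                // Only this segment is cut short; continue with the next one.
            }
        }
    }
}
```

With a limit of 2, a sorted segment contributes only its first two docs, while
an unsorted one is collected in full. The same exception could presumably carry
other termination reasons, such as a time limit.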
Adrien, perhaps in order to keep the patch small, could you commit the changes
to LTC and TestDuelingCodec (as well as the SortingAtomicReader.wrap change)
separately? These are good changes to commit anyway, and here they only bloat
the patch and mask the actual issue's development. Is it possible?
> Merge segments to sort them
> ---------------------------
>
> Key: LUCENE-4752
> URL: https://issues.apache.org/jira/browse/LUCENE-4752
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: David Smiley
> Assignee: Adrien Grand
> Attachments: LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch,
> LUCENE-4752.patch, LUCENE-4752.patch, LUCENE-4752.patch,
> natural_10M_ingestion.log, sorting_10M_ingestion.log
>
>
> It would be awesome if Lucene could write the documents out in a segment
> based on a configurable order. This of course applies to merging segments
> too. The benefit is increased locality on disk of documents that are likely to
> be accessed together. This often applies to documents near each other in
> time, but also spatially.