[
https://issues.apache.org/jira/browse/LUCENE-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592289#comment-13592289
]
Shai Erera commented on LUCENE-4752:
------------------------------------
How can you early terminate a query for a single segment? Say that you have 3
segments, each sorted individually, and your query asks for the top-10 by some
criterion. The top-10 may come from the 3 segments as follows: seg1=4, seg2=4,
seg3=2. But you don't know that until you have processed all 3 segments, right?
While you could make a decision on a per-segment basis to 'terminate', there's
no mechanism today to tell IndexSearcher "I'm done w/ that segment, move on".
Today, if you want to early terminate, you need to throw an exception from the
Collector and catch it outside, in your application code.
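To make the exception approach concrete, here is a minimal sketch. It does not
use the real Lucene classes: HitCollector, EarlyTerminationException,
FirstNCollector and the search() loop are hypothetical stand-ins for Collector
and IndexSearcher, written only to show the control flow (n is assumed >= 1).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Lucene-style collector; NOT the real Lucene interface.
interface HitCollector {
    void collect(int doc);
}

// Unchecked exception used purely as a control-flow signal to abort the search.
final class EarlyTerminationException extends RuntimeException {
    EarlyTerminationException() {
        // Disable suppression and stack trace capture: this is not an error.
        super("early termination", null, false, false);
    }
}

// Keeps the first `limit` hits, then aborts the whole search by throwing.
final class FirstNCollector implements HitCollector {
    private final int limit; // assumed >= 1
    final List<Integer> hits = new ArrayList<>();

    FirstNCollector(int limit) {
        this.limit = limit;
    }

    @Override
    public void collect(int doc) {
        hits.add(doc);
        if (hits.size() >= limit) {
            throw new EarlyTerminationException();
        }
    }
}

final class ExceptionTerminationDemo {
    // Stand-in for the searcher: it has no "stop" hook, so it feeds every
    // matching doc of every segment to the collector.
    static void search(int[][] segments, HitCollector collector) {
        for (int[] segment : segments) {
            for (int doc : segment) {
                collector.collect(doc);
            }
        }
    }

    // Application code: the exception is caught outside the search call.
    static List<Integer> firstN(int[][] segments, int n) {
        FirstNCollector collector = new FirstNCollector(n);
        try {
            search(segments, collector);
        } catch (EarlyTerminationException expected) {
            // The collector has everything it needs; nothing else to do.
        }
        return collector.hits;
    }
}
```

With segments {1,2,3}, {4,5,6}, {7,8} and n=4, firstN() returns [1, 2, 3, 4]
and the remaining docs are never visited.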
To early terminate efficiently, you must have the segments in a consistent
order, e.g. S1 > S2 > S3. Then, after you've processed enough elements from S1,
you can early terminate the entire query because you're guaranteed that
successive documents will be "smaller".
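A small sketch of why the consistent order matters, using plain int arrays in
place of segments (OrderedTopN is a made-up name, not Lucene code). It assumes
every value in S1 is >= every value in S2, and so on, and that each segment is
sorted in descending order, so the first n values seen are the top n.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: segments are globally ordered (S1 > S2 > S3)
// and each segment is itself sorted in descending order.
final class OrderedTopN {
    final List<Integer> top = new ArrayList<>();
    int visited; // how many values were actually examined

    static OrderedTopN run(int[][] segments, int n) {
        OrderedTopN result = new OrderedTopN();
        outer:
        for (int[] segment : segments) {
            for (int value : segment) {
                if (result.top.size() >= n) {
                    // Everything that follows is guaranteed to be "smaller",
                    // so the entire query can stop here.
                    break outer;
                }
                result.visited++;
                result.top.add(value);
            }
        }
        return result;
    }
}
```

For segments {90,80,70}, {60,50}, {40,30,20} and n=4, this yields
[90, 80, 70, 60] after examining only 4 of the 8 values.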
Unless you add an "if (done) return" to your Collector.collect() and treat the
remaining calls for that segment as no-ops, or write your own IndexSearcher
logic; then per-segment early termination is doable.
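Here is roughly what I mean by the "if (done) return" variant, as a sketch with
made-up names (PerSegmentTopNCollector, startSegment) rather than the real
Collector API; startSegment() plays the role of the per-segment callback such as
setNextReader(). It assumes each segment delivers its values in descending
order, but segments are not ordered relative to each other.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical collector: keeps the global top n, and turns collect() into a
// no-op once the current (descending-sorted) segment cannot contribute more.
final class PerSegmentTopNCollector {
    private final int n;
    private final PriorityQueue<Integer> queue = new PriorityQueue<>(); // min-heap
    private boolean done;
    private int takenFromSegment;
    int skipped; // number of collect() calls that were no-ops

    PerSegmentTopNCollector(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    // Stand-in for the per-segment callback: reset the per-segment state.
    void startSegment() {
        done = false;
        takenFromSegment = 0;
    }

    void collect(int value) {
        if (done) { // the searcher keeps calling us, but we ignore the rest
            skipped++;
            return;
        }
        if (queue.size() < n) {
            queue.add(value);
        } else if (value > queue.peek()) {
            queue.poll();
            queue.add(value);
        } else {
            // Not competitive, and later values in this segment are no larger.
            done = true;
            return;
        }
        // A single sorted segment can contribute at most its first n values.
        if (++takenFromSegment >= n) {
            done = true;
        }
    }

    List<Integer> topDescending() {
        List<Integer> result = new ArrayList<>(queue);
        result.sort(Collections.reverseOrder());
        return result;
    }
}

final class PerSegmentDemo {
    static PerSegmentTopNCollector run(int[][] segments, int n) {
        PerSegmentTopNCollector collector = new PerSegmentTopNCollector(n);
        for (int[] segment : segments) {
            collector.startSegment();
            for (int value : segment) {
                collector.collect(value); // still invoked for every value
            }
        }
        return collector;
    }
}
```

For segments {9,7,1}, {8,6,5,2}, {10,3} and n=3 the result is [10, 9, 8], with
2 calls skipped. Note that the searcher still iterates and calls collect() for
every document; only the work inside collect() is saved.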
As for the approach you describe, I think that instead of stuffing into IWC
what seems like a random setting (pick-segments-for-sorting), we should have
something more generic, like AtomicReaderFactory, which IW will use instead of
always loading SegmentReader. That will let you load your custom AtomicReader?
Or, perhaps this can be a SortingCodec? Also, a custom SegmentMerger to
implement the zig-zag merge would help too.
> Merge segments to sort them
> ---------------------------
>
> Key: LUCENE-4752
> URL: https://issues.apache.org/jira/browse/LUCENE-4752
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/index
> Reporter: David Smiley
> Assignee: Adrien Grand
>
> It would be awesome if Lucene could write the documents out in a segment
> based on a configurable order. This of course applies to merging segments
> too. The benefit is increased locality on disk of documents that are likely to
> be accessed together. This often applies to documents near each other in
> time, but also spatially.