On Mar 29, 2007, at 7:44 PM, Ning Li wrote:
If a query requires top-K results, isn't it sufficient to find top-K results in each segment and merge them to return the overall top-K results?
They are merged by collecting them into a HitQueue.
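As a rough illustration of that merge step, here is a minimal Python sketch. It is not the actual HitQueue code; the function name `merge_top_k` and the `(score, doc_id)` tuple layout are assumptions made for the example. It keeps a bounded min-heap of the best K hits seen across all segments.

```python
import heapq

def merge_top_k(segment_hits, k):
    """Merge per-segment hit lists into the overall top-k.

    segment_hits: iterable of lists of (score, doc_id) tuples,
                  one list per segment (hypothetical layout).
    Returns the k best hits, highest score first.
    """
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the weakest hit currently kept
    for hits in segment_hits:
        for hit in hits:
            if len(heap) < k:
                heapq.heappush(heap, hit)
            elif hit > heap[0]:
                # New hit beats the weakest kept hit, so swap it in.
                heapq.heapreplace(heap, hit)
    return sorted(heap, reverse=True)
```

Because each segment contributes at most K hits, the queue never needs to look at more than K hits per segment to produce the correct overall top-K.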
Early termination happens in finding top-K results in one segment. Assuming each document has a static score, document ids are assigned in the same order as their static scores within a segment. If a top-K query is scored by the same static score, query processing on a segment can stop as soon as the first K results are found.
Indeed, that's exactly how the loop in Scorer_collect() works.
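To make the early-termination idea concrete, here is a small Python sketch. It is not the Scorer_collect() source; `collect_first_k` is a hypothetical name, and the sketch assumes matching doc ids arrive in ascending order and that a lower doc id means a better static score within the segment.

```python
def collect_first_k(matching_doc_ids, k):
    """Collect the first k matching doc ids from one segment, then stop.

    matching_doc_ids: iterator over matching doc ids in ascending order
                      (assumed to coincide with static-score order).
    """
    hits = []
    if k <= 0:
        return hits
    for doc_id in matching_doc_ids:
        hits.append(doc_id)
        if len(hits) == k:
            break  # early termination: later docs cannot rank higher
    return hits
```

If the iterator is lazy (for example, a generator walking a posting list), the remaining postings are never read once K hits have been collected.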
On the indexing side, should applications be able to pick such a static score? If the Lucene score function is used, is the norm a good candidate? (One tricky thing with the norm is that it is updatable.)
I would argue that only a single mechanism based on indexed, non-tokenized fields should be used to determine sort order. Sort order based upon norms is easy for the user to fake using a dedicated field at a small cost, so library-level support is not needed.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/