On Mar 29, 2007, at 7:44 PM, Ning Li wrote:

> If a query requires top-K results, isn't it
> sufficient to find top-K results in each segment and merge them to
> return the overall top-K results?

They are merged by collecting them into a HitQueue.
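Merging per-segment results this way is exact: any document in the overall top K must also rank within the top K of its own segment. Below is a minimal Python sketch of the merge. It assumes each segment reports `(score, local doc id)` pairs along with a doc-id base offset. `merge_segment_top_k` is a hypothetical name, and the bounded heap only stands in for a HitQueue; it does not reproduce Lucene's class. Ties go to the lower doc id.

```python
import heapq
from typing import Iterable, List, Tuple

Hit = Tuple[float, int]  # (score, global doc id)


def merge_segment_top_k(
    segment_hits: Iterable[Tuple[int, List[Tuple[float, int]]]], k: int
) -> List[Hit]:
    """Merge per-segment top-K lists into an overall top-K (hypothetical helper).

    segment_hits yields (doc_base, hits) pairs, where hits holds
    (score, segment-local doc id) tuples. The heap acts as a bounded
    priority queue whose root is always the weakest hit kept so far.
    """
    if k <= 0:
        return []
    # Entries are (score, -global_doc): the root is the lowest score,
    # and among equal scores, the highest doc id (the weakest hit).
    heap: List[Tuple[float, int]] = []
    for doc_base, hits in segment_hits:
        for score, local_doc in hits:
            entry = (score, -(doc_base + local_doc))
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                # The new hit beats the weakest one kept; swap them.
                heapq.heapreplace(heap, entry)
    # Best first: highest score, then lowest doc id.
    return [(score, -neg_doc) for score, neg_doc in sorted(heap, reverse=True)]
```

For example, merging `[(0, [(3.0, 1), (1.0, 4)]), (10, [(2.5, 0), (2.0, 7)])]` with `k=3` keeps global docs 1, 10 and 17.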

> Early termination happens in
> finding top-K results in one segment. Assuming each document has a
> static score, document ids are assigned in the same order as their
> static scores within a segment. If a top-K query is scored by the same
> static score, query processing on a segment can stop as soon as the
> first K results are found.

Indeed, that's exactly how the loop in Scorer_collect() works.
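The early-termination idea can be sketched as follows. This assumes doc ids within a segment were assigned in descending static-score order and that matches arrive in ascending doc-id order, as they would from a posting list. `collect_top_k_early` is a hypothetical helper, not the actual `Scorer_collect()` code.

```python
from typing import Iterable, List


def collect_top_k_early(matching_docs: Iterable[int], k: int) -> List[int]:
    """Collect one segment's top-K docs when doc ids follow static score.

    Assumes doc id 0 has the highest static score, doc id 1 the next, and
    so on, and that matching_docs yields matching ids in ascending order.
    Under those assumptions the first K matches are exactly the top K.
    """
    collected: List[int] = []
    if k <= 0:
        return collected
    for doc in matching_docs:
        collected.append(doc)
        if len(collected) == k:
            break  # early termination: no later doc can outrank these
    return collected
```

Because the loop breaks as soon as K ids are held, an iterator over `[2, 3, 5, 8, 13]` with `k=2` is consumed only up to doc 3.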

> As to the indexing side, applications should be able to pick such a
> static score? If the Lucene score function is used, norm is a good
> candidate? (One tricky thing with norms is that they are updatable.)

I would argue that only a single mechanism based on indexed, non-tokenized fields should be used to determine sort order. Sort order based upon norms is easy for the user to fake using a dedicated field at a small cost, so library-level support is not needed.
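A minimal sketch of the dedicated-field approach, under stated assumptions. Before a segment is written, its buffered documents are sorted on a single untokenized field so that the highest values receive the lowest doc ids. The field name `static_rank` and the helpers `encode_rank` and `assign_doc_ids_by_field` are hypothetical, not Lucene or KinoSearch APIs. To approximate norm-based ordering, an application would compute its own normalization value and store it in such a field.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, str]


def encode_rank(value: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer so string order matches numeric order."""
    if value < 0:
        raise ValueError("rank must be non-negative")
    encoded = f"{value:0{width}d}"
    if len(encoded) > width:
        raise ValueError("rank has more digits than width")
    return encoded


def assign_doc_ids_by_field(docs: List[Doc], sort_field: str) -> List[Tuple[int, Doc]]:
    """Assign segment doc ids in descending order of a non-tokenized field.

    The lowest doc ids go to the highest field values, which is the layout
    an early-terminating collector relies on.
    """
    # sorted() stays stable with reverse=True, so equal ranks keep insertion order.
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=True)
    return list(enumerate(ordered))
```

With ranks 5, 42 and 7 for documents `a`, `b` and `c`, the assigned doc ids are 0 for `b`, 1 for `c` and 2 for `a`.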

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


