On Aug 6, 2009, at 2:31 PM, Paul Elschot wrote:
> With a single search one might end up collecting lots of span info
> that will be thrown away because the document score is too low.

Presumably, you would only collect it if the result was actually put
onto the PriorityQueue, in other words, after scoring that particular
doc, so you would only be keeping Span values for the number of
results requested. I think I'd be willing to spend that memory rather
than iterate/skip over all the Spans a second time.
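
To make that concrete, here is a rough sketch in plain Python rather than against the Lucene API. Everything in it is hypothetical: `scored_docs` stands in for the scorer's output, and `get_spans` is an assumed callback that returns the (start, end) positions of the doc that was just scored. It also assumes those positions are still available right after scoring, which may not hold without changes to SpanScorer. Span info is only materialized for docs that actually enter the queue, and it goes away when a doc is evicted:

```python
import heapq


def collect_top_k_with_spans(scored_docs, k):
    """Keep the k best docs and span info for those docs only.

    scored_docs yields (doc_id, score, get_spans) in doc order, where
    get_spans is a hypothetical zero-argument callable returning a list
    of (start, end) positions for that doc.
    """
    if k <= 0:
        return []
    heap = []  # min-heap: the least competitive hit sits at heap[0]
    for doc_id, score, get_spans in scored_docs:
        # Higher score wins; on equal scores the lower doc id wins.
        key = (score, -doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (key, get_spans()))
        elif key > heap[0][0]:
            # The evicted entry's spans are dropped with it.
            heapq.heapreplace(heap, (key, get_spans()))
        # Otherwise the doc is not competitive and get_spans is never called.
    ranked = sorted(heap, key=lambda entry: entry[0], reverse=True)
    return [(-key[1], key[0], spans) for key, spans in ranked]
```

The memory cost is bounded by k span lists at any time, although spans are still copied for docs that enter the queue and are later pushed out.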

> So I think the best way is to first collect the best hits in the usual
> way, and then get the spans of the query (effectively once more,
> but now without SpanScorer in between) with the doc numbers
> of the best hits as a filter while collecting all the begin/end
> positions.

Yes, that is what I've traditionally done, but associating the
collected spans with a ranked list of docs is convoluted.
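
For comparison, the two-pass approach can be sketched roughly as follows, again in plain Python and not against the Lucene API. `ranked_doc_ids` is assumed to be the output of the normal search, best hit first, and `span_stream` is a hypothetical iterator of (doc_id, start, end) tuples in increasing doc order. A real implementation would skip to each wanted doc rather than scan every span as this sketch does:

```python
def spans_for_ranked_hits(ranked_doc_ids, span_stream):
    """Second pass: gather begin/end positions for the best hits only.

    ranked_doc_ids: doc ids from the first pass, best hit first.
    span_stream: hypothetical iterable of (doc_id, start, end) tuples.
    """
    # The doc ids of the best hits act as the filter.
    by_doc = {doc_id: [] for doc_id in ranked_doc_ids}
    for doc_id, start, end in span_stream:
        positions = by_doc.get(doc_id)
        if positions is not None:
            positions.append((start, end))
    # Re-associate the positions with the ranked order of the first pass.
    return [(doc_id, by_doc[doc_id]) for doc_id in ranked_doc_ids]
```

The extra dictionary is the bookkeeping needed to get from doc order back to rank order.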