Re: eliminating scoring for the sake of efficiency

Paul Elschot Thu, 11 May 2006 14:42:42 -0700

On Thursday 11 May 2006 22:42, Boris Galitsky wrote:
> Hello
> 
>     We don't need any scoring in our application domain, but 
> efficiency is key because we are getting tens of thousands of hits for 
> span queries, and we need to collect all of them.
>     Is there a simple way to turn scoring off while indexing, while 
> searching, and while delivering document IDs, to save time?


You could call getSpans() on the top level SpanQuery, loop calling next()
on the returned Spans, and ignore duplicate doc() values from the Spans
in that loop.
A counter in the loop would also give you the number of matching occurrences
of the SpanQuery.
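
A minimal sketch of that loop, in case it helps. To keep it runnable without
Lucene on the classpath, the Spans interface below is a stand-in that declares
only next() and doc(); the class name, Result holder and fromDocs() helper are
made up for illustration. In real code the Spans would come from getSpans() on
your query. The sketch assumes that matches arrive in increasing document
order, so a duplicate doc() is always the same as the previous one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SpanDocCollector {

    /** Stand-in for Lucene's Spans; only the two methods used here (assumption). */
    interface Spans {
        boolean next() throws IOException;
        int doc();
    }

    /** Hypothetical holder for the distinct doc ids and the occurrence count. */
    static final class Result {
        final List<Integer> docIds = new ArrayList<>();
        int occurrences;
    }

    /** Walks the spans once, with no scoring and no HitCollector involved. */
    static Result collect(Spans spans) throws IOException {
        Result result = new Result();
        int lastDoc = -1;                 // no valid doc id is negative
        while (spans.next()) {
            result.occurrences++;         // every span is one matching occurrence
            int doc = spans.doc();
            if (doc != lastDoc) {         // skip repeated doc() values
                result.docIds.add(doc);
                lastDoc = doc;
            }
        }
        return result;
    }

    /** Array-backed Spans, only for trying the loop out (hypothetical helper). */
    static Spans fromDocs(final int... docs) {
        return new Spans() {
            private int i = -1;
            public boolean next() { return ++i < docs.length; }
            public int doc() { return docs[i]; }
        };
    }

    public static void main(String[] args) throws IOException {
        Result r = collect(fromDocs(2, 2, 5, 9, 9, 9));
        // prints "[2, 5, 9] from 6 occurrences"
        System.out.println(r.docIds + " from " + r.occurrences + " occurrences");
    }
}
```

Comparing against the previous doc() only works because of the ordering
assumption; if that did not hold you would need a set or a BitSet instead.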

This way of using the Spans directly should be slightly more efficient than
using a HitCollector, but don't hold your breath.

In case you have ordered SpanQuery instances without overlaps, the
NearSpansOrdered here might be a bit faster than the NearSpans
currently in Lucene:
http://issues.apache.org/jira/browse/LUCENE-413
(you'll also need the patch to SpanNearQuery).

Regards,
Paul Elschot

