On Tue, Mar 24, 2009 at 02:47:07PM +0200, Shai Erera wrote:

> I agree about the unnecessary method call - we should make a collector's
> implementation as efficient as possible.

Maybe it makes sense to just bite the bullet and duplicate the unrolled code?

There's precedent: ScorerDocQueue is not a subclass of PriorityQueue.

> But what about cases like collector chaining, extensions, and running w/
> several collectors? If each collector needs to request the
> document's score, it might be computed over and over again. Consider, for
> example, a TopScoreDocCollector running together w/ another
> collector that extracts information about the document and uses the score
> to compute something (it's easy to write a collector that delegates the
> collect() call to each of them). Today, I could just call collect(doc,
> score) on each collector. But in the proposed way, I'd call collect(doc) and
> then each of them would need to request the score.

In such a case, perhaps it would be possible to supply a trivial Scorer
wrapper subclass which caches a score.  Then you still have the overhead of
the method calls, but not the overhead of calculating the score.
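A rough sketch of such a wrapper, assuming a hypothetical API in which a collector is handed a Scorer via setScorer() and pulls score() only when it wants one. Scorer, Collector, setScorer(), CachingScorer, MultiCollector and countScoreComputations() below are simplified stand-ins made up for illustration, not actual Lucene classes:

```java
import java.util.ArrayList;
import java.util.List;

public class CachingScorerDemo {
    /** Hypothetical minimal scorer: current doc plus its (possibly expensive) score. */
    static abstract class Scorer {
        abstract int doc();
        abstract float score();
    }

    /** Hypothetical score-less collector API: the score is pulled only on demand. */
    interface Collector {
        void setScorer(Scorer scorer);
        void collect(int doc);
    }

    /** Wraps a Scorer so the wrapped score() runs at most once per document. */
    static final class CachingScorer extends Scorer {
        private final Scorer in;
        private int cachedDoc = -1; // assumes doc numbers are non-negative
        private float cachedScore;

        CachingScorer(Scorer in) { this.in = in; }

        @Override int doc() { return in.doc(); }

        @Override float score() {
            int doc = in.doc();
            if (doc != cachedDoc) {   // first request for this doc: compute and remember
                cachedScore = in.score();
                cachedDoc = doc;
            }
            return cachedScore;       // repeat requests cost only the method call
        }
    }

    /** Delegates collect(doc) to each collector; all of them share one CachingScorer. */
    static final class MultiCollector implements Collector {
        private final List<Collector> collectors;

        MultiCollector(List<Collector> collectors) { this.collectors = collectors; }

        @Override public void setScorer(Scorer scorer) {
            Scorer shared = new CachingScorer(scorer);
            for (Collector c : collectors) c.setScorer(shared);
        }

        @Override public void collect(int doc) {
            for (Collector c : collectors) c.collect(doc);
        }
    }

    /** Runs two score-hungry collectors over docs; returns how often the real score was computed. */
    static int countScoreComputations(int[] docs) {
        final int[] computations = {0};
        final int[] current = {-1};
        Scorer base = new Scorer() {
            @Override int doc() { return current[0]; }
            @Override float score() { computations[0]++; return 1.0f / (current[0] + 1); }
        };

        List<Collector> consumers = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            consumers.add(new Collector() {
                private Scorer scorer;
                private float total; // e.g. a running sum a "stats" collector might keep

                @Override public void setScorer(Scorer s) { scorer = s; }
                @Override public void collect(int doc) { total += scorer.score(); }
            });
        }

        MultiCollector multi = new MultiCollector(consumers);
        multi.setScorer(base);
        for (int doc : docs) {
            current[0] = doc;   // simulate the scorer advancing to the next match
            multi.collect(doc);
        }
        return computations[0];
    }

    public static void main(String[] args) {
        // Three matches, two collectors each asking for the score: 3 computations, not 6.
        System.out.println(countScoreComputations(new int[] {1, 4, 7}));
    }
}
```

Each extra collector still pays a virtual call per hit, but the score itself is only computed once per document, which is the trade-off described above.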

That's not ideal, but I think the case of matching-without-scoring is more
important to optimize for.

> Perhaps we can introduce a collect(doc) on HitCollector which does not
> accept a score, but keep the other collect? I am not sure if that's any
> better, because then the Lucene search code would need to decide which
> collect method to call ...

Also, passing arguments is dirt cheap.  A HitCollector that only cares about
adding doc nums to a BitVector can just ignore the second argument.
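For illustration only, a minimal sketch of such a collector. HitCollector here is a simplified local stand-in with a collect(int doc, float score) callback, and java.util.BitSet is used in place of Lucene's BitVector so the example compiles on its own:

```java
import java.util.BitSet;

public class BitSetHitCollectorDemo {
    /** Simplified stand-in for a collect(doc, score) style callback. */
    static abstract class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Records matching doc numbers and simply ignores the score argument. */
    static final class BitSetHitCollector extends HitCollector {
        final BitSet bits;

        BitSetHitCollector(int maxDoc) { bits = new BitSet(maxDoc); }

        @Override public void collect(int doc, float score) {
            bits.set(doc); // score is passed in but never read
        }
    }

    public static void main(String[] args) {
        BitSetHitCollector c = new BitSetHitCollector(16);
        c.collect(2, 0.5f);
        c.collect(9, 1.25f);
        System.out.println(c.bits); // prints {2, 9}
    }
}
```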

> Regarding the TopDocs interface (we should find a better name, as TopDocs is
> already taken), 

"Winners".

Marvin Humphrey


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
