I think I'd lean towards a third solution: tighten up
TopScoreDocCollector (make it final, remove ability to change its PQ,
make things private) and have it focus on high performance collection
by score.  This is the default collector for Lucene searches so I think
keeping it high performance (not adding additional method calls nor
boolean checks) is important.  You choose this for performance

Then introduce a new hit collector that makes subclassing / changing
the PQ easier. You choose this one for flexibility.

And 2.9 is a great chance to do this since we've deprecated the old
class (TopDocCollector) in favor of TopScoreDocCollector.

Mike

Shai Erera wrote:

Thanks Chris/Hoss (not sure who sent the original reply).

I don't like calling pq.lessThan, as pq.insert and pq.insertWithOverflow call it anyway internally and since it would add a method call (something that was tried to be avoided in the current implementation), I prefer the code I proposed below.

BTW, I introduced 1356, so I take full responsibility on this overlooking. The main reason for 1356 was to allow creating extensions of TopDocCollector so they can be of the same type, and share the topDocs() and totalHIts() implementations.

I can file an issue. Any other comments?

Shai

On Sat, Mar 21, 2009 at 3:48 AM, Chris Hostetter <hossman_luc...@fucit.org > wrote:

(resending msg from earlier today during @apache mail outage -- i didn't get a copy from the list, so i'm assuming no one did)


---------- Forwarded message ----------
Date: Fri, 20 Mar 2009 15:29:13 -0700 (PDT)

: TopDocCollector's (TDC) implementation of collect() seems a bit problematic
: to me.

This code isn't an area i'm very familiar with, but your assessment seems
correct ... it looks like when LUCENE-1356 introduced the ability to
provide a PriorityQueue to the constructor, the existing optimization when
the score was obvoiusly too low was overlooked.

It looks like this same bug got propogated to TopScoreDocCollector
when it was introduced as well.

: Introduce in TDC a private boolean which signals whether the default PQ is

: used or not. If it's not used, don't do the 'else if' at all. If it is used,
: then the 'else if' is safe. Then code could look like:

my vote would just be to change the ">=" comarison to a hq.lessThan call ... but i can understand how your proposal might be more efficient -- I'll let the performance experts fight it out ... but i definitely think you
should fil a bug.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to