>
> I may have lost the context here... but here's what I thought we were
> talking about.
>
> If we choose the interface option (adding a "ProvidesTopDocsResults"
> (say) interface), then you would create method
> renderResults(ProvidesTopDocResults).
>
> Then, any collector implementing that interface could be safely passed
> in, as the upcast is done at compile time not runtime.
>

Consider this code snippet:

HitCollector hc = condition? new TopDocCollector() : TopFieldDocCollector();
searcher.search(hc);

The problem is that I need a base class for both collectors. If I use the
interface ProvidesTopDocsResults, then I cannot pass it to searcher, w/o
casting to HitCollector. If I use HitCollector, then I need to cast it
before passing it into rederResults(). Only when both class have the same
base class which is also a HitCollector, I don't need any casting. I.e.,
after I submit a patch that develops what we've agreed on, the code can look
like this:

TopResultsCollector trc = condition ? new TopScoreDocCollector() : new
TopFieldCollector();
searcher.search(trc);
renderResults(trc);

Here I can pass 'trc' to both methods since it both a HitCollector and a
TopResultsCollector. That's what I was missing in your proposal.

Shai do you want to take a first cut at making a patch?  Can you open
> an issue?  Thanks.


I can certainly do that. I think the summary of the steps make sense. I'll
check if TopScoreDocCollector and TopFieldCollector can also extend that
"you provide your own pqueue and we collect top N according to it"
collector, passing a null PQ and extending topDocs().

I also would like to propose an additional method to topDocs(), topDocs(int
start, int howMany) which will be more efficient to call in case of paging
through results. The reason is that topDocs() pops everything from the PQ,
then allocates a ScoreDoc[] array of size of number of results and returns a
TopDocs object. You can then choose just the ones you need.
On the other hand, topDocs(start, howMany) would allocate exactly the size
of array needed. E.g., in case someone pages through results 91-100, you
allocate an array of size 10, instead of 100.
It is not a huge improvement, but it does save some allocations, as well as
it's a convenient method.

BTW, I like the name ResultsCollector, as it's just like HitCollector, but
does not commit too much to "hits" .. i.e., facets aren't hits ... I think?

Shai

On Thu, Mar 26, 2009 at 12:55 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Shai Erera <ser...@gmail.com> wrote:
>
> >> The difference is for the new code, it's an upcast, which catches any
> >> errors at compile time, not run time.  The compiler determines that
> >> the class implements the required interface.
> >
> > I still don't understand how a compiler can detect at compilation time
> that
> > a HitCollector instance that is used in method A, and is casted to a
> > TopDocsOutput instance by calling method B (from A) is actually ok ... I
> may
> > be missing some Java information here, but I simply don't see how that
> > happens at compilation time instead of at runtime ...
>
> I may have lost the context here... but here's what I thought we were
> talking about.
>
> If we choose the interface option (adding a "ProvidesTopDocsResults"
> (say) interface), then you would create method
> renderResults(ProvidesTopDocResults).
>
> Then, any collector implementing that interface could be safely passed
> in, as the upcast is done at compile time not runtime.
>
> > So in fact the internal Lucene code expects only MRHC from a certain
> point,
> > and so even if I wrote a HC and passed it on Searcher, it's still
> converted
> > to MRHC, with an empty setNextReader() method implementation. That's why
> I
> > meant that HC is already deprecated, whether it's marked as deprecated or
> > not.
>
> The setNextReader() impl is not empty; it does the re-basing of docID
> on behalf of the HC.
>
> > What you say about deprecating HC to me is unnecessary. Simply pull
> > setNextReader up with an empty implementation, get rid of all the
> > instanceof, casting and wrapping code and you're fine. Nothing is broken.
> > All works well and better (instanceof, casting and wrapping have their
> > overheads).
> > Isn't that right?
>
> I think we need to deprecate HC, in favor of MRHC (or if we can think
> of a better name... ResultCollector?).
>
> > Regarding interfaces .. I don't think they're that bad. Perhaps a
> different
> > viewing angle might help. Lucene processes a query and fires events.
> Today
> > it fires an event every time a doc's score has been computed, and
> recently
> > every time it starts to process a different reader. HitCollector is a
> > listener implementation on the "doc-score event", while MRHC is a
> listener
> > on both.
> > To me, interfaces play just nicely here. Assume that you have the
> following
> > interfaces:
> > - DocScoreEvent
> > - ChangeReaderEvent
> > - EndProcessingEvent (thrown when the query has finished processing -
> > perhaps to aid collectors to free resources)
> > - <any other events you foresee?>
> > The Lucene code receives a HitCollector which listens on all events. In
> the
> > future, Lucene might throw other events, but still get a HitCollector.
> Those
> > methods will check for instanceof, and you as a user will know that if
> you
> > want to catch those events, you pass in a collector implementation which
> > does. Those events cannot of course be main-stream events (like
> > DocScoreEvent), but new ones, perhaps even experimental.
> > Since HitCollector is a concrete class, we can always add new interfaces
> to
> > it in the future with empty implementations?
>
> I agree interfaces clearly have useful properties, but their achilles
> heel for Lucene in all but the most trivial needs is the
> non-back-compatibility when you want to add a method to the interface.
> There have been a number of java-dev discussons on this problem.
>
> So, I think something like this:
>
>  * Deprecate HitCollector in favor of MultiReaderHitCollector (any
>    better name here?) abstract class.  If you want to make a fully
>    custom HitCollector, you subclass this calss.
>
>    Let's change MRHC's collect to take only an [unbased] docID, and
>    expose a setScorer(Scorer) method.  Then if the collector needs
>    score it can call Scorer.score().
>
>  * Subclass that to create an abstract "tracks top N results
>    collector" (TopDocsCollector?  TopHitsCollector?)
>
>  * Subclass TopDocsCollector to a final, fast "top N sorted by score"
>    collector (exists already: TopScoreDocCollector)
>
>  * Subclass TopDocsCollector to a final, fast "top N sorted by field"
>    collector (exists already: TopFieldCollector)
>
>  * Subclass TopDocsCollector to a "you provide your own pqueue and we
>    collect top N according to it" collector (does not yet exist --
>    name?).  This is the way forward for existing subclasses of
>    TopDocCollector.
>
> Shai do you want to take a first cut at making a patch?  Can you open
> an issue?  Thanks.
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
>

Reply via email to