Shai Erera <[email protected]> wrote:
>> The difference is for the new code, it's an upcast, which catches any
>> errors at compile time, not run time. The compiler determines that
>> the class implements the required interface.
>
> I still don't understand how a compiler can detect at compilation time that
> a HitCollector instance that is used in method A, and is casted to a
> TopDocsOutput instance by calling method B (from A) is actually ok ... I may
> be missing some Java information here, but I simply don't see how that
> happens at compilation time instead of at runtime ...
I may have lost the context here... but here's what I thought we were
talking about.
If we choose the interface option (adding a "ProvidesTopDocsResults"
(say) interface), then you would create method
renderResults(ProvidesTopDocResults).
Then, any collector implementing that interface could be safely passed
in, as the upcast is done at compile time not runtime.
> So in fact the internal Lucene code expects only MRHC from a certain point,
> and so even if I wrote a HC and passed it on Searcher, it's still converted
> to MRHC, with an empty setNextReader() method implementation. That's why I
> meant that HC is already deprecated, whether it's marked as deprecated or
> not.
The setNextReader() impl is not empty; it does the re-basing of docID
on behalf of the HC.
> What you say about deprecating HC to me is unnecessary. Simply pull
> setNextReader up with an empty implementation, get rid of all the
> instanceof, casting and wrapping code and you're fine. Nothing is broken.
> All works well and better (instanceof, casting and wrapping have their
> overheads).
> Isn't that right?
I think we need to deprecate HC, in favor of MRHC (or if we can think
of a better name... ResultCollector?).
> Regarding interfaces .. I don't think they're that bad. Perhaps a different
> viewing angle might help. Lucene processes a query and fires events. Today
> it fires an event every time a doc's score has been computed, and recently
> every time it starts to process a different reader. HitCollector is a
> listener implementation on the "doc-score event", while MRHC is a listener
> on both.
> To me, interfaces play just nicely here. Assume that you have the following
> interfaces:
> - DocScoreEvent
> - ChangeReaderEvent
> - EndProcessingEvent (thrown when the query has finished processing -
> perhaps to aid collectors to free resources)
> - <any other events you foresee?>
> The Lucene code receives a HitCollector which listens on all events. In the
> future, Lucene might throw other events, but still get a HitCollector. Those
> methods will check for instanceof, and you as a user will know that if you
> want to catch those events, you pass in a collector implementation which
> does. Those events cannot of course be main-stream events (like
> DocScoreEvent), but new ones, perhaps even experimental.
> Since HitCollector is a concrete class, we can always add new interfaces to
> it in the future with empty implementations?
I agree interfaces clearly have useful properties, but their achilles
heel for Lucene in all but the most trivial needs is the
non-back-compatibility when you want to add a method to the interface.
There have been a number of java-dev discussons on this problem.
So, I think something like this:
* Deprecate HitCollector in favor of MultiReaderHitCollector (any
better name here?) abstract class. If you want to make a fully
custom HitCollector, you subclass this calss.
Let's change MRHC's collect to take only an [unbased] docID, and
expose a setScorer(Scorer) method. Then if the collector needs
score it can call Scorer.score().
* Subclass that to create an abstract "tracks top N results
collector" (TopDocsCollector? TopHitsCollector?)
* Subclass TopDocsCollector to a final, fast "top N sorted by score"
collector (exists already: TopScoreDocCollector)
* Subclass TopDocsCollector to a final, fast "top N sorted by field"
collector (exists already: TopFieldCollector)
* Subclass TopDocsCollector to a "you provide your own pqueue and we
collect top N according to it" collector (does not yet exist --
name?). This is the way forward for existing subclasses of
TopDocCollector.
Shai do you want to take a first cut at making a patch? Can you open
an issue? Thanks.
Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]