[ 
https://issues.apache.org/jira/browse/LUCENE-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694519#action_12694519
 ] 

Shai Erera commented on LUCENE-1575:
------------------------------------

bq. Though we now don't have a single shared ancestor class that can give you a 
maxScore() when sorting by score or when sorting by fields

I don't think it's a big issue. We have just two extensions to 
TopDocsCollector, each tracking maxScore differently. Perhaps later on if more 
extensions are created, or a demand comes from users, we add a 
ScoringTopDocsCollector class?

bq. Should we implement a default 
TopDocsCollector.collect/setScorer/setNextReader?

I actually think that TopDocsCollector gives you exactly what I had in mind. 
When I started it, there was only collect(doc, score) method, and I wanted to 
have a base class that will implement getTotalHits and topDocs for you, while 
you only need to provide an implementation to collect(). Now collect has really 
become setNextReader, setScorer and collect. Meaning, you know what you want to 
do, TDC just takes care of creating the TopDocs for you, and you can even 
override topDocs(start, howMany) if you want to return a different TopDocs 
instance.

I think now it's clean and simple.

bq. What happened to OnlyPositiveScoresFilter? (I don't see it).

Well ... you don't see it because I forgot to implement it :).
Funny thing - all the tests pass without it, meaning all that time, we were 
filtering <= 0 scored documents, but never really tested it ... I guess if it's 
because we never really believed it'll happen?

Anyway, I'll add such a class and a suitable test class (make sure it fails w/o 
using that wrapper first).

bq. Can you update javadocs eg something like "NOTE: you cannot call this 
method more than once" .

That has always been the case. Previously if you called topDocs() everything 
has been popped out of the queue for you. I understand what you say though, 
since we have the topDocs(start, howMany) API, nothing prevents a user from 
calling topDocs(0, 10) and topDocs(10, 10), only the second call will fail. 
However, I don't really think that's how people will use it ... and if they do, 
then perhaps they should just call topDocs() and do whatever they need on these 
ranges using the TopDocs object?
I'll add that to the documentation as well.

> Refactoring Lucene collectors (HitCollector and extensions)
> -----------------------------------------------------------
>
>                 Key: LUCENE-1575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Shai Erera
>             Fix For: 2.9
>
>         Attachments: LUCENE-1575.1.patch, LUCENE-1575.2.patch, 
> LUCENE-1575.3.patch, LUCENE-1575.patch
>
>
> This issue is a result of a recent discussion we've had on the mailing list. 
> You can read the thread 
> [here|http://www.nabble.com/Is-TopDocCollector%27s-collect()-implementation-correct--td22557419.html].
> We have agreed to do the following refactoring:
> * Rename MultiReaderHitCollector to Collector, with the purpose that it will 
> be the base class for all Collector implementations.
> * Deprecate HitCollector in favor of the new Collector.
> * Introduce new methods in IndexSearcher that accept Collector, and deprecate 
> those that accept HitCollector.
> ** Create a final class HitCollectorWrapper, and use it in the deprecated 
> methods in IndexSearcher, wrapping the given HitCollector.
> ** HitCollectorWrapper will be marked deprecated, so we can remove it in 3.0, 
> when we remove HitCollector.
> ** It will remove any instanceof checks that currently exist in IndexSearcher 
> code.
> * Create a new (abstract) TopDocsCollector, which will:
> ** Leave collect and setNextReader unimplemented.
> ** Introduce protected members PriorityQueue and totalHits.
> ** Introduce a single protected constructor which accepts a PriorityQueue.
> ** Implement topDocs() and getTotalHits() using the PQ and totalHits members. 
> These can be used as-are by extending classes, as well as be overridden.
> ** Introduce a new topDocs(start, howMany) method which will be used a 
> convenience method when implementing a search application which allows paging 
> through search results. It will also attempt to improve the memory 
> allocation, by allocating a ScoreDoc[] of the requested size only.
> * Change TopScoreDocCollector to extend TopDocsCollector, use the topDocs() 
> and getTotalHits() implementations as they are from TopDocsCollector. The 
> class will also be made final.
> * Change TopFieldCollector to extend TopDocsCollector, and make the class 
> final. Also implement topDocs(start, howMany).
> * Change TopFieldDocCollector (deprecated) to extend TopDocsCollector, 
> instead of TopScoreDocCollector. Implement topDocs(start, howMany)
> * Review other places where HitCollector is used, such as in Scorer, 
> deprecate those places and use Collector instead.
> Additionally, the following proposal was made w.r.t. decoupling score from 
> collect():
> * Change collect to accecpt only a doc Id (unbased).
> * Introduce a setScorer(Scorer) method.
> * If during collect the implementation needs the score, it can call 
> scorer.score().
> If we do this, then we need to review all places in the code where 
> collect(doc, score) is called, and assert whether Scorer can be passed. Also 
> this raises few questions:
> * What if during collect() Scorer is null? (i.e., not set) - is it even 
> possible?
> * I noticed that many (if not all) of the collect() implementations discard 
> the document if its score is not greater than 0. Doesn't it mean that score 
> is needed in collect() always?
> Open issues:
> * The name for Collector
> * TopDocsCollector was mentioned on the thread as TopResultsCollector, but 
> that was when we thought to call Colletor ResultsColletor. Since we decided 
> (so far) on Collector, I think TopDocsCollector makes sense, because of its 
> TopDocs output.
> * Decoupling score from collect().
> I will post a patch a bit later, as this is expected to be a very large 
> patch. I will split it into 2: (1) code patch (2) test cases (moving to use 
> Collector instead of HitCollector, as well as testing the new topDocs(start, 
> howMany) method.
> There might be even a 3rd patch which handles the setScorer thing in 
> Collector (maybe even a different issue?)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Reply via email to