[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

Shikhar Bhushan (JIRA) Thu, 03 Apr 2014 12:15:41 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959101#comment-13959101
 ]


Shikhar Bhushan commented on LUCENE-5527:
-----------------------------------------

Thanks for picking this up Adrien! I always wanted to push forward at least the 
API refactoring but did not get the chance to do so.

+1 on adding a method like LeafCollector.done()  / finish() or such, and making 
that part of the usage contract.

It's not just Solr with DelegatingCollector that has something like this, I 
think I remember seeing this pattern even in ES.

LUCENE-5299 had this as a SubCollector.done() method and it led to a lot of 
code-cleanup at various places where we were trying to detect a transition to 
the next segment based on a call to setNextReader(). In some cases, the result 
finalization was being done lazily when result retrieval methods are being 
called, because there is no other good way of knowing that the last segment has 
been processed.

> Make the Collector API work per-segment
> ---------------------------------------
>
>                 Key: LUCENE-5527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5527
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws 
> IOException;
>   
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obious {{setScorer}} and 
> {{acceptDocsOutOfOrder}} need to be called after {{setNextReader}} which is 
> otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collectors implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a Collector that would implement both Collector 
> and AtomicCollector, return {{this}} in setNextReader and make current 
> concrete Collector implementations extend this class instead of directly 
> extending Collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

Reply via email to