[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

Michael McCandless (JIRA) Thu, 03 Apr 2014 11:49:35 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959078#comment-13959078 ]


Michael McCandless commented on LUCENE-5527:
--------------------------------------------

+1 for LeafCollector and the patch.

I tested whether this has any impact on search performance:

{noformat}
Report after iter 10:
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       49.44      (3.3%)       48.10      (3.7%)   -2.7% (  -9% -    4%)
                  Fuzzy2       46.74      (3.2%)       45.73      (3.1%)   -2.2% (  -8% -    4%)
                  Fuzzy1       59.25      (3.7%)       58.08      (3.5%)   -2.0% (  -8% -    5%)
                  IntNRQ        3.42      (3.8%)        3.40      (3.8%)   -0.7% (  -7% -    7%)
                 Prefix3       86.67      (2.6%)       86.17      (2.6%)   -0.6% (  -5% -    4%)
         LowSloppyPhrase       44.44      (2.3%)       44.42      (2.5%)   -0.1% (  -4% -    4%)
                Wildcard       19.08      (3.5%)       19.07      (3.0%)   -0.1% (  -6% -    6%)
              AndHighMed       34.38      (1.0%)       34.38      (1.0%)   -0.0% (  -2% -    2%)
             LowSpanNear       10.41      (3.1%)       10.41      (2.3%)    0.0% (  -5% -    5%)
        HighSloppyPhrase        3.49      (7.9%)        3.49      (6.6%)    0.1% ( -13% -   15%)
             AndHighHigh       28.35      (1.1%)       28.39      (1.0%)    0.1% (  -1% -    2%)
             MedSpanNear       31.06      (2.8%)       31.12      (2.7%)    0.2% (  -5% -    5%)
              AndHighLow      391.44      (2.9%)      392.73      (2.6%)    0.3% (  -5% -    6%)
         MedSloppyPhrase        3.54      (5.2%)        3.56      (4.6%)    0.4% (  -8% -   10%)
               OrHighMed       26.51      (4.0%)       26.66      (5.7%)    0.6% (  -8% -   10%)
            OrHighNotLow       24.84      (4.1%)       24.98      (5.8%)    0.6% (  -9% -   10%)
               LowPhrase       13.19      (1.6%)       13.27      (2.3%)    0.6% (  -3% -    4%)
               OrHighLow       18.78      (4.1%)       18.91      (5.8%)    0.7% (  -8% -   11%)
           OrNotHighHigh        8.87      (4.5%)        8.93      (6.0%)    0.7% (  -9% -   11%)
            OrHighNotMed       30.63      (4.1%)       30.85      (5.5%)    0.7% (  -8% -   10%)
              OrHighHigh        8.21      (4.1%)        8.27      (5.8%)    0.7% (  -8% -   11%)
               MedPhrase      203.10      (6.6%)      204.77      (6.3%)    0.8% ( -11% -   14%)
           OrHighNotHigh       11.09      (4.5%)       11.18      (5.9%)    0.8% (  -9% -   11%)
                 LowTerm      322.74      (5.6%)      325.67      (5.6%)    0.9% (  -9% -   12%)
                HighTerm       63.88     (12.8%)       64.55     (12.2%)    1.1% ( -21% -   29%)
                 MedTerm      100.19      (9.8%)      101.31      (9.5%)    1.1% ( -16% -   22%)
            HighSpanNear        8.09      (4.0%)        8.18      (4.9%)    1.1% (  -7% -   10%)
              HighPhrase        4.27      (7.1%)        4.32      (6.5%)    1.2% ( -11% -   15%)
            OrNotHighMed       19.00      (7.0%)       19.30      (7.6%)    1.6% ( -12% -   17%)
            OrNotHighLow       19.63      (7.4%)       19.96      (8.0%)    1.7% ( -12% -   18%)
{noformat}

Looks like just noise!
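
For readers checking the "noise" reading: every bracketed range in the table spans zero, so no task shows a change larger than its own run-to-run variation. The sketch below recomputes one row. The formula for the range (one side shifted a standard deviation up, the other a standard deviation down) is an assumption that happens to agree with the Respell row; it is not taken from the benchmark tool. The class and method names are hypothetical.

```java
class NoiseCheck {
    /** Percentage change going from base to comp; negative means comp is slower. */
    static double pctDiff(double base, double comp) {
        return 100.0 * (comp - base) / base;
    }

    /** Pessimistic bound (assumed formula): base one stddev high, comp one stddev low. */
    static double worstCase(double base, double baseSdPct, double comp, double compSdPct) {
        double baseHigh = base * (1 + baseSdPct / 100.0);
        double compLow = comp * (1 - compSdPct / 100.0);
        return pctDiff(baseHigh, compLow);
    }

    /** Optimistic bound (assumed formula): base one stddev low, comp one stddev high. */
    static double bestCase(double base, double baseSdPct, double comp, double compSdPct) {
        double baseLow = base * (1 - baseSdPct / 100.0);
        double compHigh = comp * (1 + compSdPct / 100.0);
        return pctDiff(baseLow, compHigh);
    }

    /** A change is treated as noise when the range between the bounds contains zero. */
    static boolean looksLikeNoise(double base, double baseSdPct, double comp, double compSdPct) {
        return worstCase(base, baseSdPct, comp, compSdPct) < 0
            && bestCase(base, baseSdPct, comp, compSdPct) > 0;
    }

    public static void main(String[] args) {
        // Respell row from the table: 49.44 (3.3%) vs 48.10 (3.7%)
        System.out.printf("diff=%.1f%% worst=%.1f%% best=%.1f%% noise=%b%n",
            pctDiff(49.44, 48.10),
            worstCase(49.44, 3.3, 48.10, 3.7),
            bestCase(49.44, 3.3, 48.10, 3.7),
            looksLikeNoise(49.44, 3.3, 48.10, 3.7));
    }
}
```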

> Make the Collector API work per-segment
> ---------------------------------------
>
>                 Key: LUCENE-5527
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5527
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5527.patch
>
>
> Spin-off of LUCENE-5299.
> LUCENE-5229 proposes different changes, some of them being controversial, but 
> there is one of them that I really really like that consists in refactoring 
> the {{Collector}} API in order to have a different Collector per segment.
> The idea is, instead of having a single Collector object that needs to be 
> able to take care of all segments, to have a top-level Collector:
> {code}
> public interface Collector {
>   AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
> }
> {code}
> and a per-AtomicReaderContext collector:
> {code}
> public interface AtomicCollector {
>   void setScorer(Scorer scorer) throws IOException;
>   void collect(int doc) throws IOException;
>   boolean acceptsDocsOutOfOrder();
> }
> {code}
> I think it makes the API clearer since it is now obvious that {{setScorer}} 
> and {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, 
> which is otherwise unclear.
> It also makes things more flexible. For example, a collector could much more 
> easily decide to use different strategies on different segments. In 
> particular, it makes the early-termination collector much cleaner since it 
> can return different atomic collector implementations depending on whether 
> the current segment is sorted or not.
> Even if we have lots of collectors all over the place, we could make it 
> easier to migrate by having a class that implements both {{Collector}} and 
> {{AtomicCollector}} and returns {{this}} from {{setNextReader}}. Current 
> concrete Collector implementations would then extend this class instead of 
> directly extending Collector.
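
A minimal sketch of the two ideas in the description above, assuming the interface names from the proposal ({{Collector}}, {{AtomicCollector}}). It is not the attached patch. {{Scorer}} and {{AtomicReaderContext}} here are stand-in classes so the example compiles without Lucene, the {{sorted}} flag is invented for illustration, and {{BridgingCollector}}, {{CountingCollector}}, {{FirstHitsCollector}} and {{CollectorDemo}} are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-ins so the sketch compiles without Lucene; NOT the real Lucene classes.
class Scorer {}

class AtomicReaderContext {
    final int docBase;     // offset of this segment's doc IDs within the whole index
    final boolean sorted;  // hypothetical flag: is this segment sorted?
    AtomicReaderContext(int docBase, boolean sorted) {
        this.docBase = docBase;
        this.sorted = sorted;
    }
}

// The two interfaces as written in the issue description.
interface Collector {
    AtomicCollector setNextReader(AtomicReaderContext context) throws IOException;
}

interface AtomicCollector {
    void setScorer(Scorer scorer) throws IOException;
    void collect(int doc) throws IOException;
    boolean acceptsDocsOutOfOrder();
}

// Migration bridge (hypothetical name): implements both interfaces and returns
// itself, so an existing single-object collector only changes its superclass.
abstract class BridgingCollector implements Collector, AtomicCollector {
    protected int docBase;

    @Override
    public AtomicCollector setNextReader(AtomicReaderContext context) throws IOException {
        docBase = context.docBase;
        return this;
    }

    @Override
    public void setScorer(Scorer scorer) throws IOException {}
}

class CountingCollector extends BridgingCollector {
    int count;
    @Override public void collect(int doc) { count++; }
    @Override public boolean acceptsDocsOutOfOrder() { return true; }
}

// Per-segment strategy: a different AtomicCollector depending on the segment.
class FirstHitsCollector implements Collector {
    final List<Integer> hits = new ArrayList<>();
    private final int limit;

    FirstHitsCollector(int limit) { this.limit = limit; }

    @Override
    public AtomicCollector setNextReader(AtomicReaderContext context) {
        final int docBase = context.docBase;
        if (context.sorted) {
            // Sorted segment: only the first `limit` docs are kept. A real
            // early-terminating collector would stop the segment instead.
            return new AtomicCollector() {
                private int seen;
                @Override public void setScorer(Scorer scorer) {}
                @Override public void collect(int doc) {
                    if (seen++ < limit) hits.add(docBase + doc);
                }
                @Override public boolean acceptsDocsOutOfOrder() { return false; }
            };
        }
        // Unsorted segment: every doc has to be looked at.
        return new AtomicCollector() {
            @Override public void setScorer(Scorer scorer) {}
            @Override public void collect(int doc) { hits.add(docBase + doc); }
            @Override public boolean acceptsDocsOutOfOrder() { return true; }
        };
    }
}

class CollectorDemo {
    // Mimics the search loop: one AtomicCollector is requested per segment.
    static void search(Collector collector, AtomicReaderContext[] segments, int[][] docs) {
        try {
            for (int i = 0; i < segments.length; i++) {
                AtomicCollector leaf = collector.setNextReader(segments[i]);
                leaf.setScorer(new Scorer());
                for (int doc : docs[i]) {
                    leaf.collect(doc);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AtomicReaderContext[] segments = {
            new AtomicReaderContext(0, true), new AtomicReaderContext(10, false)
        };
        int[][] docs = { {0, 1, 2, 3}, {2, 0, 1} };
        FirstHitsCollector first = new FirstHitsCollector(2);
        search(first, segments, docs);
        System.out.println(first.hits);
    }
}
```

In this sketch the ordering constraint from the description falls out of the types: {{setScorer}} and {{collect}} can only be reached through the object that {{setNextReader}} returns.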



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
