[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Michael McCandless (JIRA) Sun, 20 Jan 2013 09:58:13 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558330#comment-13558330 ]


Michael McCandless commented on LUCENE-4600:
--------------------------------------------

I ran the same test, but w/ the full set of query categories:
{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
              AndHighLow      111.98      (1.0%)      110.10      (1.0%)   -1.7% (  -3% -    0%)
            HighSpanNear      128.42      (1.4%)      126.32      (1.1%)   -1.6% (  -4% -    0%)
             LowSpanNear      128.68      (1.4%)      126.59      (1.0%)   -1.6% (  -3% -    0%)
             MedSpanNear      128.18      (1.3%)      126.29      (1.1%)   -1.5% (  -3% -    0%)
                 Respell       55.79      (3.9%)       55.35      (4.8%)   -0.8% (  -9% -    8%)
                PKLookup      206.89      (1.1%)      208.08      (1.5%)    0.6% (  -2% -    3%)
                  Fuzzy2       36.21      (1.3%)       36.49      (2.3%)    0.8% (  -2% -    4%)
               MedPhrase       56.42      (1.4%)       56.94      (1.3%)    0.9% (  -1% -    3%)
                Wildcard       64.26      (3.8%)       64.88      (2.0%)    1.0% (  -4% -    7%)
              AndHighMed       51.80      (0.7%)       52.44      (1.2%)    1.2% (   0% -    3%)
                  IntNRQ       18.49      (4.8%)       18.78      (5.5%)    1.6% (  -8% -   12%)
                 LowTerm       41.15      (0.6%)       41.82      (0.9%)    1.6% (   0% -    3%)
                 Prefix3       46.94      (4.3%)       47.92      (3.4%)    2.1% (  -5% -   10%)
                 MedTerm       18.47      (0.8%)       18.92      (1.3%)    2.4% (   0% -    4%)
              HighPhrase       15.16      (6.2%)       15.77      (4.3%)    4.0% (  -6% -   15%)
                HighTerm        6.76      (1.2%)        7.07      (1.2%)    4.5% (   2% -    7%)
         LowSloppyPhrase       17.14      (3.8%)       17.96      (2.3%)    4.8% (  -1% -   11%)
                  Fuzzy1       27.29      (0.8%)       28.62      (1.4%)    4.9% (   2% -    7%)
         MedSloppyPhrase       17.64      (2.4%)       18.90      (1.0%)    7.2% (   3% -   10%)
             AndHighHigh       11.11      (0.5%)       11.97      (0.9%)    7.7% (   6% -    9%)
        HighSloppyPhrase        0.83     (10.5%)        0.91      (5.9%)   10.1% (  -5% -   29%)
               LowPhrase       15.83      (3.2%)       17.45      (0.2%)   10.2% (   6% -   14%)
              OrHighHigh        3.22      (0.7%)        3.80      (1.5%)   18.1% (  15% -   20%)
               OrHighLow        5.68      (0.3%)        6.73      (1.5%)   18.4% (  16% -   20%)
               OrHighMed        5.61      (0.5%)        6.66      (1.6%)   18.7% (  16% -   20%)
{noformat}

Somehow post-collection is a big gain (~18%) for the Or queries ... I wonder
if we are somehow not getting the out-of-order scorer (BooleanScorer) with
CountingCollector ... but looking at both collectors, they both return true
from acceptsDocsOutOfOrder ...
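A minimal, hypothetical sketch of why it is legitimate for both collectors to return true from acceptsDocsOutOfOrder: counting is order-independent, so aggregating each hit as it arrives and recording hits in a bit set for a second pass produce the same counts whatever order the scorer delivers docs in. The class and method names, and the int[][] table mapping each doc to its category ordinals, are assumptions for illustration; this is plain Java, not the Lucene Collector or facet module API, and it ignores scores entirely.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical model of the two strategies; NOT the Lucene facet API.
// docOrdinals[doc] lists the category ordinals assigned to that doc.
class FacetCollectionSketch {

  // Aggregate during collection: bump counts as each hit arrives.
  static int[] countDuringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc : hits) {
      for (int ord : docOrdinals[doc]) {
        counts[ord]++;
      }
    }
    return counts;
  }

  // Post-collection: pass 1 only records matching docs in a bit set;
  // pass 2 walks the set bits (in increasing doc-id order) and aggregates.
  static int[] countPostCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
    BitSet matched = new BitSet(docOrdinals.length);
    for (int doc : hits) {
      matched.set(doc);
    }
    int[] counts = new int[numOrdinals];
    for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
      for (int ord : docOrdinals[doc]) {
        counts[ord]++;
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    int[][] docOrdinals = { {0, 2}, {1}, {0, 1}, {2}, {0} };
    int[] inOrder = {0, 2, 3, 4};
    int[] outOfOrder = {3, 0, 4, 2}; // simulated out-of-order delivery
    System.out.println(Arrays.toString(countDuringCollection(inOrder, docOrdinals, 3)));    // [3, 1, 2]
    System.out.println(Arrays.toString(countDuringCollection(outOfOrder, docOrdinals, 3))); // [3, 1, 2]
    System.out.println(Arrays.toString(countPostCollection(outOfOrder, docOrdinals, 3)));   // [3, 1, 2]
  }
}
```

One difference the sketch does make visible: the second pass always visits docs in increasing doc-id order, regardless of arrival order. Whether that has anything to do with the Or-query gain is not something these numbers establish.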

Net/net, it seems like we should stick with post-collection? The possible
downside is the memory used by the temporary bit set, I guess ...
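To put rough numbers on that memory downside, a back-of-the-envelope estimate. It assumes (these are assumptions, not measurements of the patch) that the temporary hit set costs one bit per document in the index, stored as a long[]-backed bit set, and that the optional score array holds one 4-byte float per document; array headers are ignored, and maxDoc = 10,000,000 is a hypothetical index size. The class and method names are made up for illustration.

```java
// Hypothetical estimate of per-query transient memory for post-collection.
// Assumes one bit per doc for hits and one float per doc for scores;
// JVM object/array headers are ignored.
class TransientMemoryEstimate {

  // Bytes for a long[]-backed bit set covering maxDoc documents.
  static long bitSetBytes(int maxDoc) {
    long words = ((long) maxDoc + 63) >>> 6; // ceil(maxDoc / 64) longs
    return words * Long.BYTES;
  }

  // Bytes for a float[maxDoc] holding one score per document.
  static long scoreArrayBytes(int maxDoc) {
    return (long) maxDoc * Float.BYTES;
  }

  public static void main(String[] args) {
    int maxDoc = 10_000_000; // hypothetical index size
    System.out.println(bitSetBytes(maxDoc));     // 1250000
    System.out.println(scoreArrayBytes(maxDoc)); // 40000000
  }
}
```

Under these assumptions the bit set alone is modest (about 1.25 MB per query), while also keeping scores for aggregation costs 32x more.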
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
