[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Michael McCandless (JIRA) Mon, 21 Jan 2013 05:26:15 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558752#comment-13558752 ]


Michael McCandless commented on LUCENE-4600:
--------------------------------------------

This run compares the NO_PARENTS CountingFacetsCollector against itself, so
all differences in the table are noise. Use the absolute QPS to compare
against the "QPS comp" column above: e.g. MedTerm was 18.89 QPS above with
ALL_PARENTS, and with NO_PARENTS MedTerm is 22.67-22.80 QPS:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
              AndHighLow       85.20      (5.0%)       83.74      (5.7%)   -1.7% ( -11% -    9%)
             LowSpanNear       95.25      (5.5%)       93.67      (6.8%)   -1.7% ( -13% -   11%)
            HighSpanNear       95.19      (5.4%)       93.80      (6.7%)   -1.5% ( -12% -   11%)
             MedSpanNear       94.97      (5.4%)       93.59      (6.8%)   -1.5% ( -12% -   11%)
              AndHighMed       45.68      (2.8%)       45.29      (2.9%)   -0.9% (  -6% -    4%)
               OrHighLow        7.62      (2.2%)        7.55      (2.2%)   -0.8% (  -5% -    3%)
              OrHighHigh        4.33      (2.2%)        4.29      (2.2%)   -0.8% (  -5% -    3%)
                 LowTerm       38.17      (2.0%)       37.90      (2.2%)   -0.7% (  -4% -    3%)
               OrHighMed        7.54      (2.2%)        7.49      (2.1%)   -0.7% (  -4% -    3%)
                 Prefix3       45.95      (4.3%)       45.68      (4.4%)   -0.6% (  -8% -    8%)
                 MedTerm       22.80      (2.2%)       22.67      (2.1%)   -0.6% (  -4% -    3%)
                  Fuzzy1       26.16      (1.9%)       26.04      (2.0%)   -0.4% (  -4% -    3%)
                  IntNRQ       17.94      (6.1%)       17.86      (6.2%)   -0.4% ( -11% -   12%)
             AndHighHigh       12.33      (1.2%)       12.29      (1.3%)   -0.4% (  -2% -    2%)
                  Fuzzy2       32.00      (2.8%)       31.89      (3.0%)   -0.3% (  -5% -    5%)
               MedPhrase       49.48      (3.9%)       49.32      (4.4%)   -0.3% (  -8% -    8%)
                HighTerm        8.02      (2.1%)        8.00      (2.0%)   -0.2% (  -4% -    3%)
                PKLookup      211.76      (1.4%)      211.32      (1.8%)   -0.2% (  -3% -    3%)
                Wildcard       62.37      (2.3%)       62.28      (2.3%)   -0.1% (  -4% -    4%)
         MedSloppyPhrase       17.49      (2.5%)       17.52      (2.7%)    0.2% (  -4% -    5%)
                 Respell       55.68      (5.0%)       55.85      (3.3%)    0.3% (  -7% -    9%)
         LowSloppyPhrase       16.29      (4.7%)       16.43      (5.2%)    0.9% (  -8% -   11%)
               LowPhrase       15.68      (5.3%)       15.81      (5.4%)    0.9% (  -9% -   12%)
              HighPhrase       14.22      (8.7%)       14.45      (8.9%)    1.6% ( -14% -   21%)
        HighSloppyPhrase        0.83      (9.3%)        0.85     (11.9%)    2.1% ( -17% -   25%)
{noformat}
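
For anyone reading the table by hand, here is a small sketch of how the
"Pct diff" column and its bracketed range could be derived from the QPS and
StdDev columns. The function names are hypothetical and the range formula is
an assumption, not taken from the benchmark tool: it pits one side's
QPS minus one StdDev against the other side's QPS plus one StdDev. With the
result truncated toward zero, it reproduces the bracketed figures for the
AndHighLow and MedTerm rows.

```python
def pct_diff(base_qps, comp_qps):
    """Relative change of comp over base, in percent."""
    return 100.0 * (comp_qps - base_qps) / base_qps


def pct_diff_range(base_qps, base_sd_pct, comp_qps, comp_sd_pct):
    """Assumed worst/best-case change, one StdDev either way (in percent)."""
    base_lo = base_qps * (1 - base_sd_pct / 100.0)
    base_hi = base_qps * (1 + base_sd_pct / 100.0)
    comp_lo = comp_qps * (1 - comp_sd_pct / 100.0)
    comp_hi = comp_qps * (1 + comp_sd_pct / 100.0)
    worst = 100.0 * (comp_lo - base_hi) / base_hi  # slow comp vs fast base
    best = 100.0 * (comp_hi - base_lo) / base_lo   # fast comp vs slow base
    return worst, best


if __name__ == "__main__":
    # AndHighLow row from the table above.
    print(round(pct_diff(85.20, 83.74), 1))
    worst, best = pct_diff_range(85.20, 5.0, 83.74, 5.7)
    print(int(worst), int(best))  # int() truncates toward zero
```

Because the zero point sits well inside every bracketed range, none of the
rows shows a difference distinguishable from noise. By the same arithmetic,
going from 18.89 QPS (ALL_PARENTS) to 22.67-22.80 QPS (NO_PARENTS) for
MedTerm is a gain of roughly 20%.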
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
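
To illustrate the difference described in the issue summary above, here is a
minimal, language-agnostic sketch in Python. It does not use the Lucene facet
API: `two_pass_counts`, `one_pass_counts`, and the `doc_facets` mapping (doc
id to a list of category ordinals) are hypothetical stand-ins, and `hits` is
assumed to yield each matching doc id exactly once, as a collector would.

```python
from collections import Counter


def two_pass_counts(hits, doc_facets):
    """Gather all hits first, then aggregate in a second pass."""
    matched = set()              # stands in for the transient bitset
    for doc_id in hits:
        matched.add(doc_id)
    counts = Counter()
    for doc_id in sorted(matched):
        counts.update(doc_facets[doc_id])
    return counts


def one_pass_counts(hits, doc_facets):
    """Aggregate while collecting; no per-hit state is retained."""
    counts = Counter()
    for doc_id in hits:
        counts.update(doc_facets[doc_id])
    return counts
```

Both functions return the same counts; the one-pass variant simply never
holds the set of matching documents (or a parallel array of scores) in
memory.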

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
