[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Michael McCandless (JIRA) Sun, 20 Jan 2013 11:50:13 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558361#comment-13558361
 ]


Michael McCandless commented on LUCENE-4600:
--------------------------------------------

Results if I rebuild the index with NO_PARENTS (just to make sure the locality 
gains are not due to frequently visiting the parent ords in the count array):
{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       55.59      (3.9%)       54.45      (3.4%)   -2.0% (  -8% -    5%)
                  IntNRQ       18.34      (7.1%)       18.04      (6.4%)   -1.7% ( -14% -   12%)
              AndHighLow       86.87      (0.6%)       86.26      (1.9%)   -0.7% (  -3% -    1%)
             MedSpanNear       97.31      (0.9%)       96.63      (1.8%)   -0.7% (  -3% -    1%)
                 Prefix3       46.40      (5.6%)       46.11      (4.6%)   -0.6% ( -10% -   10%)
             LowSpanNear       97.76      (0.9%)       97.28      (1.8%)   -0.5% (  -3% -    2%)
                  Fuzzy2       31.88      (1.6%)       31.77      (2.7%)   -0.3% (  -4% -    3%)
                Wildcard       62.53      (2.9%)       62.34      (2.5%)   -0.3% (  -5% -    5%)
                PKLookup      210.69      (1.5%)      210.37      (1.8%)   -0.1% (  -3% -    3%)
            HighSpanNear       97.44      (1.4%)       97.35      (1.7%)   -0.1% (  -3% -    3%)
               MedPhrase       49.87      (2.4%)       50.18      (2.5%)    0.6% (  -4% -    5%)
              HighPhrase       14.32      (8.8%)       14.42      (8.8%)    0.7% ( -15% -   20%)
                 LowTerm       37.64      (0.5%)       37.90      (1.3%)    0.7% (  -1% -    2%)
              AndHighMed       45.23      (0.6%)       45.74      (1.1%)    1.1% (   0% -    2%)
                 MedTerm       22.53      (1.0%)       23.00      (1.3%)    2.1% (   0% -    4%)
         LowSloppyPhrase       16.27      (2.5%)       16.65      (5.7%)    2.3% (  -5% -   10%)
                  Fuzzy1       24.86      (1.7%)       25.87      (1.4%)    4.1% (   0% -    7%)
                HighTerm        7.67      (1.6%)        8.00      (2.4%)    4.3% (   0% -    8%)
         MedSloppyPhrase       16.67      (1.2%)       17.58      (3.1%)    5.5% (   1% -    9%)
        HighSloppyPhrase        0.81      (6.6%)        0.86     (12.8%)    6.9% ( -11% -   28%)
             AndHighHigh       11.38      (0.8%)       12.18      (1.2%)    7.1% (   5% -    9%)
               LowPhrase       14.69      (4.7%)       15.82      (5.7%)    7.6% (  -2% -   18%)
              OrHighHigh        3.60      (2.3%)        4.32      (3.3%)   20.0% (  14% -   26%)
               OrHighMed        6.20      (1.9%)        7.51      (3.0%)   21.1% (  15% -   26%)
               OrHighLow        6.25      (2.0%)        7.60      (2.4%)   21.7% (  17% -   26%)
{noformat}

So net/net, post is still better!  Separately, it looks like NO_PARENTS is maybe 
~10% faster for the high-cost queries but slower for the low-cost queries, 
which is expected because iterating over the 2.2 M ords at the end is a fixed, 
non-trivial cost.
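
To illustrate why the end-of-search pass over the ords costs the same no matter how many hits a query has, here is a minimal sketch of a roll-up step. It is not the facet module's actual code: the class name, the method name, and the plain int[] parents layout are hypothetical, and it assumes every non-root ordinal is larger than its parent's ordinal.

```java
/** Hypothetical sketch of a NO_PARENTS-style roll-up; not Lucene's actual facet code. */
final class NoParentsRollupSketch {

    /**
     * Adds each ordinal's count into its parent's count, in place.
     *
     * Assumptions (made for this sketch only): parents[0] == -1 for the root,
     * and parents[ord] < ord for every other ordinal.
     */
    static void rollUp(int[] counts, int[] parents) {
        // Walk from the highest ordinal down, so a child's count is final
        // before it is added to its parent. The loop touches every ordinal,
        // so its cost depends on the taxonomy size, not on the number of hits.
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }
}
```

With only leaf ordinals counted during collection, a cheap query still pays for the full loop, while an expensive query saves more by not incrementing parent counts per hit.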
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
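
The difference between the two approaches in the description can be sketched roughly as follows. This is only an illustration under simplifying assumptions: hits arrive as an int[] of doc IDs, each document's category ordinals are available as docOrds[doc], and all class and method names are hypothetical rather than the facet module's API.

```java
import java.util.BitSet;

/** Hypothetical sketch contrasting the two aggregation strategies; not Lucene's API. */
final class FacetAggregationSketch {

    /** Two passes: collection only records hits in a bitset; counting happens afterwards. */
    static int[] countAfterCollection(int[] hitDocs, int[][] docOrds, int numOrds) {
        // Pass 1: remember every hit (this is the transient RAM the issue mentions).
        BitSet hits = new BitSet(docOrds.length);
        for (int doc : hitDocs) {
            hits.set(doc);
        }
        // Pass 2: revisit all hits and aggregate their ordinals.
        int[] counts = new int[numOrds];
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** One pass: aggregate as each hit is collected, so no bitset is retained. */
    static int[] countDuringCollection(int[] hitDocs, int[][] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hitDocs) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

Both methods produce the same counts; the second avoids holding the hit set (and, if scores were aggregated, a per-hit float[]) until the end of the search.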
