[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Michael McCandless (JIRA) Mon, 21 Jan 2013 07:52:14 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558853#comment-13558853 ]


Michael McCandless commented on LUCENE-4600:
--------------------------------------------

The performance depends heavily on how many ords your taxo index has. My 
last test had ~2.5M ords, but when I built an index leaving out the two 
dimensions with the most ords (categories, username), which left 4703 unique 
ords, the numbers were much better:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Prefix3      161.48      (6.1%)      161.99      (7.4%)    0.3% ( -12% -   14%)
                PKLookup      235.50      (2.4%)      236.41      (2.1%)    0.4% (  -4% -    5%)
                 Respell       85.41      (4.4%)       85.92      (4.2%)    0.6% (  -7% -    9%)
              AndHighLow     1196.56      (2.1%)     1204.67      (3.4%)    0.7% (  -4% -    6%)
                  IntNRQ      104.88      (6.7%)      105.77      (9.0%)    0.9% ( -13% -   17%)
                Wildcard      215.17      (2.2%)      217.13      (2.6%)    0.9% (  -3% -    5%)
        HighSloppyPhrase        3.24      (8.2%)        3.27      (9.2%)    1.0% ( -15% -   19%)
             LowSpanNear       42.80      (3.0%)       43.68      (2.8%)    2.1% (  -3% -    8%)
                  Fuzzy2       84.83      (3.6%)       86.70      (2.8%)    2.2% (  -4% -    8%)
            HighSpanNear       11.42      (1.9%)       11.70      (2.3%)    2.4% (  -1% -    6%)
               LowPhrase       71.69      (6.8%)       73.91      (6.2%)    3.1% (  -9% -   17%)
                  Fuzzy1       75.53      (3.4%)       78.81      (2.7%)    4.3% (  -1% -   10%)
              HighPhrase       42.58     (11.4%)       44.61     (11.5%)    4.8% ( -16% -   31%)
         LowSloppyPhrase       80.22      (2.3%)       84.49      (3.1%)    5.3% (   0% -   10%)
             MedSpanNear       85.37      (1.9%)       91.16      (1.8%)    6.8% (   3% -   10%)
         MedSloppyPhrase       86.55      (2.7%)       92.84      (3.2%)    7.3% (   1% -   13%)
               MedPhrase      145.23      (5.6%)      156.11      (6.1%)    7.5% (  -3% -   20%)
              AndHighMed      321.74      (1.2%)      346.20      (1.5%)    7.6% (   4% -   10%)
             AndHighHigh       84.28      (1.6%)       96.80      (1.7%)   14.9% (  11% -   18%)
              OrHighHigh       35.03      (2.9%)       42.53      (4.6%)   21.4% (  13% -   29%)
               OrHighMed       51.75      (3.0%)       63.90      (4.6%)   23.5% (  15% -   32%)
               OrHighLow       50.41      (3.0%)       62.51      (4.7%)   24.0% (  15% -   32%)
                HighTerm       58.55      (3.0%)       74.59      (4.2%)   27.4% (  19% -   35%)
                 LowTerm      355.14      (1.6%)      480.44      (2.3%)   35.3% (  30% -   39%)
                 MedTerm      206.44      (2.0%)      286.54      (3.1%)   38.8% (  33% -   44%)
{noformat}

I also separately fixed a silly bug in luceneutil which was causing the *Span* 
queries to get 0 hits.
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
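
For readers skimming the issue description above, here is a rough sketch of the two strategies it contrasts. This is not the facet module's code: the class name, the method names, and the ordsPerDoc array (a hypothetical stand-in for reading each document's category ordinals) are all assumptions made for illustration, and scores are left out entirely.

```java
import java.util.Arrays;
import java.util.BitSet;

// Illustrative only: plain Java, no Lucene classes involved.
public class FacetAggregationSketch {

    // Two-pass approach: remember every hit in a bit set while collecting,
    // then walk the bit set afterwards and count ordinals.
    static int[] twoPass(int[][] ordsPerDoc, int[] hits, int numOrds) {
        BitSet matched = new BitSet(ordsPerDoc.length);
        for (int doc : hits) {
            matched.set(doc);                 // pass 1: only record the hit
        }
        int[] counts = new int[numOrds];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : ordsPerDoc[doc]) { // pass 2: the actual aggregation
                counts[ord]++;
            }
        }
        return counts;
    }

    // Aggregating during collection: count ordinals as each hit arrives,
    // so no per-hit state has to be held until the end.
    static int[] duringCollection(int[][] ordsPerDoc, int[] hits, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hits) {
            for (int ord : ordsPerDoc[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] ordsPerDoc = {{0, 1}, {1}, {2, 1}, {0}}; // made-up ordinals per doc
        int[] hits = {0, 2, 3};                          // made-up matching doc IDs
        System.out.println(Arrays.toString(twoPass(ordsPerDoc, hits, 3)));
        System.out.println(Arrays.toString(duringCollection(ordsPerDoc, hits, 3)));
    }
}
```

Both methods return the same counts; the difference is only in what has to be kept in memory between collection and aggregation, which is the transient RAM the description refers to.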
