[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Michael McCandless (JIRA) Fri, 22 Nov 2013 07:46:16 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830060#comment-13830060 ]


Michael McCandless commented on LUCENE-5316:
--------------------------------------------

OK, I ran the same perf tests with the last patch.  The "sometimes all
0 facet counts" problem is fixed!

NO_PARENTS:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
         LowSloppyPhrase       99.22      (4.1%)       35.51      (1.9%)  -64.2% ( -67% -  -60%)
             MedSpanNear       96.92      (4.2%)       35.23      (2.0%)  -63.6% ( -67% -  -59%)
              AndHighLow       91.99      (3.5%)       34.57      (2.0%)  -62.4% ( -65% -  -58%)
              HighPhrase       90.68      (3.8%)       34.35      (2.0%)  -62.1% ( -65% -  -58%)
        HighSloppyPhrase       81.34      (2.9%)       32.74      (2.1%)  -59.7% ( -62% -  -56%)
            HighSpanNear       65.81      (2.9%)       30.03      (2.2%)  -54.4% ( -57% -  -50%)
            OrNotHighLow       63.44      (3.4%)       29.66      (2.0%)  -53.3% ( -56% -  -49%)
               MedPhrase       62.66      (3.2%)       29.30      (2.0%)  -53.2% ( -56% -  -49%)
                 LowTerm       61.47      (3.8%)       29.02      (2.0%)  -52.8% ( -56% -  -48%)
                  Fuzzy1       47.78      (3.3%)       25.66      (2.3%)  -46.3% ( -50% -  -42%)
           OrNotHighHigh       47.59      (3.8%)       25.73      (2.3%)  -45.9% ( -50% -  -41%)
               OrHighLow       43.78      (1.9%)       24.41      (2.1%)  -44.2% ( -47% -  -41%)
              AndHighMed       42.81      (2.1%)       24.04      (2.0%)  -43.9% ( -47% -  -40%)
            OrNotHighMed       38.92      (2.6%)       22.95      (2.0%)  -41.0% ( -44% -  -37%)
                  Fuzzy2       38.27      (2.6%)       22.86      (2.2%)  -40.3% ( -43% -  -36%)
                 MedTerm       31.78      (2.5%)       20.14      (2.1%)  -36.6% ( -40% -  -32%)
             AndHighHigh       27.50      (1.7%)       18.33      (1.9%)  -33.3% ( -36% -  -30%)
                 Prefix3       25.26      (1.9%)       17.35      (1.7%)  -31.3% ( -34% -  -28%)
            OrHighNotMed       22.27      (1.4%)       16.04      (1.4%)  -28.0% ( -30% -  -25%)
            OrHighNotLow       18.01      (1.6%)       13.76      (1.5%)  -23.6% ( -26% -  -20%)
               OrHighMed       17.33      (2.1%)       13.26      (1.6%)  -23.5% ( -26% -  -20%)
              OrHighHigh       16.84      (1.9%)       13.05      (1.5%)  -22.5% ( -25% -  -19%)
         MedSloppyPhrase       15.54      (3.9%)       12.22      (3.4%)  -21.4% ( -27% -  -14%)
                HighTerm       15.87      (2.2%)       12.48      (1.7%)  -21.3% ( -24% -  -17%)
               LowPhrase       13.78      (1.6%)       11.03      (1.3%)  -20.0% ( -22% -  -17%)
                Wildcard       12.93      (1.9%)       10.65      (1.3%)  -17.7% ( -20% -  -14%)
             LowSpanNear       10.55      (2.0%)        8.92      (1.7%)  -15.5% ( -18% -  -12%)
           OrHighNotHigh        9.29      (1.4%)        8.16      (1.4%)  -12.2% ( -14% -   -9%)
                  IntNRQ        5.19      (1.3%)        4.92      (1.9%)   -5.1% (  -8% -   -1%)
                 Respell       85.48      (2.6%)       87.32      (2.6%)    2.2% (  -2% -    7%)
{noformat}

ALL_BUT_DIM:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 Respell       89.86      (3.0%)       89.27      (2.2%)   -0.7% (  -5% -    4%)
             LowSpanNear       12.01      (2.3%)       11.95      (2.1%)   -0.5% (  -4% -    3%)
                  Fuzzy1       88.54      (2.2%)       88.41      (1.5%)   -0.2% (  -3% -    3%)
                  Fuzzy2       62.54      (2.1%)       62.50      (2.3%)   -0.1% (  -4% -    4%)
           OrHighNotHigh       10.10      (1.7%)       10.09      (1.9%)   -0.1% (  -3% -    3%)
           OrNotHighHigh       85.35      (5.4%)       85.31      (5.4%)   -0.0% ( -10% -   11%)
            OrHighNotLow       22.11      (1.4%)       22.10      (1.2%)   -0.0% (  -2% -    2%)
         MedSloppyPhrase       18.87      (4.0%)       18.88      (4.7%)    0.1% (  -8% -    9%)
                HighTerm       18.93      (1.5%)       18.97      (1.7%)    0.2% (  -2% -    3%)
               OrHighMed       21.19      (1.6%)       21.26      (1.4%)    0.3% (  -2% -    3%)
               LowPhrase       15.79      (4.4%)       15.85      (4.2%)    0.4% (  -7% -    9%)
             AndHighHigh       38.40      (1.0%)       38.57      (1.2%)    0.4% (  -1% -    2%)
              OrHighHigh       20.55      (1.4%)       20.64      (1.5%)    0.4% (  -2% -    3%)
            OrHighNotMed       29.27      (1.4%)       29.40      (1.2%)    0.4% (  -2% -    3%)
              AndHighMed       72.26      (1.1%)       72.60      (1.1%)    0.5% (  -1% -    2%)
                Wildcard       14.92      (1.0%)       14.99      (1.3%)    0.5% (  -1% -    2%)
            HighSpanNear      159.71      (3.5%)      160.74      (3.7%)    0.6% (  -6% -    8%)
                  IntNRQ        5.15      (1.4%)        5.18      (1.7%)    0.7% (  -2% -    3%)
                 Prefix3       33.93      (1.3%)       34.18      (1.8%)    0.7% (  -2% -    3%)
                 MedTerm       44.36      (1.7%)       44.69      (1.6%)    0.8% (  -2% -    4%)
            OrNotHighMed       62.66      (2.4%)       63.18      (3.1%)    0.8% (  -4% -    6%)
               OrHighLow       75.94      (1.4%)       76.65      (1.5%)    0.9% (  -1% -    3%)
            OrNotHighLow      150.08      (4.7%)      151.62      (5.0%)    1.0% (  -8% -   11%)
               MedPhrase      138.21      (3.7%)      139.67      (3.6%)    1.1% (  -6% -    8%)
                 LowTerm      140.27      (2.3%)      142.14      (2.6%)    1.3% (  -3% -    6%)
        HighSloppyPhrase      283.76      (1.3%)      291.51      (1.8%)    2.7% (   0% -    5%)
              HighPhrase      455.49      (1.5%)      476.29      (3.1%)    4.6% (   0% -    9%)
             MedSpanNear      660.85      (2.0%)      693.91      (2.4%)    5.0% (   0% -    9%)
              AndHighLow      482.21      (2.1%)      511.77      (2.5%)    6.1% (   1% -   11%)
         LowSloppyPhrase      759.01      (1.9%)      816.01      (2.1%)    7.5% (   3% -   11%)
{noformat}


> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch, 
> LUCENE-5316.patch, LUCENE-5316.patch
>
>
> Taxonomy traversal today relies on {{ParallelTaxonomyArrays}}: specifically,
> two taxonomy-size {{int}} arrays that hold, for each ordinal, its youngest
> child (array #1) and its older sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (the array is accessed only at 
> increasing entries, but they may be distant from one another)
> * In NRT, a reopen always (not just in the worst case) costs O(Taxonomy-size)
> This issue is about making traversal easier and the code more readable, and 
> opening it up to future improvements (i.e. memory footprint and NRT cost), 
> without changing any of the internals. 
> A later issue(s?) could be opened to address the gaps once this one is done.
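
To make the two-array layout described above concrete, here is a minimal, standalone Java sketch of walking a node's children with a "youngest child" array and an "older sibling" array. It is an illustration under stated assumptions, not code from the patch: the class name {{TaxonomyArraysSketch}}, the method {{childrenOf}}, and the five-ordinal taxonomy are all hypothetical, and plain {{int[]}} arrays stand in for whatever {{ParallelTaxonomyArrays}} exposes. The -1 sentinel is assumed to match {{TaxonomyReader.INVALID_ORDINAL}}. Two {{int}} arrays at 4 bytes per entry account for the 8 bytes per ordinal mentioned in the description.

```java
import java.util.ArrayList;
import java.util.List;

public class TaxonomyArraysSketch {
    // Sentinel meaning "no such ordinal" (assumed to match Lucene's INVALID_ORDINAL of -1).
    static final int INVALID_ORDINAL = -1;

    /**
     * Collects the direct children of {@code parent}, youngest (most recently
     * added) first, by following the youngest-child and older-sibling links.
     */
    static List<Integer> childrenOf(int parent, int[] children, int[] siblings) {
        List<Integer> result = new ArrayList<>();
        for (int child = children[parent]; child != INVALID_ORDINAL; child = siblings[child]) {
            result.add(child);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy, ordinals assigned in insertion order:
        // 0 = root, 1 = Author, 2 = Author/Bob, 3 = Author/Lisa, 4 = Date
        int[] children = {4, 3, -1, -1, -1};  // youngest child of each ordinal
        int[] siblings = {-1, -1, -1, 2, 1};  // older sibling of each ordinal

        System.out.println(childrenOf(0, children, siblings)); // [4, 1]
        System.out.println(childrenOf(1, children, siblings)); // [3, 2]
    }
}
```

Each step of the loop jumps to whichever ordinal the sibling link names, so consecutive reads can land far apart in the array; this is the reference-locality concern raised in the list above.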



--
This message was sent by Atlassian JIRA
(v6.1#6144)
