[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Michael McCandless (JIRA) Sun, 10 Nov 2013 08:19:58 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818469#comment-13818469 ]


Michael McCandless commented on LUCENE-5316:
--------------------------------------------

I re-ran ALL_BUT_DIM and NO_PARENTS on the last patch:

ALL_BUT_DIM:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
         LowSloppyPhrase      195.79      (6.4%)      160.76      (6.5%)  -17.9% ( -28% -   -5%)
             MedSpanNear      189.11      (6.1%)      155.88      (6.5%)  -17.6% ( -28% -   -5%)
              AndHighLow      171.05      (5.4%)      142.46      (5.9%)  -16.7% ( -26% -   -5%)
              HighPhrase      165.56      (5.6%)      140.32      (6.0%)  -15.2% ( -25% -   -3%)
        HighSloppyPhrase      135.86      (4.7%)      117.90      (5.3%)  -13.2% ( -22% -   -3%)
            HighSpanNear       98.69      (4.1%)       88.28      (4.5%)  -10.5% ( -18% -   -2%)
               MedPhrase       89.68      (4.3%)       81.23      (3.7%)   -9.4% ( -16% -   -1%)
            OrNotHighLow       93.45      (5.5%)       85.07      (4.9%)   -9.0% ( -18% -    1%)
                 LowTerm       87.06      (3.4%)       79.50      (3.8%)   -8.7% ( -15% -   -1%)
                  Fuzzy1       63.87      (2.5%)       59.39      (2.9%)   -7.0% ( -12% -   -1%)
              AndHighMed       53.60      (1.9%)       50.49      (2.6%)   -5.8% ( -10% -   -1%)
               OrHighLow       54.32      (2.2%)       51.18      (2.4%)   -5.8% ( -10% -   -1%)
           OrNotHighHigh       62.71      (5.5%)       59.11      (5.0%)   -5.7% ( -15% -    5%)
            OrNotHighMed       47.72      (3.4%)       45.35      (3.1%)   -5.0% ( -11% -    1%)
                  Fuzzy2       48.40      (2.2%)       46.07      (2.4%)   -4.8% (  -9% -    0%)
             AndHighHigh       31.48      (1.6%)       30.33      (1.5%)   -3.7% (  -6% -    0%)
                 MedTerm       35.33      (2.0%)       34.06      (1.9%)   -3.6% (  -7% -    0%)
         MedSloppyPhrase       17.17      (4.4%)       16.67      (4.3%)   -2.9% ( -11% -    6%)
                 Prefix3       27.73      (1.6%)       26.93      (1.2%)   -2.9% (  -5% -    0%)
            OrHighNotMed       24.31      (2.4%)       23.79      (1.1%)   -2.1% (  -5% -    1%)
               LowPhrase       14.56      (4.2%)       14.28      (4.0%)   -1.9% (  -9% -    6%)
             LowSpanNear       11.25      (2.4%)       11.04      (1.7%)   -1.9% (  -5% -    2%)
              OrHighHigh       17.63      (1.6%)       17.38      (1.1%)   -1.4% (  -4% -    1%)
            OrHighNotLow       18.97      (1.8%)       18.69      (0.9%)   -1.4% (  -4% -    1%)
                Wildcard       13.21      (1.4%)       13.03      (0.9%)   -1.4% (  -3% -    0%)
                HighTerm       16.34      (1.8%)       16.14      (1.9%)   -1.3% (  -4% -    2%)
               OrHighMed       18.11      (1.6%)       17.93      (1.4%)   -1.0% (  -3% -    2%)
                 Respell       89.31      (2.8%)       88.78      (2.2%)   -0.6% (  -5% -    4%)
           OrHighNotHigh        9.09      (2.0%)        9.08      (1.4%)   -0.1% (  -3% -    3%)
                  IntNRQ        4.87      (1.2%)        4.90      (1.2%)    0.7% (  -1% -    3%)
{noformat}


NO_PARENTS:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
         LowSloppyPhrase       98.63      (4.7%)       28.73      (2.9%)  -70.9% ( -74% -  -66%)
             MedSpanNear       97.31      (4.7%)       28.54      (2.9%)  -70.7% ( -74% -  -66%)
              AndHighLow       91.63      (3.9%)       28.04      (2.9%)  -69.4% ( -73% -  -65%)
              HighPhrase       90.81      (3.6%)       27.94      (2.9%)  -69.2% ( -73% -  -65%)
        HighSloppyPhrase       80.24      (3.2%)       26.90      (3.1%)  -66.5% ( -70% -  -62%)
            HighSpanNear       65.93      (2.7%)       24.97      (3.3%)  -62.1% ( -66% -  -57%)
            OrNotHighLow       64.00      (3.3%)       24.74      (3.2%)  -61.3% ( -65% -  -56%)
               MedPhrase       62.06      (4.1%)       24.52      (3.3%)  -60.5% ( -65% -  -55%)
                 LowTerm       61.33      (2.6%)       24.40      (3.3%)  -60.2% ( -64% -  -55%)
           OrNotHighHigh       48.27      (2.8%)       21.97      (3.4%)  -54.5% ( -58% -  -49%)
                  Fuzzy1       47.61      (2.2%)       21.90      (3.5%)  -54.0% ( -58% -  -49%)
               OrHighLow       43.63      (2.6%)       21.07      (3.4%)  -51.7% ( -56% -  -46%)
              AndHighMed       42.86      (2.6%)       20.75      (3.4%)  -51.6% ( -56% -  -46%)
            OrNotHighMed       39.23      (2.0%)       19.93      (3.3%)  -49.2% ( -53% -  -44%)
                  Fuzzy2       38.49      (2.3%)       19.76      (3.3%)  -48.6% ( -53% -  -44%)
                 MedTerm       31.48      (2.6%)       17.82      (3.5%)  -43.4% ( -48% -  -38%)
             AndHighHigh       27.49      (1.9%)       16.39      (3.3%)  -40.4% ( -44% -  -35%)
                 Prefix3       25.17      (2.6%)       15.71      (3.3%)  -37.6% ( -42% -  -32%)
            OrHighNotMed       22.44      (2.0%)       14.56      (3.0%)  -35.1% ( -39% -  -30%)
            OrHighNotLow       18.01      (1.7%)       12.66      (2.8%)  -29.7% ( -33% -  -25%)
               OrHighMed       17.37      (2.1%)       12.33      (2.8%)  -29.0% ( -33% -  -24%)
              OrHighHigh       17.02      (2.4%)       12.15      (2.8%)  -28.6% ( -33% -  -23%)
         MedSloppyPhrase       15.76      (4.5%)       11.26      (3.8%)  -28.6% ( -35% -  -21%)
                HighTerm       15.80      (2.4%)       11.62      (2.9%)  -26.5% ( -30% -  -21%)
               LowPhrase       13.51      (4.5%)       10.19      (3.0%)  -24.6% ( -30% -  -17%)
                Wildcard       12.90      (1.5%)       10.04      (2.3%)  -22.1% ( -25% -  -18%)
             LowSpanNear       10.56      (2.1%)        8.40      (2.5%)  -20.4% ( -24% -  -16%)
           OrHighNotHigh        9.39      (1.5%)        7.84      (2.1%)  -16.5% ( -19% -  -13%)
                  IntNRQ        5.15      (1.9%)        4.81      (1.4%)   -6.7% (  -9% -   -3%)
                 Respell       84.97      (2.8%)       88.45      (3.3%)    4.1% (  -1% -   10%)
{noformat}
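
For reading the tables: the leading Pct diff figure in each row is consistent with the relative change in mean QPS from base to comp. A minimal sketch of that arithmetic follows; the class and method names are made up for illustration, and it does not attempt to reproduce the range shown in parentheses, whose exact formula is not stated here.

```java
import java.util.Locale;

public class PctDiffSketch {
    // Relative change of the comp QPS against the base QPS, in percent.
    // Assumption: this is how the leading "Pct diff" figure is derived;
    // it matches the rows checked below.
    static double pctDiff(double qpsBase, double qpsComp) {
        return 100.0 * (qpsComp - qpsBase) / qpsBase;
    }

    public static void main(String[] args) {
        // LowSloppyPhrase row of the ALL_BUT_DIM table
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(195.79, 160.76))); // -17.9%
        // LowSloppyPhrase row of the NO_PARENTS table
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(98.63, 28.73)));   // -70.9%
    }
}
```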

I still see some queries coming back w/ all 0 facets ... not sure why.

> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy is traversed today using the 
> {{ParallelTaxonomyArrays}}: two taxonomy-size {{int}} arrays 
> which hold, for each ordinal, its (array #1) youngest child and (array #2) 
> older sibling.
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (the array is accessed in 
> increasing only entries, but they may be distant from one another)
> * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size)
> This issue is about making the traversal easier, the code more readable, 
> and opening it to future improvements (e.g. memory footprint and NRT cost) - 
> without changing any of the internals. 
> A later issue(s?) could be opened to address the gaps once this one is done.
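
For readers unfamiliar with the layout the description refers to, here is a minimal sketch of the youngest-child / older-sibling encoding in plain Java, with no Lucene dependency. It is not the actual {{ParallelTaxonomyArrays}} code: the class name, the INVALID sentinel of -1, the parents-array input and the sample taxonomy are all assumptions made for illustration. The two int arrays are what gives the 8 bytes per ordinal mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaxonomyArraysSketch {
    // Hypothetical sentinel meaning "no such ordinal".
    static final int INVALID = -1;

    // children[ord] = youngest (highest-ordinal) child of ord, or INVALID.
    final int[] children;
    // siblings[ord] = next older (lower-ordinal) sibling of ord, or INVALID.
    final int[] siblings;

    // Builds both arrays from a parents array, where parents[ord] is the
    // parent ordinal of ord and ordinal 0 is the root. Assumes a parent
    // always has a smaller ordinal than its children.
    TaxonomyArraysSketch(int[] parents) {
        children = new int[parents.length];
        siblings = new int[parents.length];
        Arrays.fill(children, INVALID);
        Arrays.fill(siblings, INVALID);
        for (int ord = 1; ord < parents.length; ord++) {
            int parent = parents[ord];
            // The previous youngest child becomes this ordinal's older sibling.
            siblings[ord] = children[parent];
            children[parent] = ord;
        }
    }

    // Walks the sibling chain, returning children from youngest to oldest.
    List<Integer> childrenOf(int ord) {
        List<Integer> result = new ArrayList<>();
        for (int c = children[ord]; c != INVALID; c = siblings[c]) {
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample taxonomy (made up):
        // 0 = root, 1 = Author, 2 = Author/Bob, 3 = Date, 4 = Author/Lisa
        int[] parents = {INVALID, 0, 1, 0, 1};
        TaxonomyArraysSketch taxo = new TaxonomyArraysSketch(parents);
        System.out.println(taxo.childrenOf(0)); // [3, 1]
        System.out.println(taxo.childrenOf(1)); // [4, 2]
    }
}
```

Even in this small form the caller has to know the sentinel and the chain-walking idiom to enumerate children, which is the kind of exposed internal detail the issue wants to hide behind a traversal API.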


