[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Michael McCandless (JIRA) Wed, 06 Nov 2013 10:32:51 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815125#comment-13815125
 ]


Michael McCandless commented on LUCENE-5316:
--------------------------------------------

Hmm but this is the NO_PARENTS perf:

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
              AndHighLow       41.14      (2.5%)       14.32      (4.2%)  -65.2% ( -70% -  -59%)
               MedPhrase       30.42      (2.7%)       12.74      (4.5%)  -58.1% ( -63% -  -52%)
                 LowTerm       28.27      (1.6%)       12.33      (4.6%)  -56.4% ( -61% -  -50%)
            OrNotHighLow       24.15      (2.6%)       11.47      (4.7%)  -52.5% ( -58% -  -46%)
                  Fuzzy1       21.93      (1.6%)       10.94      (4.6%)  -50.1% ( -55% -  -44%)
                  Fuzzy2       18.12      (1.7%)        9.89      (4.5%)  -45.4% ( -50% -  -39%)
         LowSloppyPhrase       17.97      (1.8%)        9.84      (4.7%)  -45.2% ( -50% -  -39%)
            OrNotHighMed       14.90      (2.7%)        8.89      (4.9%)  -40.4% ( -46% -  -33%)
             MedSpanNear       13.41      (2.4%)        8.25      (4.3%)  -38.5% ( -44% -  -32%)
              AndHighMed       12.41      (1.5%)        7.90      (4.4%)  -36.4% ( -41% -  -30%)
             AndHighHigh       10.72      (1.1%)        7.22      (4.2%)  -32.7% ( -37% -  -27%)
                 Prefix3       10.22      (1.8%)        6.93      (4.1%)  -32.2% ( -37% -  -26%)
                 MedTerm        9.86      (2.2%)        6.76      (4.1%)  -31.4% ( -36% -  -25%)
               LowPhrase        7.28      (6.0%)        5.43      (3.9%)  -25.4% ( -33% -  -16%)
           OrNotHighHigh        7.38      (2.1%)        5.53      (3.8%)  -25.2% ( -30% -  -19%)
                HighTerm        6.85      (2.2%)        5.20      (3.6%)  -24.1% ( -29% -  -18%)
             LowSpanNear        5.70      (3.2%)        4.48      (4.1%)  -21.4% ( -27% -  -14%)
            OrHighNotMed        5.69      (2.2%)        4.52      (3.2%)  -20.6% ( -25% -  -15%)
               OrHighMed        4.57      (2.6%)        3.78      (2.6%)  -17.2% ( -21% -  -12%)
           OrHighNotHigh        3.89      (2.5%)        3.29      (2.7%)  -15.3% ( -20% -  -10%)
                Wildcard        3.76      (2.7%)        3.20      (2.4%)  -14.9% ( -19% -  -10%)
            OrHighNotLow        3.36      (2.1%)        2.94      (2.1%)  -12.4% ( -16% -   -8%)
        HighSloppyPhrase        2.51      (6.6%)        2.23      (5.9%)  -11.1% ( -22% -    1%)
            HighSpanNear        2.58      (3.5%)        2.29      (2.8%)  -11.0% ( -16% -   -4%)
         MedSloppyPhrase        2.43      (5.2%)        2.19      (4.8%)   -9.8% ( -18% -    0%)
              HighPhrase        2.20      (6.8%)        2.00      (4.8%)   -9.2% ( -19% -    2%)
               OrHighLow        2.29      (2.3%)        2.09      (1.9%)   -8.8% ( -12% -   -4%)
              OrHighHigh        1.72      (2.6%)        1.61      (1.6%)   -6.2% ( -10% -   -2%)
                  IntNRQ        1.25      (2.7%)        1.19      (1.1%)   -4.5% (  -8% -    0%)
                 Respell       40.60      (2.8%)       39.77      (2.8%)   -2.0% (  -7% -    3%)
{noformat}
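
For readers checking the numbers: the leading figure in the Pct diff column is consistent with the plain relative change in QPS from base to comp. Below is a minimal Java sketch under that assumption. The class and method names are illustrative and not part of the benchmark tooling, and the range shown in parentheses in the table is not reproduced here.

```java
import java.util.Locale;

class PctDiffSketch {
    /** Relative change from base to comp, in percent (assumed formula). */
    static double pctDiff(double baseQps, double compQps) {
        return 100.0 * (compQps - baseQps) / baseQps;
    }

    public static void main(String[] args) {
        // Values copied from the AndHighLow and Respell rows of the table.
        System.out.println(String.format(Locale.ROOT, "AndHighLow %.1f%%", pctDiff(41.14, 14.32))); // AndHighLow -65.2%
        System.out.println(String.format(Locale.ROOT, "Respell %.1f%%", pctDiff(40.60, 39.77)));    // Respell -2.0%
    }
}
```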


> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> Traversal of the taxonomy tree is done today using 
> {{ParallelTaxonomyArrays}}: in particular, two taxonomy-size {{int}} arrays 
> that hold, for each ordinal, its youngest child (array #1) and its older 
> sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not 
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Utilizing these arrays for tree traversing is not straightforward
> * Lose reference locality while traversing (the array is accessed in 
> increasing only entries, but they may be distant from one another)
> * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size)
> This issue is about making the traversing easier, the code more readable, 
> and opening it up for future improvements (e.g. memory footprint and NRT 
> cost) without changing any of the internals. 
> A later issue(s?) could be opened to address the gaps once this one is done.
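
To illustrate the layout the description refers to, here is a minimal Java sketch of walking one node's children through a "youngest child" array and an "older sibling" array. It uses plain `int[]` arrays rather than any Lucene class; the class name, the `childrenOf` helper, the `INVALID` sentinel of -1, and the tiny example tree are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TaxonomyArraysSketch {
    // Assumed sentinel meaning "no such ordinal".
    static final int INVALID = -1;

    /**
     * Collects the children of {@code parent}: start at its youngest child,
     * then follow older-sibling links until none is left.
     */
    static List<Integer> childrenOf(int parent, int[] youngestChild, int[] olderSibling) {
        List<Integer> result = new ArrayList<>();
        for (int c = youngestChild[parent]; c != INVALID; c = olderSibling[c]) {
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0 = root, with children 1, 2 and 4; 3 is a child of 1.
        int[] youngestChild = {4, 3, INVALID, INVALID, INVALID};
        int[] olderSibling  = {INVALID, INVALID, 1, INVALID, 2};

        System.out.println(childrenOf(0, youngestChild, olderSibling)); // [4, 2, 1]
        System.out.println(childrenOf(1, youngestChild, olderSibling)); // [3]
    }
}
```

Two `int` entries per ordinal account for the 8 bytes per ordinal mentioned above, and each step of the loop jumps to an arbitrary index, which is where the loss of reference locality comes from.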



