[jira] [Commented] (LUCENE-5316) Taxonomy tree traversing improvement

Michael McCandless (JIRA) Wed, 06 Nov 2013 06:34:52 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814924#comment-13814924 ]


Michael McCandless commented on LUCENE-5316:
--------------------------------------------

Here's the ALL_BUT_DIM performance; it looks better! However, I'm not sure why,
but sometimes 1-3 of the queries came back with all-zero facet counts.
Maybe a thread-safety issue in the quick & dirty patch?

{noformat}
                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
              AndHighLow       70.50      (3.1%)       63.66      (5.2%)   -9.7% ( -17% -   -1%)
               MedPhrase       43.43      (2.2%)       40.79      (3.4%)   -6.1% ( -11% -    0%)
                 LowTerm       38.63      (2.0%)       36.46      (3.4%)   -5.6% ( -10% -    0%)
                  Fuzzy1       28.41      (1.5%)       27.15      (2.6%)   -4.4% (  -8% -    0%)
            OrNotHighLow       31.95      (3.7%)       30.64      (3.9%)   -4.1% ( -11% -    3%)
         LowSloppyPhrase       21.67      (1.5%)       20.96      (2.2%)   -3.3% (  -6% -    0%)
                  Fuzzy2       21.39      (1.7%)       20.71      (1.9%)   -3.2% (  -6% -    0%)
            OrNotHighMed       17.30      (2.8%)       16.90      (3.3%)   -2.3% (  -8% -    3%)
                 Prefix3       10.68      (1.5%)       10.46      (2.2%)   -2.1% (  -5% -    1%)
              AndHighMed       13.85      (1.2%)       13.57      (1.4%)   -2.0% (  -4% -    0%)
             MedSpanNear       15.19      (2.7%)       14.89      (2.9%)   -2.0% (  -7% -    3%)
             AndHighHigh       11.70      (1.0%)       11.51      (1.8%)   -1.6% (  -4% -    1%)
        HighSloppyPhrase        2.56      (8.0%)        2.52      (7.9%)   -1.5% ( -16% -   15%)
            OrHighNotMed        5.66      (1.4%)        5.58      (1.5%)   -1.4% (  -4% -    1%)
               LowPhrase        7.82      (5.7%)        7.72      (5.8%)   -1.2% ( -12% -   10%)
                 MedTerm       10.26      (2.0%)       10.14      (1.4%)   -1.1% (  -4% -    2%)
           OrNotHighHigh        7.32      (1.7%)        7.24      (1.6%)   -1.0% (  -4% -    2%)
         MedSloppyPhrase        2.47      (6.1%)        2.45      (5.9%)   -1.0% ( -12% -   11%)
                HighTerm        6.85      (1.3%)        6.78      (1.8%)   -1.0% (  -4% -    2%)
               OrHighMed        4.46      (1.6%)        4.42      (2.0%)   -0.9% (  -4% -    2%)
             LowSpanNear        5.98      (4.0%)        5.92      (3.4%)   -0.9% (  -8% -    6%)
            HighSpanNear        2.54      (2.6%)        2.53      (2.9%)   -0.6% (  -5% -    5%)
              HighPhrase        2.18      (5.9%)        2.16      (6.1%)   -0.5% ( -11% -   12%)
                 Respell       41.46      (3.6%)       41.32      (3.1%)   -0.3% (  -6% -    6%)
               OrHighLow        2.19      (1.6%)        2.19      (1.6%)   -0.3% (  -3% -    2%)
                Wildcard        3.65      (1.8%)        3.64      (1.5%)   -0.2% (  -3% -    3%)
           OrHighNotHigh        3.78      (1.7%)        3.77      (1.5%)   -0.2% (  -3% -    3%)
              OrHighHigh        1.65      (2.0%)        1.64      (1.5%)   -0.2% (  -3% -    3%)
            OrHighNotLow        3.29      (1.5%)        3.28      (1.6%)   -0.2% (  -3% -    3%)
                  IntNRQ        1.18      (1.8%)        1.18      (1.4%)   -0.1% (  -3% -    3%)
{noformat}

> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy is traversed today using the
> {{ParallelTaxonomyArrays}}; in particular, two taxonomy-size {{int}} arrays
> which hold, for each ordinal, its youngest child (array #1) and its older
> sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (array entries are accessed in
> increasing order only, but they may be distant from one another)
> * In NRT, a reopen is always (not just in the worst case) O(taxonomy size)
> This issue is about making traversal easier and the code more readable,
> and opening it up for future improvements (e.g. memory footprint and NRT
> cost), without changing any of the internals.
> Later issue(s) could be opened to address the remaining gaps once this one is done.
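To make the youngest-child / older-sibling layout concrete, here is a minimal, self-contained Java sketch. It is not Lucene's {{ParallelTaxonomyArrays}} API: the class {{TaxonomyArraysSketch}}, its methods {{childrenOf}} and {{preOrder}}, the {{INVALID = -1}} sentinel, and the small example taxonomy are all hypothetical. It also assumes, as the construction requires, that every child ordinal is larger than its parent's, so the two arrays can be filled in a single pass over a parents array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a taxonomy tree held in two parallel int arrays:
 * children[ord] = youngest (most recently added) child of ord,
 * siblings[ord] = next older sibling of ord, INVALID (-1) = none.
 */
public class TaxonomyArraysSketch {
    static final int INVALID = -1;

    final int[] children;
    final int[] siblings;

    /** Builds both arrays from parents[], where parents[0] == INVALID is the root. */
    TaxonomyArraysSketch(int[] parents) {
        int n = parents.length;
        children = new int[n];
        siblings = new int[n];
        Arrays.fill(children, INVALID);
        Arrays.fill(siblings, INVALID);
        // Assumption: parent ordinals are always smaller than their children's,
        // so scanning upwards makes each ord the youngest child seen so far.
        for (int ord = 1; ord < n; ord++) {
            int parent = parents[ord];
            siblings[ord] = children[parent]; // previous youngest becomes older sibling
            children[parent] = ord;
        }
    }

    /** Direct children of ord, youngest first, by following the sibling chain. */
    List<Integer> childrenOf(int ord) {
        List<Integer> out = new ArrayList<>();
        for (int c = children[ord]; c != INVALID; c = siblings[c]) {
            out.add(c);
        }
        return out;
    }

    /** Pre-order depth-first walk of the subtree rooted at ord. */
    List<Integer> preOrder(int ord) {
        List<Integer> out = new ArrayList<>();
        walk(ord, out);
        return out;
    }

    private void walk(int ord, List<Integer> out) {
        out.add(ord);
        for (int c = children[ord]; c != INVALID; c = siblings[c]) {
            walk(c, out);
        }
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0=root, 1=Author, 2=Author/Bob,
        // 3=Date, 4=Author/Lisa, 5=Date/2010
        int[] parents = {-1, 0, 1, 0, 1, 3};
        TaxonomyArraysSketch t = new TaxonomyArraysSketch(parents);
        System.out.println(t.childrenOf(0)); // [3, 1]
        System.out.println(t.childrenOf(1)); // [4, 2]
        System.out.println(t.preOrder(0));   // [0, 3, 5, 1, 4, 2]
    }
}
```

Two {{int}} arrays account for the 8 bytes per ordinal mentioned above, and every caller that walks the tree has to repeat the {{children}}/{{siblings}} loop itself. The pre-order walk in {{main}} reads ordinals 0, 3, 5, 1, 4, 2, so in a large taxonomy consecutive reads can land far apart in the arrays.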



--
This message was sent by Atlassian JIRA
(v6.1#6144)

