[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815125#comment-13815125 ]
Michael McCandless commented on LUCENE-5316:
--------------------------------------------
Hmm but this is the NO_PARENTS perf:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
AndHighLow           41.14  (2.5%)     14.32  (4.2%)  -65.2% ( -70% - -59%)
MedPhrase            30.42  (2.7%)     12.74  (4.5%)  -58.1% ( -63% - -52%)
LowTerm              28.27  (1.6%)     12.33  (4.6%)  -56.4% ( -61% - -50%)
OrNotHighLow         24.15  (2.6%)     11.47  (4.7%)  -52.5% ( -58% - -46%)
Fuzzy1               21.93  (1.6%)     10.94  (4.6%)  -50.1% ( -55% - -44%)
Fuzzy2               18.12  (1.7%)      9.89  (4.5%)  -45.4% ( -50% - -39%)
LowSloppyPhrase      17.97  (1.8%)      9.84  (4.7%)  -45.2% ( -50% - -39%)
OrNotHighMed         14.90  (2.7%)      8.89  (4.9%)  -40.4% ( -46% - -33%)
MedSpanNear          13.41  (2.4%)      8.25  (4.3%)  -38.5% ( -44% - -32%)
AndHighMed           12.41  (1.5%)      7.90  (4.4%)  -36.4% ( -41% - -30%)
AndHighHigh          10.72  (1.1%)      7.22  (4.2%)  -32.7% ( -37% - -27%)
Prefix3              10.22  (1.8%)      6.93  (4.1%)  -32.2% ( -37% - -26%)
MedTerm               9.86  (2.2%)      6.76  (4.1%)  -31.4% ( -36% - -25%)
LowPhrase             7.28  (6.0%)      5.43  (3.9%)  -25.4% ( -33% - -16%)
OrNotHighHigh         7.38  (2.1%)      5.53  (3.8%)  -25.2% ( -30% - -19%)
HighTerm              6.85  (2.2%)      5.20  (3.6%)  -24.1% ( -29% - -18%)
LowSpanNear           5.70  (3.2%)      4.48  (4.1%)  -21.4% ( -27% - -14%)
OrHighNotMed          5.69  (2.2%)      4.52  (3.2%)  -20.6% ( -25% - -15%)
OrHighMed             4.57  (2.6%)      3.78  (2.6%)  -17.2% ( -21% - -12%)
OrHighNotHigh         3.89  (2.5%)      3.29  (2.7%)  -15.3% ( -20% - -10%)
Wildcard              3.76  (2.7%)      3.20  (2.4%)  -14.9% ( -19% - -10%)
OrHighNotLow          3.36  (2.1%)      2.94  (2.1%)  -12.4% ( -16% -  -8%)
HighSloppyPhrase      2.51  (6.6%)      2.23  (5.9%)  -11.1% ( -22% -   1%)
HighSpanNear          2.58  (3.5%)      2.29  (2.8%)  -11.0% ( -16% -  -4%)
MedSloppyPhrase       2.43  (5.2%)      2.19  (4.8%)   -9.8% ( -18% -   0%)
HighPhrase            2.20  (6.8%)      2.00  (4.8%)   -9.2% ( -19% -   2%)
OrHighLow             2.29  (2.3%)      2.09  (1.9%)   -8.8% ( -12% -  -4%)
OrHighHigh            1.72  (2.6%)      1.61  (1.6%)   -6.2% ( -10% -  -2%)
IntNRQ                1.25  (2.7%)      1.19  (1.1%)   -4.5% (  -8% -   0%)
Respell              40.60  (2.8%)     39.77  (2.8%)   -2.0% (  -7% -   3%)
{noformat}
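For readers checking the numbers: the first figure in the Pct diff column is consistent with the relative change of the mean QPS from base to comp. The sketch below is only an illustration of that arithmetic. The class and method names are hypothetical, and it works from the rounded QPS values shown in the table, so some rows can differ from the reported figure in the last digit. It does not attempt to reproduce the range in parentheses.

```java
import java.util.Locale;

public class PctDiffSketch {

    // Relative change of the comparison QPS against the baseline QPS, in percent.
    static double pctDiff(double qpsBase, double qpsComp) {
        return 100.0 * (qpsComp - qpsBase) / qpsBase;
    }

    public static void main(String[] args) {
        // Inputs are the rounded means copied from the AndHighLow and Respell rows.
        System.out.printf(Locale.ROOT, "AndHighLow: %.1f%%%n", pctDiff(41.14, 14.32));
        System.out.printf(Locale.ROOT, "Respell:    %.1f%%%n", pctDiff(40.60, 39.77));
    }
}
```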
> Taxonomy tree traversing improvement
> ------------------------------------
>
> Key: LUCENE-5316
> URL: https://issues.apache.org/jira/browse/LUCENE-5316
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Gilad Barkai
> Priority: Minor
> Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> Taxonomy traversal is done today using the
> {{ParallelTaxonomyArrays}}: in particular, two taxonomy-size {{int}} arrays
> that hold, for each ordinal, its (array #1) youngest child and (array #2)
> older sibling.
> This is a compact way of holding the tree information in memory, but it's not
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (the array is accessed only at
> increasing entries, but they may be distant from one another)
> * In NRT, a reopen always (not just in the worst case) costs O(Taxonomy-size)
> This issue is about making traversal easier, making the code more readable,
> and opening it up for future improvements (e.g. memory footprint and NRT cost),
> without changing any of the internals.
> One or more later issues could be opened to address the gaps once this one is done.
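To make the two-array layout from the description concrete, here is a minimal sketch of enumerating a node's children by starting at its youngest child and following older-sibling links. It uses plain {{int}} arrays and a hypothetical class ({{TaxonomyArraysSketch}}), not the actual {{ParallelTaxonomyArrays}} API. It assumes ordinal 0 is the root, every parent has a smaller ordinal than its children, and -1 marks "no such node".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TaxonomyArraysSketch {
    // Assumed sentinel for "no child" / "no sibling" / "no parent".
    static final int INVALID = -1;

    final int[] youngestChild; // array #1: highest-ordinal child of each ordinal
    final int[] olderSibling;  // array #2: next-lower-ordinal sibling of each ordinal

    // Builds both arrays from a parent array in one pass over increasing ordinals.
    TaxonomyArraysSketch(int[] parents) {
        int n = parents.length;
        youngestChild = new int[n];
        olderSibling = new int[n];
        Arrays.fill(youngestChild, INVALID);
        Arrays.fill(olderSibling, INVALID);
        for (int ord = 1; ord < n; ord++) { // ordinal 0 is the root and has no parent
            int parent = parents[ord];
            // The previous youngest child of the parent becomes this node's older sibling.
            olderSibling[ord] = youngestChild[parent];
            youngestChild[parent] = ord;
        }
    }

    // Children of the given ordinal, from youngest to oldest.
    List<Integer> childrenOf(int ordinal) {
        List<Integer> result = new ArrayList<>();
        for (int c = youngestChild[ordinal]; c != INVALID; c = olderSibling[c]) {
            result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0=root, 1 and 2 under root, 3 under 1, 4 under root.
        int[] parents = {INVALID, 0, 0, 1, 0};
        TaxonomyArraysSketch t = new TaxonomyArraysSketch(parents);
        System.out.println(t.childrenOf(0)); // [4, 2, 1]
        System.out.println(t.childrenOf(1)); // [3]
        System.out.println(t.childrenOf(2)); // []
    }
}
```

The loop in {{childrenOf}} is the kind of caller-side bookkeeping the issue wants to hide behind a simpler traversal API. Two {{int}} entries per ordinal also account for the 8 bytes per ordinal noted in the description.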