[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818469#comment-13818469 ]
Michael McCandless commented on LUCENE-5316:
--------------------------------------------
I re-ran ALL_BUT_DIM and NO_PARENTS on the last patch:
ALL_BUT_DIM:
{noformat}
Task              QPS base   StdDev   QPS comp   StdDev   Pct diff
LowSloppyPhrase     195.79   (6.4%)     160.76   (6.5%)   -17.9% ( -28% - -5%)
MedSpanNear         189.11   (6.1%)     155.88   (6.5%)   -17.6% ( -28% - -5%)
AndHighLow          171.05   (5.4%)     142.46   (5.9%)   -16.7% ( -26% - -5%)
HighPhrase          165.56   (5.6%)     140.32   (6.0%)   -15.2% ( -25% - -3%)
HighSloppyPhrase    135.86   (4.7%)     117.90   (5.3%)   -13.2% ( -22% - -3%)
HighSpanNear         98.69   (4.1%)      88.28   (4.5%)   -10.5% ( -18% - -2%)
MedPhrase            89.68   (4.3%)      81.23   (3.7%)    -9.4% ( -16% - -1%)
OrNotHighLow         93.45   (5.5%)      85.07   (4.9%)    -9.0% ( -18% - 1%)
LowTerm              87.06   (3.4%)      79.50   (3.8%)    -8.7% ( -15% - -1%)
Fuzzy1               63.87   (2.5%)      59.39   (2.9%)    -7.0% ( -12% - -1%)
AndHighMed           53.60   (1.9%)      50.49   (2.6%)    -5.8% ( -10% - -1%)
OrHighLow            54.32   (2.2%)      51.18   (2.4%)    -5.8% ( -10% - -1%)
OrNotHighHigh        62.71   (5.5%)      59.11   (5.0%)    -5.7% ( -15% - 5%)
OrNotHighMed         47.72   (3.4%)      45.35   (3.1%)    -5.0% ( -11% - 1%)
Fuzzy2               48.40   (2.2%)      46.07   (2.4%)    -4.8% ( -9% - 0%)
AndHighHigh          31.48   (1.6%)      30.33   (1.5%)    -3.7% ( -6% - 0%)
MedTerm              35.33   (2.0%)      34.06   (1.9%)    -3.6% ( -7% - 0%)
MedSloppyPhrase      17.17   (4.4%)      16.67   (4.3%)    -2.9% ( -11% - 6%)
Prefix3              27.73   (1.6%)      26.93   (1.2%)    -2.9% ( -5% - 0%)
OrHighNotMed         24.31   (2.4%)      23.79   (1.1%)    -2.1% ( -5% - 1%)
LowPhrase            14.56   (4.2%)      14.28   (4.0%)    -1.9% ( -9% - 6%)
LowSpanNear          11.25   (2.4%)      11.04   (1.7%)    -1.9% ( -5% - 2%)
OrHighHigh           17.63   (1.6%)      17.38   (1.1%)    -1.4% ( -4% - 1%)
OrHighNotLow         18.97   (1.8%)      18.69   (0.9%)    -1.4% ( -4% - 1%)
Wildcard             13.21   (1.4%)      13.03   (0.9%)    -1.4% ( -3% - 0%)
HighTerm             16.34   (1.8%)      16.14   (1.9%)    -1.3% ( -4% - 2%)
OrHighMed            18.11   (1.6%)      17.93   (1.4%)    -1.0% ( -3% - 2%)
Respell              89.31   (2.8%)      88.78   (2.2%)    -0.6% ( -5% - 4%)
OrHighNotHigh         9.09   (2.0%)       9.08   (1.4%)    -0.1% ( -3% - 3%)
IntNRQ                4.87   (1.2%)       4.90   (1.2%)     0.7% ( -1% - 3%)
{noformat}
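For reference, the "Pct diff" column appears to be the relative change in mean QPS from base to comp: the rows above can be reproduced from the displayed QPS values, up to last-digit rounding (e.g. IntNRQ, 4.87 to 4.90, works out to about 0.6% rather than the listed 0.7%, presumably because the displayed QPS are rounded). A minimal Java sketch, assuming that formula; the class and method names are hypothetical and not taken from the benchmark tool:

```java
import java.util.Locale;

/**
 * Hypothetical helper: recomputes "Pct diff" as the relative change of mean
 * QPS (comp vs. base). The formula is an assumption that matches the table rows;
 * it is not copied from the benchmark tool's source.
 */
final class PctDiff {
  static double pctDiff(double qpsBase, double qpsComp) {
    return 100.0 * (qpsComp - qpsBase) / qpsBase;
  }

  public static void main(String[] args) {
    // LowSloppyPhrase row: 195.79 -> 160.76 QPS
    System.out.printf(Locale.ROOT, "LowSloppyPhrase: %.1f%%%n", pctDiff(195.79, 160.76)); // -17.9%
    // AndHighLow row: 171.05 -> 142.46 QPS
    System.out.printf(Locale.ROOT, "AndHighLow: %.1f%%%n", pctDiff(171.05, 142.46)); // -16.7%
  }
}
```

The parenthesized range after each percentage is not reproduced here, since its exact derivation from the StdDev columns is not stated in this comment.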
NO_PARENTS:
{noformat}
Task              QPS base   StdDev   QPS comp   StdDev   Pct diff
LowSloppyPhrase      98.63   (4.7%)      28.73   (2.9%)   -70.9% ( -74% - -66%)
MedSpanNear          97.31   (4.7%)      28.54   (2.9%)   -70.7% ( -74% - -66%)
AndHighLow           91.63   (3.9%)      28.04   (2.9%)   -69.4% ( -73% - -65%)
HighPhrase           90.81   (3.6%)      27.94   (2.9%)   -69.2% ( -73% - -65%)
HighSloppyPhrase     80.24   (3.2%)      26.90   (3.1%)   -66.5% ( -70% - -62%)
HighSpanNear         65.93   (2.7%)      24.97   (3.3%)   -62.1% ( -66% - -57%)
OrNotHighLow         64.00   (3.3%)      24.74   (3.2%)   -61.3% ( -65% - -56%)
MedPhrase            62.06   (4.1%)      24.52   (3.3%)   -60.5% ( -65% - -55%)
LowTerm              61.33   (2.6%)      24.40   (3.3%)   -60.2% ( -64% - -55%)
OrNotHighHigh        48.27   (2.8%)      21.97   (3.4%)   -54.5% ( -58% - -49%)
Fuzzy1               47.61   (2.2%)      21.90   (3.5%)   -54.0% ( -58% - -49%)
OrHighLow            43.63   (2.6%)      21.07   (3.4%)   -51.7% ( -56% - -46%)
AndHighMed           42.86   (2.6%)      20.75   (3.4%)   -51.6% ( -56% - -46%)
OrNotHighMed         39.23   (2.0%)      19.93   (3.3%)   -49.2% ( -53% - -44%)
Fuzzy2               38.49   (2.3%)      19.76   (3.3%)   -48.6% ( -53% - -44%)
MedTerm              31.48   (2.6%)      17.82   (3.5%)   -43.4% ( -48% - -38%)
AndHighHigh          27.49   (1.9%)      16.39   (3.3%)   -40.4% ( -44% - -35%)
Prefix3              25.17   (2.6%)      15.71   (3.3%)   -37.6% ( -42% - -32%)
OrHighNotMed         22.44   (2.0%)      14.56   (3.0%)   -35.1% ( -39% - -30%)
OrHighNotLow         18.01   (1.7%)      12.66   (2.8%)   -29.7% ( -33% - -25%)
OrHighMed            17.37   (2.1%)      12.33   (2.8%)   -29.0% ( -33% - -24%)
OrHighHigh           17.02   (2.4%)      12.15   (2.8%)   -28.6% ( -33% - -23%)
MedSloppyPhrase      15.76   (4.5%)      11.26   (3.8%)   -28.6% ( -35% - -21%)
HighTerm             15.80   (2.4%)      11.62   (2.9%)   -26.5% ( -30% - -21%)
LowPhrase            13.51   (4.5%)      10.19   (3.0%)   -24.6% ( -30% - -17%)
Wildcard             12.90   (1.5%)      10.04   (2.3%)   -22.1% ( -25% - -18%)
LowSpanNear          10.56   (2.1%)       8.40   (2.5%)   -20.4% ( -24% - -16%)
OrHighNotHigh         9.39   (1.5%)       7.84   (2.1%)   -16.5% ( -19% - -13%)
IntNRQ                5.15   (1.9%)       4.81   (1.4%)    -6.7% ( -9% - -3%)
Respell              84.97   (2.8%)      88.45   (3.3%)     4.1% ( -1% - 10%)
{noformat}
I still see some queries coming back with all-zero facet counts; I'm not sure why.
> Taxonomy tree traversing improvement
> ------------------------------------
>
> Key: LUCENE-5316
> URL: https://issues.apache.org/jira/browse/LUCENE-5316
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Gilad Barkai
> Priority: Minor
> Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> Taxonomy traversal is currently done using the
> {{ParallelTaxonomyArrays}}: in particular, two taxonomy-size {{int}} arrays
> that hold, for each ordinal, its youngest child (array #1) and its older
> sibling (array #2).
> This is a compact way of holding the tree information in memory, but it's not
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes the internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Loses reference locality while traversing (array entries are only accessed
> in increasing order, but they may be far from one another)
> * In NRT, a reopen always costs O(taxonomy size), not only in the worst case
> This issue is about making traversal easier and the code more readable, and
> opening it up for future improvements (e.g. memory footprint and NRT cost),
> without changing any of the internals.
> Later issue(s) could be opened to address the gaps once this one is done.
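To make the "youngest child / older sibling" layout concrete, here is a small, self-contained Java sketch. It is hypothetical and does not use Lucene's actual {{ParallelTaxonomyArrays}} API; it assumes ordinal 0 is the root, every other ordinal's parent has a smaller ordinal (categories are added parent-first), and -1 marks "no child" or "no sibling". Filling both arrays in one pass over increasing ordinals is what makes the last child seen for a parent its youngest child. The iterator wrapper shows one possible way to hide the arrays behind a traversal API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

/**
 * Hypothetical sketch of the "youngest child / older sibling" layout.
 * Not Lucene's ParallelTaxonomyArrays API; names and the sentinel are assumptions.
 */
final class TaxonomyTreeSketch {
  static final int INVALID = -1; // assumed "no such ordinal" marker

  final int[] children; // children[ord]: youngest (largest) child ordinal, or INVALID
  final int[] siblings; // siblings[ord]: next older (smaller) sibling ordinal, or INVALID

  /** Assumes parents[0] == INVALID (root) and parents[ord] < ord for ord > 0. */
  TaxonomyTreeSketch(int[] parents) {
    int n = parents.length;
    children = new int[n];
    siblings = new int[n];
    Arrays.fill(children, INVALID);
    Arrays.fill(siblings, INVALID);
    // Visiting ordinals in increasing order: each new child becomes its parent's
    // youngest child, and the previous youngest becomes its older sibling.
    for (int ord = 1; ord < n; ord++) {
      int parent = parents[ord];
      siblings[ord] = children[parent];
      children[parent] = ord;
    }
  }

  /** Raw traversal over both arrays, youngest child first. */
  List<Integer> childrenOf(int ord) {
    List<Integer> out = new ArrayList<>();
    for (int c = children[ord]; c != INVALID; c = siblings[c]) {
      out.add(c);
    }
    return out;
  }

  /** Iterator-style wrapper: callers never touch the arrays directly. */
  PrimitiveIterator.OfInt childIterator(int ord) {
    return new PrimitiveIterator.OfInt() {
      private int next = children[ord];

      @Override
      public boolean hasNext() {
        return next != INVALID;
      }

      @Override
      public int nextInt() {
        if (next == INVALID) {
          throw new NoSuchElementException();
        }
        int current = next;
        next = siblings[current];
        return current;
      }
    };
  }

  /** Two int arrays: 2 * 4 = 8 bytes per ordinal, ignoring array headers. */
  static long arrayBytes(int numOrdinals) {
    return 2L * Integer.BYTES * numOrdinals;
  }

  public static void main(String[] args) {
    // 0 = root, 1 = "Author", 2 = "Author/Bob", 3 = "Publish Date", 4 = "Author/Lisa"
    TaxonomyTreeSketch tree = new TaxonomyTreeSketch(new int[] {INVALID, 0, 1, 0, 1});
    System.out.println(tree.childrenOf(0)); // [3, 1]
    System.out.println(tree.childrenOf(1)); // [4, 2]
    System.out.println(arrayBytes(1_000_000)); // 8000000
  }
}
```

The wrapper keeps the same two arrays underneath, in line with the issue's goal of changing how traversal is expressed without changing the internals.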
--
This message was sent by Atlassian JIRA
(v6.1#6144)