[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814924#comment-13814924 ]
Michael McCandless commented on LUCENE-5316:
--------------------------------------------
Here's the ALL_BUT_DIM performance; it looks better! However, sometimes 1-3 of
the queries that ran came back with all-zero facet counts, and I'm not sure why.
Maybe a thread-safety issue in the quick & dirty patch?
{noformat}
Task              QPS base  StdDev QPS comp  StdDev   Pct diff
AndHighLow           70.50  (3.1%)    63.66  (5.2%)   -9.7% ( -17% - -1%)
MedPhrase            43.43  (2.2%)    40.79  (3.4%)   -6.1% ( -11% - 0%)
LowTerm              38.63  (2.0%)    36.46  (3.4%)   -5.6% ( -10% - 0%)
Fuzzy1               28.41  (1.5%)    27.15  (2.6%)   -4.4% ( -8% - 0%)
OrNotHighLow         31.95  (3.7%)    30.64  (3.9%)   -4.1% ( -11% - 3%)
LowSloppyPhrase      21.67  (1.5%)    20.96  (2.2%)   -3.3% ( -6% - 0%)
Fuzzy2               21.39  (1.7%)    20.71  (1.9%)   -3.2% ( -6% - 0%)
OrNotHighMed         17.30  (2.8%)    16.90  (3.3%)   -2.3% ( -8% - 3%)
Prefix3              10.68  (1.5%)    10.46  (2.2%)   -2.1% ( -5% - 1%)
AndHighMed           13.85  (1.2%)    13.57  (1.4%)   -2.0% ( -4% - 0%)
MedSpanNear          15.19  (2.7%)    14.89  (2.9%)   -2.0% ( -7% - 3%)
AndHighHigh          11.70  (1.0%)    11.51  (1.8%)   -1.6% ( -4% - 1%)
HighSloppyPhrase      2.56  (8.0%)     2.52  (7.9%)   -1.5% ( -16% - 15%)
OrHighNotMed          5.66  (1.4%)     5.58  (1.5%)   -1.4% ( -4% - 1%)
LowPhrase             7.82  (5.7%)     7.72  (5.8%)   -1.2% ( -12% - 10%)
MedTerm              10.26  (2.0%)    10.14  (1.4%)   -1.1% ( -4% - 2%)
OrNotHighHigh         7.32  (1.7%)     7.24  (1.6%)   -1.0% ( -4% - 2%)
MedSloppyPhrase       2.47  (6.1%)     2.45  (5.9%)   -1.0% ( -12% - 11%)
HighTerm              6.85  (1.3%)     6.78  (1.8%)   -1.0% ( -4% - 2%)
OrHighMed             4.46  (1.6%)     4.42  (2.0%)   -0.9% ( -4% - 2%)
LowSpanNear           5.98  (4.0%)     5.92  (3.4%)   -0.9% ( -8% - 6%)
HighSpanNear          2.54  (2.6%)     2.53  (2.9%)   -0.6% ( -5% - 5%)
HighPhrase            2.18  (5.9%)     2.16  (6.1%)   -0.5% ( -11% - 12%)
Respell              41.46  (3.6%)    41.32  (3.1%)   -0.3% ( -6% - 6%)
OrHighLow             2.19  (1.6%)     2.19  (1.6%)   -0.3% ( -3% - 2%)
Wildcard              3.65  (1.8%)     3.64  (1.5%)   -0.2% ( -3% - 3%)
OrHighNotHigh         3.78  (1.7%)     3.77  (1.5%)   -0.2% ( -3% - 3%)
OrHighHigh            1.65  (2.0%)     1.64  (1.5%)   -0.2% ( -3% - 3%)
OrHighNotLow          3.29  (1.5%)     3.28  (1.6%)   -0.2% ( -3% - 3%)
IntNRQ                1.18  (1.8%)     1.18  (1.4%)   -0.1% ( -3% - 3%)
{noformat}
> Taxonomy tree traversing improvement
> ------------------------------------
>
> Key: LUCENE-5316
> URL: https://issues.apache.org/jira/browse/LUCENE-5316
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Gilad Barkai
> Priority: Minor
> Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy is traversed today using the
> {{ParallelTaxonomyArrays}}: two taxonomy-size {{int}} arrays that hold,
> for each ordinal, its youngest child (array #1) and its older sibling
> (array #2).
> This is a compact way of holding the tree information in memory, but it's not
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Using these arrays for tree traversal is not straightforward
> * Reference locality is lost while traversing (the array is accessed at
> increasing entries only, but they may be distant from one another)
> * In NRT, a reopen always costs O(Taxonomy-size), not just in the worst case
> This issue is about making the traversal easier and the code more readable,
> and opening it up for future improvements (i.e. memory footprint and NRT cost),
> without changing any of the internals.
> Later issue(s) could be opened to address the gaps once this one is done.
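
For readers unfamiliar with the layout the issue describes, here is a minimal sketch of listing a node's children with a "youngest child" array and an "older sibling" array. It is not Lucene code and not the patch attached to this issue: the class, method, and array names are hypothetical, and -1 is assumed as the "no such ordinal" marker.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of Lucene or of the LUCENE-5316 patch. */
final class TaxonomyTraversalSketch {

    /** Assumed sentinel meaning "no child" or "no older sibling". */
    static final int INVALID_ORDINAL = -1;

    /**
     * Collects the children of {@code parent}, starting at its youngest child
     * and hopping through the older-sibling links until the sentinel is hit.
     */
    static List<Integer> childrenOf(int parent, int[] youngestChild, int[] olderSibling) {
        List<Integer> result = new ArrayList<>();
        for (int child = youngestChild[parent];
             child != INVALID_ORDINAL;
             child = olderSibling[child]) {
            result.add(child);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy taxonomy, ordinals assigned in insertion order:
        // 0 = root, 1 = Author, 2 = Author/Bob, 3 = Date, 4 = Author/Lisa
        int[] youngestChild = {3, 4, -1, -1, -1};
        int[] olderSibling  = {-1, -1, -1, 1, 2};

        System.out.println(childrenOf(0, youngestChild, olderSibling)); // [3, 1]
        System.out.println(childrenOf(1, youngestChild, olderSibling)); // [4, 2]
    }
}
```

Even in this toy case the caller has to know about both arrays and the sentinel, and each hop may land on a distant array entry, which is the kind of exposure and locality concern the description lists.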
--
This message was sent by Atlassian JIRA
(v6.1#6144)