[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830060#comment-13830060 ]
Michael McCandless commented on LUCENE-5316:
--------------------------------------------

OK, I ran the same perf tests with the last patch. The "sometimes all 0 facet counts" problem is fixed!

NO_PARENTS:
{noformat}
Task              QPS base   StdDev QPS comp   StdDev   Pct diff
LowSloppyPhrase      99.22   (4.1%)    35.51   (1.9%)   -64.2% ( -67% - -60%)
MedSpanNear          96.92   (4.2%)    35.23   (2.0%)   -63.6% ( -67% - -59%)
AndHighLow           91.99   (3.5%)    34.57   (2.0%)   -62.4% ( -65% - -58%)
HighPhrase           90.68   (3.8%)    34.35   (2.0%)   -62.1% ( -65% - -58%)
HighSloppyPhrase     81.34   (2.9%)    32.74   (2.1%)   -59.7% ( -62% - -56%)
HighSpanNear         65.81   (2.9%)    30.03   (2.2%)   -54.4% ( -57% - -50%)
OrNotHighLow         63.44   (3.4%)    29.66   (2.0%)   -53.3% ( -56% - -49%)
MedPhrase            62.66   (3.2%)    29.30   (2.0%)   -53.2% ( -56% - -49%)
LowTerm              61.47   (3.8%)    29.02   (2.0%)   -52.8% ( -56% - -48%)
Fuzzy1               47.78   (3.3%)    25.66   (2.3%)   -46.3% ( -50% - -42%)
OrNotHighHigh        47.59   (3.8%)    25.73   (2.3%)   -45.9% ( -50% - -41%)
OrHighLow            43.78   (1.9%)    24.41   (2.1%)   -44.2% ( -47% - -41%)
AndHighMed           42.81   (2.1%)    24.04   (2.0%)   -43.9% ( -47% - -40%)
OrNotHighMed         38.92   (2.6%)    22.95   (2.0%)   -41.0% ( -44% - -37%)
Fuzzy2               38.27   (2.6%)    22.86   (2.2%)   -40.3% ( -43% - -36%)
MedTerm              31.78   (2.5%)    20.14   (2.1%)   -36.6% ( -40% - -32%)
AndHighHigh          27.50   (1.7%)    18.33   (1.9%)   -33.3% ( -36% - -30%)
Prefix3              25.26   (1.9%)    17.35   (1.7%)   -31.3% ( -34% - -28%)
OrHighNotMed         22.27   (1.4%)    16.04   (1.4%)   -28.0% ( -30% - -25%)
OrHighNotLow         18.01   (1.6%)    13.76   (1.5%)   -23.6% ( -26% - -20%)
OrHighMed            17.33   (2.1%)    13.26   (1.6%)   -23.5% ( -26% - -20%)
OrHighHigh           16.84   (1.9%)    13.05   (1.5%)   -22.5% ( -25% - -19%)
MedSloppyPhrase      15.54   (3.9%)    12.22   (3.4%)   -21.4% ( -27% - -14%)
HighTerm             15.87   (2.2%)    12.48   (1.7%)   -21.3% ( -24% - -17%)
LowPhrase            13.78   (1.6%)    11.03   (1.3%)   -20.0% ( -22% - -17%)
Wildcard             12.93   (1.9%)    10.65   (1.3%)   -17.7% ( -20% - -14%)
LowSpanNear          10.55   (2.0%)     8.92   (1.7%)   -15.5% ( -18% - -12%)
OrHighNotHigh         9.29   (1.4%)     8.16   (1.4%)   -12.2% ( -14% - -9%)
IntNRQ                5.19   (1.3%)     4.92   (1.9%)    -5.1% ( -8% - -1%)
Respell              85.48   (2.6%)    87.32   (2.6%)     2.2% ( -2% - 7%)
{noformat}

ALL_BUT_DIM:
{noformat}
Task              QPS base   StdDev QPS comp   StdDev   Pct diff
Respell              89.86   (3.0%)    89.27   (2.2%)    -0.7% ( -5% - 4%)
LowSpanNear          12.01   (2.3%)    11.95   (2.1%)    -0.5% ( -4% - 3%)
Fuzzy1               88.54   (2.2%)    88.41   (1.5%)    -0.2% ( -3% - 3%)
Fuzzy2               62.54   (2.1%)    62.50   (2.3%)    -0.1% ( -4% - 4%)
OrHighNotHigh        10.10   (1.7%)    10.09   (1.9%)    -0.1% ( -3% - 3%)
OrNotHighHigh        85.35   (5.4%)    85.31   (5.4%)    -0.0% ( -10% - 11%)
OrHighNotLow         22.11   (1.4%)    22.10   (1.2%)    -0.0% ( -2% - 2%)
MedSloppyPhrase      18.87   (4.0%)    18.88   (4.7%)     0.1% ( -8% - 9%)
HighTerm             18.93   (1.5%)    18.97   (1.7%)     0.2% ( -2% - 3%)
OrHighMed            21.19   (1.6%)    21.26   (1.4%)     0.3% ( -2% - 3%)
LowPhrase            15.79   (4.4%)    15.85   (4.2%)     0.4% ( -7% - 9%)
AndHighHigh          38.40   (1.0%)    38.57   (1.2%)     0.4% ( -1% - 2%)
OrHighHigh           20.55   (1.4%)    20.64   (1.5%)     0.4% ( -2% - 3%)
OrHighNotMed         29.27   (1.4%)    29.40   (1.2%)     0.4% ( -2% - 3%)
AndHighMed           72.26   (1.1%)    72.60   (1.1%)     0.5% ( -1% - 2%)
Wildcard             14.92   (1.0%)    14.99   (1.3%)     0.5% ( -1% - 2%)
HighSpanNear        159.71   (3.5%)   160.74   (3.7%)     0.6% ( -6% - 8%)
IntNRQ                5.15   (1.4%)     5.18   (1.7%)     0.7% ( -2% - 3%)
Prefix3              33.93   (1.3%)    34.18   (1.8%)     0.7% ( -2% - 3%)
MedTerm              44.36   (1.7%)    44.69   (1.6%)     0.8% ( -2% - 4%)
OrNotHighMed         62.66   (2.4%)    63.18   (3.1%)     0.8% ( -4% - 6%)
OrHighLow            75.94   (1.4%)    76.65   (1.5%)     0.9% ( -1% - 3%)
OrNotHighLow        150.08   (4.7%)   151.62   (5.0%)     1.0% ( -8% - 11%)
MedPhrase           138.21   (3.7%)   139.67   (3.6%)     1.1% ( -6% - 8%)
LowTerm             140.27   (2.3%)   142.14   (2.6%)     1.3% ( -3% - 6%)
HighSloppyPhrase    283.76   (1.3%)   291.51   (1.8%)     2.7% ( 0% - 5%)
HighPhrase          455.49   (1.5%)   476.29   (3.1%)     4.6% ( 0% - 9%)
MedSpanNear         660.85   (2.0%)   693.91   (2.4%)     5.0% ( 0% - 9%)
AndHighLow          482.21   (2.1%)   511.77   (2.5%)     6.1% ( 1% - 11%)
LowSloppyPhrase     759.01   (1.9%)   816.01   (2.1%)     7.5% ( 3% - 11%)
{noformat}

> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy traversing is done today utilizing the {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays hold, for each ordinal, its (array #1) youngest child and (array #2) older sibling.
> This is a compact way of holding the tree information in memory, but it's not perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes the internal implementation
> * Using these arrays to traverse the tree is not straightforward
> * Loses reference locality while traversing (the array entries are accessed only in increasing order, but they may be distant from one another)
> * In NRT, a reopen always costs O(taxonomy size), not only in the worst case
> This issue is about making traversal easier and the code more readable, and opening it up to future improvements (e.g. memory footprint and NRT cost), without changing any of the internals.
> Later issue(s) could address the remaining gaps once this one is done.
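To make the youngest-child / older-sibling encoding described in the issue concrete, here is a minimal, self-contained Java sketch that does not depend on Lucene. It assumes that every parent has a smaller ordinal than its children, that the root is ordinal 0, and that -1 means "no ordinal". The names {{buildArrays}}, {{ChildOrdinals}}, {{childrenOf}} and {{preOrder}}, and the example labels, are hypothetical; they are not the {{ParallelTaxonomyArrays}} API or whatever API this issue ends up adding. The sketch shows how the two arrays get filled, how following them yields a node's children, and how a small iterator can hide the arrays from callers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

public class TaxonomyTraversalSketch {
    // Hypothetical conventions: root is ordinal 0, -1 means "no such ordinal".
    static final int ROOT_ORDINAL = 0;
    static final int INVALID_ORDINAL = -1;

    /**
     * Derives children[] (youngest child) and siblings[] (next older sibling)
     * from parents[]. Assumes parents.length >= 1 and that every parent has a
     * smaller ordinal than its children.
     */
    static int[][] buildArrays(int[] parents) {
        int n = parents.length;
        int[] children = new int[n];
        int[] siblings = new int[n];
        Arrays.fill(children, INVALID_ORDINAL);
        siblings[ROOT_ORDINAL] = INVALID_ORDINAL; // the root has no siblings
        for (int ord = 1; ord < n; ord++) {
            int parent = parents[ord];
            // The parent's current youngest child becomes this ordinal's older
            // sibling, and this ordinal becomes the parent's new youngest child.
            siblings[ord] = children[parent];
            children[parent] = ord;
        }
        return new int[][] {children, siblings};
    }

    /** Hypothetical iterator that hides both arrays behind a "children of X" view. */
    static final class ChildOrdinals implements PrimitiveIterator.OfInt {
        private final int[] siblings;
        private int next;

        ChildOrdinals(int[] children, int[] siblings, int parent) {
            this.siblings = siblings;
            this.next = children[parent]; // start at the youngest child
        }

        @Override
        public boolean hasNext() {
            return next != INVALID_ORDINAL;
        }

        @Override
        public int nextInt() {
            if (next == INVALID_ORDINAL) {
                throw new NoSuchElementException();
            }
            int current = next;
            next = siblings[current]; // step to the next older sibling
            return current;
        }
    }

    /** Lists the children of parent, youngest first. */
    static List<Integer> childrenOf(int[] children, int[] siblings, int parent) {
        List<Integer> out = new ArrayList<>();
        ChildOrdinals it = new ChildOrdinals(children, siblings, parent);
        while (it.hasNext()) {
            out.add(it.nextInt());
        }
        return out;
    }

    /** Recursive pre-order walk of the subtree rooted at ord. */
    static void preOrder(int[] children, int[] siblings, int ord, List<Integer> out) {
        out.add(ord);
        ChildOrdinals it = new ChildOrdinals(children, siblings, ord);
        while (it.hasNext()) {
            preOrder(children, siblings, it.nextInt(), out);
        }
    }

    public static void main(String[] args) {
        // Made-up taxonomy: 0 = root, 1 = "Author", 2 = "Author/Bob",
        // 3 = "Publish Date", 4 = "Author/Lisa", 5 = "Publish Date/2010".
        int[] parents = {INVALID_ORDINAL, 0, 1, 0, 1, 3};
        int[][] arrays = buildArrays(parents);
        int[] children = arrays[0];
        int[] siblings = arrays[1];

        System.out.println("children of root: " + childrenOf(children, siblings, ROOT_ORDINAL));
        List<Integer> order = new ArrayList<>();
        preOrder(children, siblings, ROOT_ORDINAL, order);
        System.out.println("pre-order: " + order);
        // Two int arrays cost 2 * Integer.BYTES = 8 bytes per ordinal.
        System.out.println("bytes: " + 2L * Integer.BYTES * parents.length);
    }
}
```

Running {{main}} prints {{children of root: [3, 1]}}, {{pre-order: [0, 3, 5, 1, 4, 2]}} and {{bytes: 48}}. Callers that go through an iterator like this never touch the two arrays directly, so the storage behind it could in principle change (for example to shrink the memory footprint) without rewriting traversal code; the actual API chosen for this issue may look different.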