[ https://issues.apache.org/jira/browse/LUCENE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818469#comment-13818469 ]
Michael McCandless commented on LUCENE-5316:
--------------------------------------------

I re-ran ALL_BUT_DIM and NO_PARENTS on the last patch:

ALL_BUT_DIM:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
LowSloppyPhrase     195.79  (6.4%)    160.76  (6.5%)  -17.9% ( -28% - -5%)
MedSpanNear         189.11  (6.1%)    155.88  (6.5%)  -17.6% ( -28% - -5%)
AndHighLow          171.05  (5.4%)    142.46  (5.9%)  -16.7% ( -26% - -5%)
HighPhrase          165.56  (5.6%)    140.32  (6.0%)  -15.2% ( -25% - -3%)
HighSloppyPhrase    135.86  (4.7%)    117.90  (5.3%)  -13.2% ( -22% - -3%)
HighSpanNear         98.69  (4.1%)     88.28  (4.5%)  -10.5% ( -18% - -2%)
MedPhrase            89.68  (4.3%)     81.23  (3.7%)   -9.4% ( -16% - -1%)
OrNotHighLow         93.45  (5.5%)     85.07  (4.9%)   -9.0% ( -18% - 1%)
LowTerm              87.06  (3.4%)     79.50  (3.8%)   -8.7% ( -15% - -1%)
Fuzzy1               63.87  (2.5%)     59.39  (2.9%)   -7.0% ( -12% - -1%)
AndHighMed           53.60  (1.9%)     50.49  (2.6%)   -5.8% ( -10% - -1%)
OrHighLow            54.32  (2.2%)     51.18  (2.4%)   -5.8% ( -10% - -1%)
OrNotHighHigh        62.71  (5.5%)     59.11  (5.0%)   -5.7% ( -15% - 5%)
OrNotHighMed         47.72  (3.4%)     45.35  (3.1%)   -5.0% ( -11% - 1%)
Fuzzy2               48.40  (2.2%)     46.07  (2.4%)   -4.8% ( -9% - 0%)
AndHighHigh          31.48  (1.6%)     30.33  (1.5%)   -3.7% ( -6% - 0%)
MedTerm              35.33  (2.0%)     34.06  (1.9%)   -3.6% ( -7% - 0%)
MedSloppyPhrase      17.17  (4.4%)     16.67  (4.3%)   -2.9% ( -11% - 6%)
Prefix3              27.73  (1.6%)     26.93  (1.2%)   -2.9% ( -5% - 0%)
OrHighNotMed         24.31  (2.4%)     23.79  (1.1%)   -2.1% ( -5% - 1%)
LowPhrase            14.56  (4.2%)     14.28  (4.0%)   -1.9% ( -9% - 6%)
LowSpanNear          11.25  (2.4%)     11.04  (1.7%)   -1.9% ( -5% - 2%)
OrHighHigh           17.63  (1.6%)     17.38  (1.1%)   -1.4% ( -4% - 1%)
OrHighNotLow         18.97  (1.8%)     18.69  (0.9%)   -1.4% ( -4% - 1%)
Wildcard             13.21  (1.4%)     13.03  (0.9%)   -1.4% ( -3% - 0%)
HighTerm             16.34  (1.8%)     16.14  (1.9%)   -1.3% ( -4% - 2%)
OrHighMed            18.11  (1.6%)     17.93  (1.4%)   -1.0% ( -3% - 2%)
Respell              89.31  (2.8%)     88.78  (2.2%)   -0.6% ( -5% - 4%)
OrHighNotHigh         9.09  (2.0%)      9.08  (1.4%)   -0.1% ( -3% - 3%)
IntNRQ                4.87  (1.2%)      4.90  (1.2%)    0.7% ( -1% - 3%)
{noformat}

NO_PARENTS:
{noformat}
Task              QPS base  StdDev  QPS comp  StdDev  Pct diff
LowSloppyPhrase      98.63  (4.7%)     28.73  (2.9%)  -70.9% ( -74% - -66%)
MedSpanNear          97.31  (4.7%)     28.54  (2.9%)  -70.7% ( -74% - -66%)
AndHighLow           91.63  (3.9%)     28.04  (2.9%)  -69.4% ( -73% - -65%)
HighPhrase           90.81  (3.6%)     27.94  (2.9%)  -69.2% ( -73% - -65%)
HighSloppyPhrase     80.24  (3.2%)     26.90  (3.1%)  -66.5% ( -70% - -62%)
HighSpanNear         65.93  (2.7%)     24.97  (3.3%)  -62.1% ( -66% - -57%)
OrNotHighLow         64.00  (3.3%)     24.74  (3.2%)  -61.3% ( -65% - -56%)
MedPhrase            62.06  (4.1%)     24.52  (3.3%)  -60.5% ( -65% - -55%)
LowTerm              61.33  (2.6%)     24.40  (3.3%)  -60.2% ( -64% - -55%)
OrNotHighHigh        48.27  (2.8%)     21.97  (3.4%)  -54.5% ( -58% - -49%)
Fuzzy1               47.61  (2.2%)     21.90  (3.5%)  -54.0% ( -58% - -49%)
OrHighLow            43.63  (2.6%)     21.07  (3.4%)  -51.7% ( -56% - -46%)
AndHighMed           42.86  (2.6%)     20.75  (3.4%)  -51.6% ( -56% - -46%)
OrNotHighMed         39.23  (2.0%)     19.93  (3.3%)  -49.2% ( -53% - -44%)
Fuzzy2               38.49  (2.3%)     19.76  (3.3%)  -48.6% ( -53% - -44%)
MedTerm              31.48  (2.6%)     17.82  (3.5%)  -43.4% ( -48% - -38%)
AndHighHigh          27.49  (1.9%)     16.39  (3.3%)  -40.4% ( -44% - -35%)
Prefix3              25.17  (2.6%)     15.71  (3.3%)  -37.6% ( -42% - -32%)
OrHighNotMed         22.44  (2.0%)     14.56  (3.0%)  -35.1% ( -39% - -30%)
OrHighNotLow         18.01  (1.7%)     12.66  (2.8%)  -29.7% ( -33% - -25%)
OrHighMed            17.37  (2.1%)     12.33  (2.8%)  -29.0% ( -33% - -24%)
OrHighHigh           17.02  (2.4%)     12.15  (2.8%)  -28.6% ( -33% - -23%)
MedSloppyPhrase      15.76  (4.5%)     11.26  (3.8%)  -28.6% ( -35% - -21%)
HighTerm             15.80  (2.4%)     11.62  (2.9%)  -26.5% ( -30% - -21%)
LowPhrase            13.51  (4.5%)     10.19  (3.0%)  -24.6% ( -30% - -17%)
Wildcard             12.90  (1.5%)     10.04  (2.3%)  -22.1% ( -25% - -18%)
LowSpanNear          10.56  (2.1%)      8.40  (2.5%)  -20.4% ( -24% - -16%)
OrHighNotHigh         9.39  (1.5%)      7.84  (2.1%)  -16.5% ( -19% - -13%)
IntNRQ                5.15  (1.9%)      4.81  (1.4%)   -6.7% ( -9% - -3%)
Respell              84.97  (2.8%)     88.45  (3.3%)    4.1% ( -1% - 10%)
{noformat}

I still see some queries coming back w/ all 0 facets ... not sure why.

> Taxonomy tree traversing improvement
> ------------------------------------
>
>                 Key: LUCENE-5316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5316
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Gilad Barkai
>            Priority: Minor
>         Attachments: LUCENE-5316.patch, LUCENE-5316.patch, LUCENE-5316.patch
>
>
> The taxonomy traversing is done today utilizing the
> {{ParallelTaxonomyArrays}}. In particular, two taxonomy-size {{int}} arrays
> which hold, for each ordinal, its (array #1) youngest child and (array #2)
> older sibling.
> This is a compact way of holding the tree information in memory, but it's not
> perfect:
> * Large (8 bytes per ordinal in memory)
> * Exposes internal implementation
> * Utilizing these arrays for tree traversing is not straightforward
> * Lose reference locality while traversing (the array is accessed only at
> increasing entries, but they may be distant from one another)
> * In NRT, a reopen is always (not worst case) done at O(Taxonomy-size)
> This issue is about making the traversing easier, the code more readable,
> and opening it up for future improvements (i.e. memory footprint and NRT cost) -
> without changing any of the internals.
> A later issue(s?) could be opened to address the gaps once this one is done.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
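
For readers unfamiliar with the layout the issue description refers to, here is a minimal sketch of walking a node's children through a "youngest child" array and an "older sibling" array. It uses plain {{int[]}} arrays rather than the real {{ParallelTaxonomyArrays}} class; the class name, method name, the {{-1}} sentinel, and the toy taxonomy are all assumptions made for illustration, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class TaxonomyChildrenSketch {
    // Assumed sentinel meaning "no such ordinal" (hypothetical for this sketch).
    static final int INVALID_ORDINAL = -1;

    /**
     * Collects the children of {@code parent}, youngest first, by starting at
     * the parent's youngest child and following the older-sibling links.
     */
    static List<Integer> childrenOf(int parent, int[] youngestChild, int[] olderSibling) {
        List<Integer> result = new ArrayList<>();
        for (int child = youngestChild[parent];
             child != INVALID_ORDINAL;
             child = olderSibling[child]) {
            result.add(child);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy taxonomy: 0 is the root; 1 and 2 are children of 0;
        // 3 and 4 are children of 1. Higher ordinals were added later.
        int[] youngestChild = {2, 4, -1, -1, -1};
        int[] olderSibling  = {-1, -1, 1, -1, 3};

        System.out.println(childrenOf(0, youngestChild, olderSibling));
        System.out.println(childrenOf(1, youngestChild, olderSibling));
    }
}
```

Two {{int}} entries per ordinal is where the "8 bytes per ordinal" figure in the description comes from, and the loop shows why callers end up depending on the array layout directly.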