[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558361#comment-13558361 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------

Results if I rebuild the index with NO_PARENTS (just to make sure the locality gains are not due to frequently visiting the parent ords in the count array):

{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Respell              55.59   (3.9%)     54.45   (3.4%)   -2.0% (  -8% -   5%)
IntNRQ               18.34   (7.1%)     18.04   (6.4%)   -1.7% ( -14% -  12%)
AndHighLow           86.87   (0.6%)     86.26   (1.9%)   -0.7% (  -3% -   1%)
MedSpanNear          97.31   (0.9%)     96.63   (1.8%)   -0.7% (  -3% -   1%)
Prefix3              46.40   (5.6%)     46.11   (4.6%)   -0.6% ( -10% -  10%)
LowSpanNear          97.76   (0.9%)     97.28   (1.8%)   -0.5% (  -3% -   2%)
Fuzzy2               31.88   (1.6%)     31.77   (2.7%)   -0.3% (  -4% -   3%)
Wildcard             62.53   (2.9%)     62.34   (2.5%)   -0.3% (  -5% -   5%)
PKLookup            210.69   (1.5%)    210.37   (1.8%)   -0.1% (  -3% -   3%)
HighSpanNear         97.44   (1.4%)     97.35   (1.7%)   -0.1% (  -3% -   3%)
MedPhrase            49.87   (2.4%)     50.18   (2.5%)    0.6% (  -4% -   5%)
HighPhrase           14.32   (8.8%)     14.42   (8.8%)    0.7% ( -15% -  20%)
LowTerm              37.64   (0.5%)     37.90   (1.3%)    0.7% (  -1% -   2%)
AndHighMed           45.23   (0.6%)     45.74   (1.1%)    1.1% (   0% -   2%)
MedTerm              22.53   (1.0%)     23.00   (1.3%)    2.1% (   0% -   4%)
LowSloppyPhrase      16.27   (2.5%)     16.65   (5.7%)    2.3% (  -5% -  10%)
Fuzzy1               24.86   (1.7%)     25.87   (1.4%)    4.1% (   0% -   7%)
HighTerm              7.67   (1.6%)      8.00   (2.4%)    4.3% (   0% -   8%)
MedSloppyPhrase      16.67   (1.2%)     17.58   (3.1%)    5.5% (   1% -   9%)
HighSloppyPhrase      0.81   (6.6%)      0.86  (12.8%)    6.9% ( -11% -  28%)
AndHighHigh          11.38   (0.8%)     12.18   (1.2%)    7.1% (   5% -   9%)
LowPhrase            14.69   (4.7%)     15.82   (5.7%)    7.6% (  -2% -  18%)
OrHighHigh            3.60   (2.3%)      4.32   (3.3%)   20.0% (  14% -  26%)
OrHighMed             6.20   (1.9%)      7.51   (3.0%)   21.1% (  15% -  26%)
OrHighLow             6.25   (2.0%)      7.60   (2.4%)   21.7% (  17% -  26%)
{noformat}

So net/net post is still better!

Separately it looks like NO_PARENTS is maybe ~10% faster for the high-cost queries, but slower for the low cost queries ... which is expected because iterating over 2.2 M ords in the end is a fixed non-trivial cost ...
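To illustrate why the final pass over all ords is a fixed cost, here is a minimal sketch of a parent rollup. It assumes that with NO_PARENTS only the most specific ordinals are counted per hit and parent counts are filled in afterwards, and that every parent ordinal is smaller than its children's ordinals. The class name, method name, and the flat `parents` array are hypothetical and are not the facet module's actual code.

```java
import java.util.Arrays;

/** Hypothetical sketch of a NO_PARENTS-style rollup; not Lucene's implementation. */
class NoParentsRollupSketch {

    /**
     * Adds each ordinal's count into its parent's count.
     * Assumes parents[ord] < ord for every non-root ord, so walking from the
     * highest ord down means a child is always folded in before its parent is.
     * The loop touches every ord regardless of how many documents matched,
     * which is the fixed cost mentioned in the comment above.
     */
    static void rollup(int[] counts, int[] parents) {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // Toy taxonomy: 0 is the root, 1 and 4 are its children, 2 and 3 are children of 1.
        int[] parents = {-1, 0, 1, 1, 0};
        // Only leaf ordinals were counted during aggregation.
        int[] counts = {0, 0, 3, 2, 1};
        rollup(counts, parents);
        System.out.println(Arrays.toString(counts)); // prints [6, 5, 3, 2, 1]
    }
}
```

Note that this sums per-child counts, so a document tagged with two children of the same parent would be counted twice at the parent; the sketch ignores that case.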
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
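The two strategies contrasted in the issue description can be sketched roughly as follows. This is a toy model, not the facet module's API: the class name, both method names, and the `docOrds` array (the facet ordinals attached to each document) are hypothetical, and it assumes each hit is reported exactly once.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch comparing post-aggregation with aggregation during collection. */
class FacetAggregationSketch {

    /** Post-aggregation: record hits in a bitset first, then count in a second pass. */
    static int[] countPost(int[] hitDocs, int[][] docOrds, int numOrds) {
        BitSet hits = new BitSet(docOrds.length); // transient RAM held until aggregation
        for (int doc : hitDocs) {
            hits.set(doc);
        }
        int[] counts = new int[numOrds];
        // Second pass: visit the collected hits in doc-id order.
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** Aggregation during collection: count as each hit arrives, with no bitset. */
    static int[] countDuringCollection(int[] hitDocs, int[][] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hitDocs) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrds = { {0, 1}, {1}, {2}, {0, 2} }; // ordinals per document
        int[] hitDocs = {0, 2, 3};                      // documents matching the query
        System.out.println(Arrays.toString(countPost(hitDocs, docOrds, 3)));
        System.out.println(Arrays.toString(countDuringCollection(hitDocs, docOrds, 3)));
        // both lines print [2, 1, 2]
    }
}
```

Both methods produce the same counts; they differ only in when the counting work happens and in whether the hit set has to be kept in memory in between.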