[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558819#comment-13558819 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------

base = ALL_PARENTS, comp = NO_PARENTS:

{noformat}
All facet dims:
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
         MedSpanNear      125.77      (2.0%)       79.31      (0.8%)  -36.9% ( -38% -  -34%)
         LowSpanNear      124.86      (2.7%)       79.23      (0.5%)  -36.5% ( -38% -  -34%)
        HighSpanNear      124.23      (2.3%)       79.44      (0.8%)  -36.1% ( -38% -  -33%)
          AndHighLow      107.24      (1.4%)       72.70      (0.7%)  -32.2% ( -33% -  -30%)
           MedPhrase       55.98      (0.6%)       44.89      (1.4%)  -19.8% ( -21% -  -17%)
          AndHighMed       52.06      (0.7%)       43.20      (0.0%)  -17.0% ( -17% -  -16%)
              Fuzzy2       35.71      (0.6%)       30.42      (1.6%)  -14.8% ( -16% -  -12%)
           LowPhrase       17.27      (0.3%)       15.21      (3.2%)  -11.9% ( -15% -   -8%)
          HighPhrase       15.20      (6.2%)       13.50      (4.7%)  -11.2% ( -20% -    0%)
             LowTerm       41.68      (0.4%)       37.49      (0.4%)  -10.1% ( -10% -   -9%)
     LowSloppyPhrase       17.31      (2.9%)       15.75      (0.9%)   -9.0% ( -12% -   -5%)
              Fuzzy1       28.11      (0.3%)       25.63      (0.0%)   -8.8% (  -9% -   -8%)
     MedSloppyPhrase       18.42      (1.5%)       17.25      (0.1%)   -6.3% (  -7% -   -4%)
             Respell       56.32      (0.3%)       54.41      (2.2%)   -3.4% (  -5% -    0%)
    HighSloppyPhrase        0.83      (6.8%)        0.81      (1.0%)   -2.3% (  -9% -    5%)
            Wildcard       63.43      (1.9%)       61.96      (0.3%)   -2.3% (  -4% -    0%)
             Prefix3       45.60      (0.5%)       45.70      (0.7%)    0.2% (  -1% -    1%)
              IntNRQ       17.54      (0.6%)       17.60      (1.4%)    0.3% (  -1% -    2%)
            PKLookup      205.89      (0.5%)      210.73      (0.7%)    2.4% (   1% -    3%)
         AndHighHigh       11.89      (0.2%)       12.48      (0.3%)    5.0% (   4% -    5%)
            HighTerm        7.00      (0.2%)        8.09      (0.1%)   15.6% (  15% -   16%)
          OrHighHigh        3.77      (0.6%)        4.36      (0.3%)   15.6% (  14% -   16%)
           OrHighLow        6.65      (0.1%)        7.69      (1.5%)   15.6% (  14% -   17%)
           OrHighMed        6.61      (0.4%)        7.66      (0.2%)   15.8% (  15% -   16%)
             MedTerm       18.86      (0.4%)       22.13      (0.4%)   17.3% (  16% -   18%)
{noformat}

I think because this test has 2.5M ords ... the cost of "rolling up" in the end is non-trivial ...
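
To make the "rolling up" cost concrete: assuming NO_PARENTS counts only the ordinal attached to each hit, parent counts have to be derived afterwards by walking the taxonomy. The following is a minimal sketch of that idea and not the facet module's actual code. It assumes a hypothetical `parents` array where `parents[ord]` is the parent ordinal, ordinal 0 is the root, and every parent has a smaller ordinal than its children. The class name, method name, and sample taxonomy are invented for illustration.

```java
import java.util.Arrays;

public class RollupSketch {

    /**
     * Adds each ordinal's count into its parent's count, children first.
     * Assumption: parents[ord] < ord for every ord > 0, and ordinal 0 is the root.
     */
    public static void rollUp(int[] counts, int[] parents) {
        // Walking from the highest ordinal down means a child is always
        // folded into its parent before the parent is folded further up.
        for (int ord = counts.length - 1; ord > 0; ord--) {
            counts[parents[ord]] += counts[ord];
        }
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy:
        // 0=root, 1=Author, 2=Author/Bob, 3=Author/Lisa, 4=Date, 5=Date/2012
        int[] parents = {-1, 0, 1, 1, 0, 4};
        // Counts gathered during collection: only the leaves were counted.
        int[] counts = {0, 0, 3, 2, 0, 5};
        rollUp(counts, parents);
        System.out.println(Arrays.toString(counts)); // prints [10, 5, 3, 2, 5, 5]
    }
}
```

The loop visits every ordinal once, no matter how few hits matched, so under these assumptions its cost scales with the taxonomy size (2.5M ords here) rather than with the result size. That would be consistent with the slowdown showing up most on the fast, low-hit-count tasks.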
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
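
The difference between the two approaches described in the issue can be sketched in plain Java, with no Lucene dependency. This is a toy under stated assumptions: `DOC_ORDS` is a hypothetical in-memory stand-in for the per-document facet ordinals that would really be decoded from the index, and `twoPass` / `onePass` are invented names, not facet module APIs.

```java
import java.util.BitSet;

public class CollectSketch {

    // Hypothetical per-document facet ordinals (doc id -> ordinals).
    static final int[][] DOC_ORDS = { {2}, {3}, {2, 5}, {5}, {3, 5} };

    /** Two passes: remember the hits in a bitset, then aggregate afterwards. */
    public static int[] twoPass(int[] hits, int numOrds) {
        BitSet matched = new BitSet(DOC_ORDS.length);
        for (int doc : hits) {
            matched.set(doc); // pass 1: collection only records the hit
        }
        int[] counts = new int[numOrds];
        // pass 2: walk the recorded hits and count their ordinals
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : DOC_ORDS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** One pass: count ordinals as each hit is collected; no bitset is kept. */
    public static int[] onePass(int[] hits, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hits) {
            for (int ord : DOC_ORDS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

Both methods return the same counts for the same hits. The one-pass variant simply never allocates the per-query bitset (or, when scores are aggregated, the float[]), which is the transient RAM the issue wants to avoid.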