[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558330#comment-13558330 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------

I ran the same test, but w/ the full set of query categories:

{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
AndHighLow          111.98   (1.0%)    110.10   (1.0%)   -1.7% ( -3% - 0%)
HighSpanNear        128.42   (1.4%)    126.32   (1.1%)   -1.6% ( -4% - 0%)
LowSpanNear         128.68   (1.4%)    126.59   (1.0%)   -1.6% ( -3% - 0%)
MedSpanNear         128.18   (1.3%)    126.29   (1.1%)   -1.5% ( -3% - 0%)
Respell              55.79   (3.9%)     55.35   (4.8%)   -0.8% ( -9% - 8%)
PKLookup            206.89   (1.1%)    208.08   (1.5%)    0.6% ( -2% - 3%)
Fuzzy2               36.21   (1.3%)     36.49   (2.3%)    0.8% ( -2% - 4%)
MedPhrase            56.42   (1.4%)     56.94   (1.3%)    0.9% ( -1% - 3%)
Wildcard             64.26   (3.8%)     64.88   (2.0%)    1.0% ( -4% - 7%)
AndHighMed           51.80   (0.7%)     52.44   (1.2%)    1.2% ( 0% - 3%)
IntNRQ               18.49   (4.8%)     18.78   (5.5%)    1.6% ( -8% - 12%)
LowTerm              41.15   (0.6%)     41.82   (0.9%)    1.6% ( 0% - 3%)
Prefix3              46.94   (4.3%)     47.92   (3.4%)    2.1% ( -5% - 10%)
MedTerm              18.47   (0.8%)     18.92   (1.3%)    2.4% ( 0% - 4%)
HighPhrase           15.16   (6.2%)     15.77   (4.3%)    4.0% ( -6% - 15%)
HighTerm              6.76   (1.2%)      7.07   (1.2%)    4.5% ( 2% - 7%)
LowSloppyPhrase      17.14   (3.8%)     17.96   (2.3%)    4.8% ( -1% - 11%)
Fuzzy1               27.29   (0.8%)     28.62   (1.4%)    4.9% ( 2% - 7%)
MedSloppyPhrase      17.64   (2.4%)     18.90   (1.0%)    7.2% ( 3% - 10%)
AndHighHigh          11.11   (0.5%)     11.97   (0.9%)    7.7% ( 6% - 9%)
HighSloppyPhrase      0.83  (10.5%)      0.91   (5.9%)   10.1% ( -5% - 29%)
LowPhrase            15.83   (3.2%)     17.45   (0.2%)   10.2% ( 6% - 14%)
OrHighHigh            3.22   (0.7%)      3.80   (1.5%)   18.1% ( 15% - 20%)
OrHighLow             5.68   (0.3%)      6.73   (1.5%)   18.4% ( 16% - 20%)
OrHighMed             5.61   (0.5%)      6.66   (1.6%)   18.7% ( 16% - 20%)
{noformat}

Somehow post-collection is a big gain for the Or queries ... I wonder if somehow we are not getting the out-of-order scorer (BooleanScorer) w/ CountingCollector ... but looking at both collectors, they both return true from acceptsDocsOutOfOrder ...

Net/net it seems like we should stick with post-collection? The possible downside is memory use of the temporary bit set I guess ...
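A toy, plain-Java sketch of the two strategies being compared: post-collection (record hits in a bit set, then count in a second pass) versus counting each hit as it is collected (the CountingCollector side). The per-document ordinals array and the class and method names are hypothetical; this is not the Lucene facet API, only the control flow. One structural difference it makes visible: the second pass always walks documents in increasing docID order, even when hits arrive out of order. Whether that explains the Or-query gains in the table is not established by this sketch.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Toy model of the two facet-counting strategies (hypothetical, not Lucene API).
 * docOrdinals[doc] lists the category ordinals assigned to each document.
 */
class FacetCountingSketch {

    /** Post-collection: mark hits in a BitSet, then count in a 2nd pass. */
    static int[] countPostCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        BitSet matching = new BitSet(docOrdinals.length);
        for (int doc : hits) {             // collection phase: arrival order is irrelevant
            matching.set(doc);
        }
        int[] counts = new int[numOrdinals];
        // 2nd pass visits matching docs in increasing docID order
        for (int doc = matching.nextSetBit(0); doc >= 0; doc = matching.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** During collection: aggregate each hit immediately, no transient BitSet. */
    static int[] countDuringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) {             // docs are visited in whatever order they arrive
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrdinals = { {0, 2}, {1}, {0, 1}, {2}, {0} };
        int[] hits = {3, 0, 4, 2};         // out-of-order hits, as an out-of-order scorer may deliver
        int[] post = countPostCollection(hits, docOrdinals, 3);
        int[] during = countDuringCollection(hits, docOrdinals, 3);
        System.out.println(Arrays.toString(post) + " " + Arrays.toString(during));
        // prints "[3, 1, 2] [3, 1, 2]"
    }
}
```

Both strategies yield identical counts; the trade-off is the transient bit set (and, when scores are aggregated, a score array) against aggregating in whatever order the scorer delivers documents.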
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
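A back-of-envelope sketch of the transient RAM mentioned in the description. It assumes the bit set holds one bit per document in the index, rounded up to whole 64-bit words (as long[]-backed bit sets such as java.util.BitSet store them), and that the score array has one 4-byte float slot per document; object headers are ignored. The class name and the 10,000,000-document index size are hypothetical.

```java
/** Rough transient-RAM estimate for post-collection (hypothetical sizing assumptions). */
class TransientRamEstimate {

    /** Bit set: 1 bit per doc, rounded up to whole 64-bit (8-byte) words. */
    static long bitSetBytes(int maxDoc) {
        long words = (maxDoc + 63L) / 64;
        return words * 8;
    }

    /** Score array: assumes one 4-byte float per doc in the index. */
    static long scoreArrayBytes(int maxDoc) {
        return 4L * maxDoc;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;           // hypothetical index size
        System.out.println(bitSetBytes(maxDoc) + " " + scoreArrayBytes(maxDoc));
        // prints "1250000 40000000"
    }
}
```

Under these assumptions the bit set costs about 1.25 MB against about 40 MB for the float[], a 32x ratio (32 bits per float versus 1 bit per doc), which is the "fairly small" versus "possibly big" contrast the description draws.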