[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558853#comment-13558853 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------

The performance depends heavily on how many ordinals (ords) your taxonomy index has. My last test used ~2.5M ords. When I build an index that leaves out the two dimensions with the most ords (categories and username), 4703 unique ords remain, and the numbers are much better:

{noformat}
            Task  QPS base   StdDev  QPS comp   StdDev  Pct diff
         Prefix3    161.48   (6.1%)    161.99   (7.4%)      0.3% ( -12% - 14%)
        PKLookup    235.50   (2.4%)    236.41   (2.1%)      0.4% ( -4% - 5%)
         Respell     85.41   (4.4%)     85.92   (4.2%)      0.6% ( -7% - 9%)
      AndHighLow   1196.56   (2.1%)   1204.67   (3.4%)      0.7% ( -4% - 6%)
          IntNRQ    104.88   (6.7%)    105.77   (9.0%)      0.9% ( -13% - 17%)
        Wildcard    215.17   (2.2%)    217.13   (2.6%)      0.9% ( -3% - 5%)
HighSloppyPhrase      3.24   (8.2%)      3.27   (9.2%)      1.0% ( -15% - 19%)
     LowSpanNear     42.80   (3.0%)     43.68   (2.8%)      2.1% ( -3% - 8%)
          Fuzzy2     84.83   (3.6%)     86.70   (2.8%)      2.2% ( -4% - 8%)
    HighSpanNear     11.42   (1.9%)     11.70   (2.3%)      2.4% ( -1% - 6%)
       LowPhrase     71.69   (6.8%)     73.91   (6.2%)      3.1% ( -9% - 17%)
          Fuzzy1     75.53   (3.4%)     78.81   (2.7%)      4.3% ( -1% - 10%)
      HighPhrase     42.58  (11.4%)     44.61  (11.5%)      4.8% ( -16% - 31%)
 LowSloppyPhrase     80.22   (2.3%)     84.49   (3.1%)      5.3% ( 0% - 10%)
     MedSpanNear     85.37   (1.9%)     91.16   (1.8%)      6.8% ( 3% - 10%)
 MedSloppyPhrase     86.55   (2.7%)     92.84   (3.2%)      7.3% ( 1% - 13%)
       MedPhrase    145.23   (5.6%)    156.11   (6.1%)      7.5% ( -3% - 20%)
      AndHighMed    321.74   (1.2%)    346.20   (1.5%)      7.6% ( 4% - 10%)
     AndHighHigh     84.28   (1.6%)     96.80   (1.7%)     14.9% ( 11% - 18%)
      OrHighHigh     35.03   (2.9%)     42.53   (4.6%)     21.4% ( 13% - 29%)
       OrHighMed     51.75   (3.0%)     63.90   (4.6%)     23.5% ( 15% - 32%)
       OrHighLow     50.41   (3.0%)     62.51   (4.7%)     24.0% ( 15% - 32%)
        HighTerm     58.55   (3.0%)     74.59   (4.2%)     27.4% ( 19% - 35%)
         LowTerm    355.14   (1.6%)    480.44   (2.3%)     35.3% ( 30% - 39%)
         MedTerm    206.44   (2.0%)    286.54   (3.1%)     38.8% ( 33% - 44%)
{noformat}

I also separately fixed a silly bug in luceneutil that was causing the *Span* queries to get 0 hits.
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
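The issue description contrasts two collection strategies. Below is a minimal Java sketch of the difference, assuming each document's category ordinals are available as a plain int[][] indexed by docID. That array, and every class and method name in the sketch, is hypothetical and is not the facet module's API. The two-pass aggregator records hits in a BitSet plus a maxDoc-sized float[] of scores and sums scores per ordinal afterwards; the one-pass aggregator adds each hit's score into per-ordinal sums as the hit is collected.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch contrasting two ways to aggregate facet scores.
 * docOrds[doc] stands in for the category ordinals a document carries;
 * it is NOT the Lucene facet API.
 */
public class FacetAggregationSketch {

    /** Two-pass: record hits during collection, aggregate afterwards. */
    static final class TwoPassAggregator {
        private final BitSet hits;    // one bit per document in the index
        private final float[] scores; // one float per document: the transient RAM the issue worries about

        TwoPassAggregator(int maxDoc) {
            hits = new BitSet(maxDoc);
            scores = new float[maxDoc];
        }

        void collect(int doc, float score) {
            hits.set(doc);
            scores[doc] = score;
        }

        /** Second pass over the recorded hits, run when results are requested. */
        float[] aggregate(int[][] docOrds, int numOrds) {
            float[] sums = new float[numOrds];
            for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
                for (int ord : docOrds[doc]) {
                    sums[ord] += scores[doc];
                }
            }
            return sums;
        }
    }

    /** One-pass: aggregate directly as each hit is collected. */
    static final class OnePassAggregator {
        private final int[][] docOrds;
        private final float[] sums; // sized by the number of ordinals, not by maxDoc

        OnePassAggregator(int[][] docOrds, int numOrds) {
            this.docOrds = docOrds;
            this.sums = new float[numOrds];
        }

        void collect(int doc, float score) {
            for (int ord : docOrds[doc]) {
                sums[ord] += score;
            }
        }

        float[] result() {
            return sums;
        }
    }

    // Feeds the hits (in increasing docID order) to the two-pass aggregator.
    static float[] aggregateTwoPass(int[][] docOrds, int numOrds, int[] hitDocs, float[] hitScores) {
        TwoPassAggregator agg = new TwoPassAggregator(docOrds.length);
        for (int i = 0; i < hitDocs.length; i++) {
            agg.collect(hitDocs[i], hitScores[i]);
        }
        return agg.aggregate(docOrds, numOrds);
    }

    // Feeds the same hits to the one-pass aggregator.
    static float[] aggregateOnePass(int[][] docOrds, int numOrds, int[] hitDocs, float[] hitScores) {
        OnePassAggregator agg = new OnePassAggregator(docOrds, numOrds);
        for (int i = 0; i < hitDocs.length; i++) {
            agg.collect(hitDocs[i], hitScores[i]);
        }
        return agg.result();
    }

    public static void main(String[] args) {
        int[][] docOrds = { {0, 2}, {1}, {0, 1}, {2} }; // 4 docs, 3 ordinals
        int[] hitDocs = {0, 2, 3};
        float[] hitScores = {1.0f, 0.5f, 2.0f};
        System.out.println(Arrays.toString(aggregateTwoPass(docOrds, 3, hitDocs, hitScores)));
        System.out.println(Arrays.toString(aggregateOnePass(docOrds, 3, hitDocs, hitScores)));
    }
}
```

Both paths print the same per-ordinal sums, [1.5, 0.5, 3.0]. The one-pass version never allocates the per-document score array; in this sketch its only per-query state is the sums array, whose size follows the number of ordinals rather than maxDoc.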