[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558853#comment-13558853 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------
The performance depends heavily on how many ords your taxo index has ... my
last test used ~2.5M ords. When I instead build an index without the two
dimensions that have the most ords (categories and username), so that only
4703 unique ords remain, the numbers are much better:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
Prefix3             161.48   (6.1%)    161.99   (7.4%)      0.3% ( -12% - 14%)
PKLookup            235.50   (2.4%)    236.41   (2.1%)      0.4% ( -4% - 5%)
Respell              85.41   (4.4%)     85.92   (4.2%)      0.6% ( -7% - 9%)
AndHighLow         1196.56   (2.1%)   1204.67   (3.4%)      0.7% ( -4% - 6%)
IntNRQ              104.88   (6.7%)    105.77   (9.0%)      0.9% ( -13% - 17%)
Wildcard            215.17   (2.2%)    217.13   (2.6%)      0.9% ( -3% - 5%)
HighSloppyPhrase      3.24   (8.2%)      3.27   (9.2%)      1.0% ( -15% - 19%)
LowSpanNear          42.80   (3.0%)     43.68   (2.8%)      2.1% ( -3% - 8%)
Fuzzy2               84.83   (3.6%)     86.70   (2.8%)      2.2% ( -4% - 8%)
HighSpanNear         11.42   (1.9%)     11.70   (2.3%)      2.4% ( -1% - 6%)
LowPhrase            71.69   (6.8%)     73.91   (6.2%)      3.1% ( -9% - 17%)
Fuzzy1               75.53   (3.4%)     78.81   (2.7%)      4.3% ( -1% - 10%)
HighPhrase           42.58  (11.4%)     44.61  (11.5%)      4.8% ( -16% - 31%)
LowSloppyPhrase      80.22   (2.3%)     84.49   (3.1%)      5.3% ( 0% - 10%)
MedSpanNear          85.37   (1.9%)     91.16   (1.8%)      6.8% ( 3% - 10%)
MedSloppyPhrase      86.55   (2.7%)     92.84   (3.2%)      7.3% ( 1% - 13%)
MedPhrase           145.23   (5.6%)    156.11   (6.1%)      7.5% ( -3% - 20%)
AndHighMed          321.74   (1.2%)    346.20   (1.5%)      7.6% ( 4% - 10%)
AndHighHigh          84.28   (1.6%)     96.80   (1.7%)     14.9% ( 11% - 18%)
OrHighHigh           35.03   (2.9%)     42.53   (4.6%)     21.4% ( 13% - 29%)
OrHighMed            51.75   (3.0%)     63.90   (4.6%)     23.5% ( 15% - 32%)
OrHighLow            50.41   (3.0%)     62.51   (4.7%)     24.0% ( 15% - 32%)
HighTerm             58.55   (3.0%)     74.59   (4.2%)     27.4% ( 19% - 35%)
LowTerm             355.14   (1.6%)    480.44   (2.3%)     35.3% ( 30% - 39%)
MedTerm             206.44   (2.0%)    286.54   (3.1%)     38.8% ( 33% - 44%)
{noformat}
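As a reading aid for the table: the Pct diff point estimate appears to match the relative change in mean QPS, (comp - base) / base. The small Java sketch below recomputes it for a few rows. The class and method names are illustrative only, and the bracketed range after each Pct diff value (which also reflects the StdDev columns) is deliberately not reproduced here.

```java
import java.util.Locale;

/** Hypothetical helper: relative change in mean QPS from the base run to the comp run. */
final class PctDiff {

    // Returns (comp - base) / base, expressed as a percentage.
    static double pctDiff(double baseQps, double compQps) {
        return 100.0 * (compQps - baseQps) / baseQps;
    }

    public static void main(String[] args) {
        // Mean QPS values copied from the MedTerm, LowTerm and OrHighHigh rows of the table.
        String[][] rows = {
            {"MedTerm", "206.44", "286.54"},
            {"LowTerm", "355.14", "480.44"},
            {"OrHighHigh", "35.03", "42.53"},
        };
        for (String[] r : rows) {
            double d = pctDiff(Double.parseDouble(r[1]), Double.parseDouble(r[2]));
            // Rounded to one decimal, these print 38.8%, 35.3% and 21.4%, as in the table.
            System.out.println(String.format(Locale.ROOT, "%-10s %5.1f%%", r[0], d));
        }
    }
}
```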
I also separately fixed a silly bug in luceneutil which was causing the *Span*
queries to get 0 hits.
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Shai Erera
> Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
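To make the proposal in the description concrete, here is a minimal Java sketch contrasting the two strategies on a toy, in-memory model. It assumes each doc's category ords are available as an int[] (docOrds) and that hits arrive in increasing doc order without duplicates. FacetAggregationSketch, twoPass and duringCollection are hypothetical names, not Lucene facet APIs, and summing scores per ord stands in for whatever aggregation is configured.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical, in-memory model of the two strategies in the issue description.
 * docOrds[doc] holds the category ords of that doc; numOrds is the taxonomy size.
 * None of these names are Lucene APIs.
 */
final class FacetAggregationSketch {

    /** Current approach: pass 1 records hits (and scores), pass 2 aggregates them. */
    static float[] twoPass(int[][] docOrds, int numOrds, int[] hits, float[] hitScores) {
        int maxDoc = docOrds.length;
        BitSet matching = new BitSet(maxDoc); // small: one bit per doc
        float[] scores = new float[maxDoc];   // possibly big: one float per doc
        for (int i = 0; i < hits.length; i++) { // pass 1: collection
            matching.set(hits[i]);
            scores[hits[i]] = hitScores[i];
        }
        float[] agg = new float[numOrds];
        // pass 2: walk the recorded hits and aggregate their scores per ord
        for (int doc = matching.nextSetBit(0); doc >= 0; doc = matching.nextSetBit(doc + 1)) {
            for (int ord : docOrds[doc]) {
                agg[ord] += scores[doc];
            }
        }
        return agg;
    }

    /** Proposed approach: aggregate each hit as it is collected; no per-doc buffers. */
    static float[] duringCollection(int[][] docOrds, int numOrds, int[] hits, float[] hitScores) {
        float[] agg = new float[numOrds];
        for (int i = 0; i < hits.length; i++) {
            for (int ord : docOrds[hits[i]]) {
                agg[ord] += hitScores[i];
            }
        }
        return agg;
    }

    public static void main(String[] args) {
        int[][] docOrds = { {0, 2}, {1}, {0, 1}, {2} }; // 4 docs, 3 ords
        int[] hits = {0, 2, 3};                          // matching docs, in increasing doc order
        float[] scores = {1.5f, 0.5f, 2.0f};             // score of each hit, parallel to hits
        System.out.println(Arrays.toString(twoPass(docOrds, 3, hits, scores)));
        System.out.println(Arrays.toString(duringCollection(docOrds, 3, hits, scores)));
    }
}
```

Both variants return the same totals ([2.0, 0.5, 3.5] for the toy input). The difference is that duringCollection never allocates the per-doc BitSet and float[] that twoPass must hold until its second pass, which is the transient RAM the issue wants to avoid.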