[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558361#comment-13558361 ]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------
Results if I rebuild the index with NO_PARENTS (just to make sure the locality
gains are not due to frequently visiting the parent ords in the count array):
{noformat}
            Task  QPS base   StdDev  QPS comp   StdDev   Pct diff
         Respell     55.59   (3.9%)     54.45   (3.4%)   -2.0% ( -8% -   5%)
          IntNRQ     18.34   (7.1%)     18.04   (6.4%)   -1.7% (-14% -  12%)
      AndHighLow     86.87   (0.6%)     86.26   (1.9%)   -0.7% ( -3% -   1%)
     MedSpanNear     97.31   (0.9%)     96.63   (1.8%)   -0.7% ( -3% -   1%)
         Prefix3     46.40   (5.6%)     46.11   (4.6%)   -0.6% (-10% -  10%)
     LowSpanNear     97.76   (0.9%)     97.28   (1.8%)   -0.5% ( -3% -   2%)
          Fuzzy2     31.88   (1.6%)     31.77   (2.7%)   -0.3% ( -4% -   3%)
        Wildcard     62.53   (2.9%)     62.34   (2.5%)   -0.3% ( -5% -   5%)
        PKLookup    210.69   (1.5%)    210.37   (1.8%)   -0.1% ( -3% -   3%)
    HighSpanNear     97.44   (1.4%)     97.35   (1.7%)   -0.1% ( -3% -   3%)
       MedPhrase     49.87   (2.4%)     50.18   (2.5%)    0.6% ( -4% -   5%)
      HighPhrase     14.32   (8.8%)     14.42   (8.8%)    0.7% (-15% -  20%)
         LowTerm     37.64   (0.5%)     37.90   (1.3%)    0.7% ( -1% -   2%)
      AndHighMed     45.23   (0.6%)     45.74   (1.1%)    1.1% (  0% -   2%)
         MedTerm     22.53   (1.0%)     23.00   (1.3%)    2.1% (  0% -   4%)
 LowSloppyPhrase     16.27   (2.5%)     16.65   (5.7%)    2.3% ( -5% -  10%)
          Fuzzy1     24.86   (1.7%)     25.87   (1.4%)    4.1% (  0% -   7%)
        HighTerm      7.67   (1.6%)      8.00   (2.4%)    4.3% (  0% -   8%)
 MedSloppyPhrase     16.67   (1.2%)     17.58   (3.1%)    5.5% (  1% -   9%)
HighSloppyPhrase      0.81   (6.6%)      0.86  (12.8%)    6.9% (-11% -  28%)
     AndHighHigh     11.38   (0.8%)     12.18   (1.2%)    7.1% (  5% -   9%)
       LowPhrase     14.69   (4.7%)     15.82   (5.7%)    7.6% ( -2% -  18%)
      OrHighHigh      3.60   (2.3%)      4.32   (3.3%)   20.0% ( 14% -  26%)
       OrHighMed      6.20   (1.9%)      7.51   (3.0%)   21.1% ( 15% -  26%)
       OrHighLow      6.25   (2.0%)      7.60   (2.4%)   21.7% ( 17% -  26%)
{noformat}
So net/net, post (aggregating after collection) is still better! Separately,
it looks like NO_PARENTS is maybe ~10% faster for the high-cost queries but
slower for the low-cost queries, which is expected because iterating over
2.2 M ords at the end is a fixed, non-trivial cost.
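
To illustrate why the end-of-search pass is a fixed cost, here is a minimal sketch of rolling child counts up into parent ordinals, which is one plausible reading of the extra work needed when parent ords are not indexed per document. It is not the facet module's code: the class name, the method name, and the `parents[]` layout (root at ordinal 0, every parent ordinal smaller than its children's) are assumptions made for this example. The loop touches every ordinal regardless of how many documents matched.

```java
import java.util.Arrays;

// Hypothetical sketch, not the Lucene facet module's implementation.
final class RollupSketch {

    // Adds each ordinal's count into its parent's count.
    // Assumes parents[ord] < ord for every ord > 0 and that ordinal 0 is the root.
    static void rollupToParents(int[] counts, int[] parents) {
        // Walk from the highest ordinal down so a child's total is complete
        // before it is added to its parent. Cost is O(number of ordinals),
        // independent of the number of hits.
        for (int ord = counts.length - 1; ord > 0; ord--) {
            counts[parents[ord]] += counts[ord];
        }
    }

    public static void main(String[] args) {
        // 0 = root, 1 = "Author", 2 = "Author/Bob", 3 = "Author/Lisa"
        int[] parents = {-1, 0, 1, 1};
        int[] counts = {0, 0, 3, 2}; // only leaf ordinals were counted per document
        rollupToParents(counts, parents);
        System.out.println(Arrays.toString(counts)); // prints [5, 5, 3, 2]
    }
}
```

With millions of ordinals this pass costs the same for a query matching ten documents as for one matching ten million, which is consistent with the cheap queries getting slower.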
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
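
The two strategies described in the issue can be sketched as follows. This is a simplified illustration only, not the facet module's API: the class, the method names, and the `docOrds` array (the facet ordinals attached to each document) are assumptions made for the example, and scores are left out.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch contrasting the two aggregation strategies.
final class AggregationSketch {

    // Post-collection: remember every hit in a bit set, then make a
    // second pass over the set bits to count facet ordinals.
    static int[] countAfterCollection(int[] hits, int[][] docOrds, int numOrds) {
        BitSet matched = new BitSet(docOrds.length);
        for (int doc : hits) {
            matched.set(doc);
        }
        int[] counts = new int[numOrds];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // During collection: count ordinals as each hit arrives, so no
    // per-hit state has to be kept until the end.
    static int[] countDuringCollection(int[] hits, int[][] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int doc : hits) {
            for (int ord : docOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrds = {{1, 2}, {1, 3}, {2}, {1}}; // ordinals per document
        int[] hits = {0, 1, 3};                       // matching doc IDs
        System.out.println(Arrays.toString(countAfterCollection(hits, docOrds, 4)));  // prints [0, 3, 1, 1]
        System.out.println(Arrays.toString(countDuringCollection(hits, docOrds, 4))); // prints [0, 3, 1, 1]
    }
}
```

Both methods produce the same counts. The difference is in memory use and access pattern: the second pass visits documents in increasing doc ID order, while counting during collection follows whatever order the hits arrive in.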