[
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558330#comment-13558330
]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------
I ran the same test, but w/ the full set of query categories:
{noformat}
Task                QPS base   StdDev  QPS comp   StdDev  Pct diff
AndHighLow            111.98   (1.0%)    110.10   (1.0%)     -1.7% ( -3% -   0%)
HighSpanNear          128.42   (1.4%)    126.32   (1.1%)     -1.6% ( -4% -   0%)
LowSpanNear           128.68   (1.4%)    126.59   (1.0%)     -1.6% ( -3% -   0%)
MedSpanNear           128.18   (1.3%)    126.29   (1.1%)     -1.5% ( -3% -   0%)
Respell                55.79   (3.9%)     55.35   (4.8%)     -0.8% ( -9% -   8%)
PKLookup              206.89   (1.1%)    208.08   (1.5%)      0.6% ( -2% -   3%)
Fuzzy2                 36.21   (1.3%)     36.49   (2.3%)      0.8% ( -2% -   4%)
MedPhrase              56.42   (1.4%)     56.94   (1.3%)      0.9% ( -1% -   3%)
Wildcard               64.26   (3.8%)     64.88   (2.0%)      1.0% ( -4% -   7%)
AndHighMed             51.80   (0.7%)     52.44   (1.2%)      1.2% (  0% -   3%)
IntNRQ                 18.49   (4.8%)     18.78   (5.5%)      1.6% ( -8% -  12%)
LowTerm                41.15   (0.6%)     41.82   (0.9%)      1.6% (  0% -   3%)
Prefix3                46.94   (4.3%)     47.92   (3.4%)      2.1% ( -5% -  10%)
MedTerm                18.47   (0.8%)     18.92   (1.3%)      2.4% (  0% -   4%)
HighPhrase             15.16   (6.2%)     15.77   (4.3%)      4.0% ( -6% -  15%)
HighTerm                6.76   (1.2%)      7.07   (1.2%)      4.5% (  2% -   7%)
LowSloppyPhrase        17.14   (3.8%)     17.96   (2.3%)      4.8% ( -1% -  11%)
Fuzzy1                 27.29   (0.8%)     28.62   (1.4%)      4.9% (  2% -   7%)
MedSloppyPhrase        17.64   (2.4%)     18.90   (1.0%)      7.2% (  3% -  10%)
AndHighHigh            11.11   (0.5%)     11.97   (0.9%)      7.7% (  6% -   9%)
HighSloppyPhrase        0.83  (10.5%)      0.91   (5.9%)     10.1% ( -5% -  29%)
LowPhrase              15.83   (3.2%)     17.45   (0.2%)     10.2% (  6% -  14%)
OrHighHigh              3.22   (0.7%)      3.80   (1.5%)     18.1% ( 15% -  20%)
OrHighLow               5.68   (0.3%)      6.73   (1.5%)     18.4% ( 16% -  20%)
OrHighMed               5.61   (0.5%)      6.66   (1.6%)     18.7% ( 16% -  20%)
{noformat}
Post-collection aggregation is a big gain for the Or queries ... I wonder if
we are somehow not getting the out-of-order scorer (BooleanScorer) w/
CountingCollector ... but looking at both collectors, they both return true from
acceptsDocsOutOfOrder ...
Net/net it seems like we should stick with post-collection aggregation? The
possible downside is the memory used by the temporary bit set, I guess ...
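To make the two strategies concrete, here is a minimal sketch in plain Java, with no Lucene dependencies. The class name, the DOC_ORDINALS array and both method names are hypothetical stand-ins for illustration only; this is not the facet module's code, nor the code in the attached patches. It only shows that both approaches produce the same counts, and that the post-collection variant needs a temporary bit set.

```java
import java.util.Arrays;
import java.util.BitSet;

public class FacetAggregationSketch {
    // Hypothetical per-document facet ordinals (doc ID -> ordinals); not a Lucene structure.
    static final int[][] DOC_ORDINALS = { {0, 2}, {1}, {0, 1}, {2}, {0} };
    static final int NUM_ORDINALS = 3;

    /** Post-collection: record hits in a bit set, then aggregate in a second pass. */
    static int[] countPostCollection(int[] hits) {
        BitSet matched = new BitSet(DOC_ORDINALS.length); // the transient RAM cost
        for (int doc : hits) {
            matched.set(doc);
        }
        int[] counts = new int[NUM_ORDINALS];
        // The second pass walks the bit set in increasing doc ID order,
        // regardless of the order in which the hits were delivered.
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : DOC_ORDINALS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** During collection: aggregate immediately as each hit arrives. */
    static int[] countDuringCollection(int[] hits) {
        int[] counts = new int[NUM_ORDINALS];
        for (int doc : hits) {
            for (int ord : DOC_ORDINALS[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] hits = {3, 0, 4}; // hits delivered out of doc ID order
        System.out.println(Arrays.toString(countPostCollection(hits)));
        System.out.println(Arrays.toString(countDuringCollection(hits)));
    }
}
```

The sketch says nothing about why one variant benchmarks faster; it only illustrates the trade-off between a second pass plus a temporary bit set and aggregating inline.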
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
> LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).