[ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576538#comment-13576538 ]
Michael McCandless commented on LUCENE-4764:
--------------------------------------------
I decided to test whether the specialization (checking if DV format is
FacetDVFormat and "directly" accessing its address/bytes) helps:
Base = new DV format; comp = new DV format + specialization; 9 dims:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
LowSpanNear           7.15   (2.3%)      7.14   (2.0%)     -0.1% ( -4% -   4%)
Respell              45.60   (3.4%)     45.64   (3.3%)      0.1% ( -6% -   7%)
Fuzzy1               24.79   (1.4%)     24.85   (1.3%)      0.3% ( -2% -   2%)
MedSpanNear          18.07   (1.3%)     18.12   (1.6%)      0.3% ( -2% -   3%)
AndHighMed           40.34   (0.8%)     40.47   (0.9%)      0.3% ( -1% -   2%)
MedPhrase            42.25   (2.9%)     42.40   (2.7%)      0.4% ( -5% -   6%)
LowTerm              35.62   (1.1%)     35.76   (1.3%)      0.4% ( -2% -   2%)
AndHighLow           64.53   (1.7%)     64.78   (1.2%)      0.4% ( -2% -   3%)
Fuzzy2               29.06   (1.6%)     29.19   (1.7%)      0.4% ( -2% -   3%)
MedSloppyPhrase      16.88   (1.1%)     16.97   (1.5%)      0.5% ( -2% -   3%)
LowPhrase            15.01   (4.7%)     15.09   (4.8%)      0.5% ( -8% -  10%)
HighSpanNear          2.92   (1.9%)      2.94   (1.7%)      0.7% ( -2% -   4%)
LowSloppyPhrase      15.48   (1.6%)     15.60   (2.1%)      0.7% ( -2% -   4%)
HighPhrase           13.50   (8.8%)     13.60   (8.6%)      0.7% (-15% -  19%)
MedTerm              22.64   (1.1%)     22.91   (1.2%)      1.2% ( -1% -   3%)
Wildcard             14.29   (0.9%)     14.47   (1.4%)      1.3% (  0% -   3%)
AndHighHigh          12.40   (0.9%)     12.56   (1.2%)      1.3% (  0% -   3%)
HighSloppyPhrase      0.82   (4.3%)      0.83   (5.2%)      1.9% ( -7% -  11%)
OrHighMed             7.74   (1.3%)      7.90   (1.4%)      2.0% (  0% -   4%)
OrHighLow             7.82   (1.4%)      7.98   (1.7%)      2.0% (  0% -   5%)
HighTerm              8.35   (1.1%)      8.52   (1.5%)      2.1% (  0% -   4%)
Prefix3               6.48   (1.1%)      6.62   (1.1%)      2.3% (  0% -   4%)
OrHighHigh            4.58   (1.6%)      4.69   (1.5%)      2.3% (  0% -   5%)
IntNRQ                2.41   (1.6%)      2.48   (1.5%)      2.7% (  0% -   5%)
{noformat}
Same, but w/ 7 dims:
{noformat}
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
MedSpanNear          28.73   (1.9%)     28.33   (2.7%)     -1.4% ( -5% -   3%)
Respell              45.08   (4.7%)     44.73   (4.0%)     -0.8% ( -9% -   8%)
LowSpanNear           8.38   (2.6%)      8.33   (2.5%)     -0.6% ( -5% -   4%)
Fuzzy2               52.13   (3.5%)     51.85   (3.5%)     -0.5% ( -7% -   6%)
HighSpanNear          3.53   (1.7%)      3.51   (1.9%)     -0.5% ( -3% -   3%)
Fuzzy1               46.42   (2.5%)     46.29   (2.3%)     -0.3% ( -4% -   4%)
MedPhrase           109.24   (5.5%)    109.16   (5.9%)     -0.1% (-10% -  11%)
HighPhrase           17.28  (10.4%)     17.28  (10.6%)      0.0% (-19% -  23%)
HighSloppyPhrase      0.92   (8.0%)      0.92   (5.9%)      0.0% (-12% -  15%)
AndHighHigh          23.28   (1.2%)     23.29   (0.8%)      0.0% ( -1% -   2%)
LowPhrase            21.08   (6.1%)     21.10   (6.6%)      0.1% (-11% -  13%)
AndHighLow          586.97   (2.5%)    587.46   (2.3%)      0.1% ( -4% -   5%)
LowSloppyPhrase      20.38   (3.1%)     20.41   (2.6%)      0.1% ( -5% -   6%)
LowTerm             110.38   (2.0%)    110.52   (1.4%)      0.1% ( -3% -   3%)
AndHighMed          105.08   (1.0%)    105.31   (0.9%)      0.2% ( -1% -   2%)
Wildcard             27.23   (2.5%)     27.30   (1.8%)      0.3% ( -3% -   4%)
MedSloppyPhrase      25.94   (3.2%)     26.04   (2.1%)      0.4% ( -4% -   5%)
IntNRQ                3.52   (3.6%)      3.54   (2.6%)      0.6% ( -5% -   7%)
HighTerm             19.05   (3.3%)     19.18   (2.7%)      0.6% ( -5% -   6%)
Prefix3              12.89   (3.3%)     12.97   (2.3%)      0.7% ( -4% -   6%)
MedTerm              46.70   (3.0%)     47.06   (2.6%)      0.8% ( -4% -   6%)
OrHighLow            17.06   (4.2%)     17.22   (3.5%)      1.0% ( -6% -   9%)
OrHighMed            16.54   (4.2%)     16.71   (3.6%)      1.0% ( -6% -   9%)
OrHighHigh            8.72   (4.4%)      8.83   (3.7%)      1.2% ( -6% -   9%)
{noformat}
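So net/net the specialization doesn't help much here: the gains top
out at 2.7% with 9 dims, and with 7 dims every change is within the
reported noise range.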
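For readers unfamiliar with the kind of specialization being measured, here is a rough, self-contained sketch of the pattern: check the concrete type of the doc-values reader and, when it is the "direct" one, read its address and byte arrays without going through the generic per-document call. Everything in it (BinaryValues, DirectBinaryValues, sumGeneric, sumSpecialized) is a hypothetical stand-in invented for illustration. It is not the code in the attached patch and not Lucene's API.

```java
import java.util.Arrays;

public class FacetDvSpecializationSketch {

    /** Hypothetical stand-in for a per-document binary doc-values reader. */
    interface BinaryValues {
        /** Returns a copy of the bytes stored for the given docID. */
        byte[] get(int docID);
    }

    /**
     * Hypothetical "direct" layout (an assumption, not the real format):
     * one int address per document into a single shared byte[].
     * The bytes for docID live in bytes[addresses[docID] .. addresses[docID + 1]).
     */
    static final class DirectBinaryValues implements BinaryValues {
        final int[] addresses; // length = number of docs + 1
        final byte[] bytes;

        DirectBinaryValues(int[] addresses, byte[] bytes) {
            this.addresses = addresses;
            this.bytes = bytes;
        }

        @Override
        public byte[] get(int docID) {
            return Arrays.copyOfRange(bytes, addresses[docID], addresses[docID + 1]);
        }
    }

    /** Generic path: one interface call (and one copy) per collected docID. */
    static long sumGeneric(BinaryValues values, int[] docIDs) {
        long sum = 0;
        for (int docID : docIDs) {
            for (byte b : values.get(docID)) {
                sum += b & 0xFF;
            }
        }
        return sum;
    }

    /** Specialized path: if the reader is the direct one, walk its arrays in place. */
    static long sumSpecialized(BinaryValues values, int[] docIDs) {
        if (!(values instanceof DirectBinaryValues)) {
            return sumGeneric(values, docIDs); // fall back for any other reader
        }
        DirectBinaryValues direct = (DirectBinaryValues) values;
        int[] addresses = direct.addresses;
        byte[] bytes = direct.bytes;
        long sum = 0;
        for (int docID : docIDs) {
            int end = addresses[docID + 1];
            for (int i = addresses[docID]; i < end; i++) {
                sum += bytes[i] & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three docs: doc 0 -> {1, 2}, doc 1 -> {}, doc 2 -> {3, 4, 5}
        BinaryValues values = new DirectBinaryValues(
            new int[] {0, 2, 2, 5}, new byte[] {1, 2, 3, 4, 5});
        int[] collected = {0, 2};
        System.out.println("generic=" + sumGeneric(values, collected)
            + " specialized=" + sumSpecialized(values, collected));
    }
}
```

Both paths return the same result; the specialized one only skips the per-document call and copy, which fits with the small differences in the tables above.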
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
> Key: LUCENE-4764
> URL: https://issues.apache.org/jira/browse/LUCENE-4764
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.