[https://issues.apache.org/jira/browse/LUCENE-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428682#comment-13428682]
Michael McCandless commented on LUCENE-4283:
--------------------------------------------
I added some new tasks to luceneutil (AndHighLow, OrHighLow), and also
separated tasks for Low/Med/HighTerm (and same for SpanNear/Phrase
queries) so that we can see the impact on the different queries, and
so that we actually test skipping (AndHighLow).
Then I ran a test with the 2nd patch (the non-buggy one, with partial
decoding and skipInterval=32):
{noformat}
Task                QPS base  StdDev base  QPS comp  StdDev comp      Pct diff
AndHighLow            631.54        10.72    101.44         0.70   -84% - -83%
AndHighMed             44.85         0.94     39.31         0.36   -14% -  -9%
AndHighHigh           18.39         0.27     16.16         0.08   -13% - -10%
MedSloppyPhrase        12.15         0.14     11.27         0.30   -10% -  -3%
MedSpanNear             9.11         0.10      8.58         0.10    -7% -  -3%
LowSpanNear             5.05         0.03      4.78         0.03    -6% -  -4%
MedPhrase               5.09         0.10      4.81         0.10    -9% -  -1%
LowPhrase               7.80         0.08      7.43         0.07    -6% -  -2%
HighSloppyPhrase        2.13         0.06      2.04         0.06   -10% -   1%
LowSloppyPhrase         5.28         0.11      5.09         0.15    -8% -   1%
HighTerm               22.85         0.11     22.08         0.56    -6% -   0%
LowTerm               526.19         3.56    510.53         9.14    -5% -   0%
MedTerm               138.34         0.51    134.66         3.58    -5% -   0%
HighPhrase              3.55         0.11      3.46         0.11    -8% -   3%
HighSpanNear            1.64         0.00      1.60         0.02    -3% -   0%
Fuzzy1                 99.11         3.49     98.91         2.71    -6% -   6%
Fuzzy2                 88.31         3.05     88.19         2.32    -6% -   6%
Respell                77.97         1.75     78.24         1.86    -4% -   5%
PKLookup              192.61         1.47    193.47         1.53    -1% -   2%
OrHighMed              25.14         1.23     25.28         1.16    -8% -  10%
OrHighHigh              9.22         0.47      9.30         0.45    -8% -  11%
OrHighLow              37.28         1.79     37.60         1.75    -8% -  10%
Wildcard               67.88         0.33     69.19         2.70    -2% -   6%
Prefix3                25.67         0.35     26.25         1.22    -3% -   8%
IntNRQ                  8.85         0.02      9.27         0.98    -6% -  15%
{noformat}
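The Pct diff column is reported as a range. luceneutil's exact range calculation isn't shown here (it presumably also folds in the StdDev columns), so the following sketch only recomputes the simple point estimate (comp - base) / base from the mean QPS values of a few rows in the table above. `PctDiffSketch` is a hypothetical helper for reading the numbers, not part of luceneutil.

```java
/** Hypothetical helper: point estimate of the QPS change between two runs. */
public class PctDiffSketch {

    /** Relative change of compQps versus baseQps, in percent. */
    static double pctDiff(double baseQps, double compQps) {
        return (compQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        // Mean QPS values (base, comp) copied from the benchmark table.
        String[] tasks = {"AndHighLow", "AndHighMed", "AndHighHigh", "IntNRQ"};
        double[][] qps = {
            {631.54, 101.44},
            {44.85, 39.31},
            {18.39, 16.16},
            {8.85, 9.27},
        };
        for (int i = 0; i < tasks.length; i++) {
            // Print each task's point estimate with an explicit sign.
            System.out.printf("%-12s %+6.1f%%%n", tasks[i], pctDiff(qps[i][0], qps[i][1]));
        }
    }
}
```

For AndHighLow this gives about -83.9%, i.e. comp runs at roughly 1/6 of base QPS, which sits inside the reported -84% to -83% range.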
I'm confused about why AndHighLow got so much slower (comp runs at roughly
1/6 of base QPS)... this patch should have lowered the per-skip cost.
> Support more frequent skip with Block Postings Format
> -----------------------------------------------------
>
> Key: LUCENE-4283
> URL: https://issues.apache.org/jira/browse/LUCENE-4283
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Han Jiang
> Priority: Minor
> Attachments: LUCENE-4283-buggy.patch, LUCENE-4283-buggy.patch,
> LUCENE-4283-slow.patch, LUCENE-4283-small-interval-fully.patch,
> LUCENE-4283-small-interval-partially.patch
>
>
> This change works on the new bulk branch.
> Currently, our BlockPostingsFormat only supports skipInterval==blockSize.
> Every time the skipper reaches the last level 0 skip point, we have to
> decode a whole block to read the doc/freq data. Also, a higher-level skip list
> is created only for terms with df > blockSize^k, which means that for most terms,
> skipping is just a linear scan. If we increase the current blockSize for
> better bulk I/O performance, the current skip setting will become a bottleneck.
> For ForPF, the encoded block can easily be split if we set
> skipInterval=32*k.
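A toy Java model of the decoding side of this trade-off, assuming a single skip level, a postings list held in a plain int[], and that a block can be decoded in decodeChunk-sized pieces (the partial-decode idea for ForPF). `SkipCostSketch` and `decodedForAdvance` are hypothetical names, not Lucene APIs; the model only counts how many doc IDs must be decoded to serve one advance(target) call.

```java
/**
 * Toy model (hypothetical; not Lucene code) of one-level skipping over a
 * postings list whose doc IDs are stored in packed blocks.
 */
public class SkipCostSketch {

    /**
     * Counts how many doc IDs are decoded to find the first doc >= target.
     * Assumes skipInterval is a multiple of decodeChunk (e.g. 128/128 or 32/32).
     */
    static int decodedForAdvance(int[] docs, int target, int skipInterval, int decodeChunk) {
        // Level-0 skip: jump over every whole skipInterval-sized group whose last
        // doc is still < target. A real skipper reads these doc IDs from skip
        // data; this model peeks at docs[] and counts that as free.
        int start = 0;
        while (start + skipInterval <= docs.length
                && docs[start + skipInterval - 1] < target) {
            start += skipInterval;
        }
        // Linear scan from the skip point, decoding decodeChunk values at a time
        // until a decoded chunk contains a doc >= target.
        int decoded = 0;
        for (int chunk = start; chunk < docs.length; chunk += decodeChunk) {
            int end = Math.min(chunk + decodeChunk, docs.length);
            decoded += end - chunk;
            if (docs[end - 1] >= target) {
                break;
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        // Hypothetical postings: 10,000 docs with IDs 0, 3, 6, ...
        int[] docs = new int[10_000];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 3 * i;
        }
        int target = 15_001; // first match is docs[5001] = 15003
        System.out.println("skipInterval=128, whole-block decode: "
                + decodedForAdvance(docs, target, 128, 128));
        System.out.println("skipInterval=32, 32-value decode:     "
                + decodedForAdvance(docs, target, 32, 32));
    }
}
```

With these inputs the first call reports 128 decoded doc IDs and the second reports 32. The model deliberately ignores the extra skip data that a smaller skipInterval produces (four times as many level-0 entries for 32 versus 128), so it captures only the decoding side of the trade-off.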
--
This message is automatically generated by JIRA.