[ https://issues.apache.org/jira/browse/LUCENE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638971#comment-13638971 ]
Han Jiang commented on LUCENE-2962:
-----------------------------------
Oh, sorry, I didn't make it clear:
All the tests above were already run on wikimediumfull, which uses
WIKI_MEDIUM_TASKS_10MDOCS_FILE.
The crazyMinShouldMatch tasks benefit a lot from the skipper (as expected, given
their crazy avg_len :) ). The results are below:
{noformat}
Task               QPS base  StdDev  QPS comp  StdDev  Pct diff
10Terms8High10MSM    322.25  (2.5%)     97.87  (0.9%)  -69.6% ( -71% - -67%)
10Terms4High10MSM    449.00  (2.1%)    194.73  (1.2%)  -56.6% ( -58% - -54%)
10Terms6High10MSM    611.10  (2.6%)    327.45  (1.4%)  -46.4% ( -49% - -43%)
10Terms2High10MSM    614.20  (2.6%)    472.07  (1.9%)  -23.1% ( -26% - -19%)
10Terms6High8MSM      61.24  (5.9%)     56.10  (5.6%)   -8.4% ( -18% - 3%)
10Terms4High6MSM     104.63  (4.9%)    100.22  (5.0%)   -4.2% ( -13% - 5%)
10Terms4High2MSM       6.31  (7.8%)      6.12  (8.7%)   -3.0% ( -18% - 14%)
10Terms6High4MSM       1.75  (6.6%)      1.70  (7.3%)   -2.9% ( -15% - 11%)
10Terms2High4MSM      31.74  (6.5%)     30.85  (7.4%)   -2.8% ( -15% - 11%)
10Terms2High2MSM       5.30  (7.0%)      5.16  (8.0%)   -2.6% ( -16% - 13%)
10Terms8High4MSM       0.87  (5.8%)      0.85  (6.3%)   -2.4% ( -13% - 10%)
10Terms0High8MSM     216.98  (4.1%)    211.76  (4.9%)   -2.4% ( -10% - 6%)
10Terms6High2MSM       0.92  (5.3%)      0.90  (6.0%)   -2.3% ( -12% - 9%)
10Terms2High8MSM     115.45  (4.8%)    113.28  (5.1%)   -1.9% ( -11% - 8%)
10Terms4High8MSM     209.93  (4.4%)    206.04  (4.8%)   -1.9% ( -10% - 7%)
10Terms8High8MSM      11.03  (6.8%)     10.85  (8.1%)   -1.7% ( -15% - 14%)
10Terms6High6MSM       9.30  (6.8%)      9.15  (8.0%)   -1.7% ( -15% - 14%)
10Terms0High2MSM      27.76  (6.9%)     27.30  (8.4%)   -1.6% ( -15% - 14%)
10Terms4High3MSM       4.34  (7.0%)      4.27  (8.2%)   -1.6% ( -15% - 14%)
10Terms8High6MSM       3.06  (7.1%)      3.01  (8.3%)   -1.5% ( -15% - 14%)
10Terms8High2MSM       2.33  (6.5%)      2.30  (7.5%)   -1.2% ( -14% - 13%)
10Terms4High4MSM       8.77  (6.6%)      8.67  (8.1%)   -1.2% ( -14% - 14%)
10Terms0High6MSM      77.21  (5.7%)     76.71  (5.9%)   -0.7% ( -11% - 11%)
10Terms2High6MSM      73.82  (5.7%)     73.40  (6.1%)   -0.6% ( -11% - 11%)
10Terms0High4MSM      63.80  (5.9%)     63.64  (6.3%)   -0.2% ( -11% - 12%)
10Terms0High10MSM    595.12  (2.4%)    595.54  (2.4%)    0.1% ( -4% - 5%)
PKLookup             244.34  (3.1%)    259.97  (3.0%)    6.4% ( 0% - 12%)
{noformat}
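The "Pct diff" column appears to be the relative change of QPS comp against QPS base, i.e. (comp - base) / base * 100. The small check below reproduces three rows under that assumption. `PctDiff` and `pctDiff` are hypothetical names, not part of the benchmark tooling. The ranges in parentheses also depend on the StdDev columns and are not recomputed here.

```java
public class PctDiff {
    // Assumed formula: relative change of comp QPS vs. base QPS, in percent.
    static double pctDiff(double baseQps, double compQps) {
        return (compQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        // Three rows copied from the table above: task, QPS base, QPS comp.
        String[] tasks = {"10Terms8High10MSM", "10Terms0High10MSM", "PKLookup"};
        double[] base = {322.25, 595.12, 244.34};
        double[] comp = {97.87, 595.54, 259.97};
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("%-18s %6.1f%%%n", tasks[i], pctDiff(base[i], comp[i]));
        }
    }
}
```

Running it prints -69.6%, 0.1% and 6.4%, which matches the table.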
> Skip data should be inlined into the postings lists
> ---------------------------------------------------
>
> Key: LUCENE-2962
> URL: https://issues.apache.org/jira/browse/LUCENE-2962
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/index
> Reporter: Michael McCandless
> Labels: gsoc2013
> Attachments: proposal.txt
>
>
> Today, we store all skip data as a separate blob at the end of a given term's
> postings (if that term occurs in enough docs to warrant skip data).
> But this adds overhead during decoding -- we have to seek to a different
> place for the initial load, we have to init separate readers, we have to seek
> again while using the lower levels of the skip data, etc. Also, we have to
> fully decode all skip information even if we are not going to use it (e.g. if I
> only want docIDs, I still must decode position offset and lastPayloadLength).
> If instead we interleaved skip data into the postings file, we could keep it
> local, and "private" to each file that needs skipping. This should make it
> less costly to init and then use the skip data, which would be a good perf gain
> for e.g. PhraseQuery, AndQuery.
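The sketch below is a minimal, hypothetical illustration of the interleaving idea in the description. A single-level skip entry (the last docID of the block plus the block length) is written inline in front of each block of docID deltas. This lets advance() jump over a block by reading only its two-int header, with no separate skip reader and no seek to a blob at the end of the term. The class, method and field names are illustrative only and are not Lucene's codec API; `nextDoc`/`advance`/`NO_MORE_DOCS` merely mirror `DocIdSetIterator` conventions. A real format would also use byte-level encoding and multiple skip levels, and would handle positions and payloads. All of that is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-level sketch of skip data inlined into a postings list.
// Layout per block: [lastDocInBlock, blockLen, delta_1, ..., delta_blockLen].
public class InlinedSkipPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value Lucene uses

    // Encodes sorted, distinct docIDs; deltas start from -1, so the first delta is docs[0] + 1.
    static int[] encode(int[] docs, int blockSize) {
        List<Integer> out = new ArrayList<>();
        int prev = -1;
        for (int start = 0; start < docs.length; start += blockSize) {
            int end = Math.min(start + blockSize, docs.length);
            out.add(docs[end - 1]); // inline skip entry: last docID in this block
            out.add(end - start);   // number of deltas that follow
            for (int i = start; i < end; i++) {
                out.add(docs[i] - prev);
                prev = docs[i];
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    private final int[] data;
    private int pos = 0;           // index of the next int to read
    private int remaining = 0;     // undecoded deltas left in the current block
    private int blockLastDoc = -1; // skip entry of the current block
    private int doc = -1;          // current docID (-1 before the first nextDoc/advance)
    int skippedBlocks = 0;         // whole blocks jumped without decoding (for illustration)

    InlinedSkipPostings(int[] data) {
        this.data = data;
    }

    int docID() {
        return doc;
    }

    int nextDoc() {
        if (remaining == 0) {
            if (pos >= data.length) {
                return doc = NO_MORE_DOCS;
            }
            blockLastDoc = data[pos++]; // the header is read in the same forward pass
            remaining = data[pos++];
        }
        remaining--;
        return doc += data[pos++];
    }

    // Returns the first docID >= target, or NO_MORE_DOCS.
    int advance(int target) {
        // The rest of the current block ends before target: jump to its end.
        if (remaining > 0 && blockLastDoc < target) {
            pos += remaining;
            remaining = 0;
            doc = blockLastDoc;
        }
        // Jump over whole blocks by reading only their two-int headers.
        while (remaining == 0 && pos < data.length && data[pos] < target) {
            doc = data[pos];          // the next block's deltas are relative to this doc
            pos += 2 + data[pos + 1];
            skippedBlocks++;
        }
        // Decode linearly inside the block that may contain target.
        while (doc < target) {
            nextDoc();
        }
        return doc;
    }

    public static void main(String[] args) {
        int[] docs = {1, 3, 5, 8, 10, 20, 30, 40, 41, 42, 50, 99};
        InlinedSkipPostings postings = new InlinedSkipPostings(encode(docs, 4));
        System.out.println(postings.advance(41));   // 41
        System.out.println(postings.nextDoc());     // 42
        System.out.println(postings.skippedBlocks); // 2
    }
}
```

In this sketch, advance(41) reaches doc 41 after decoding only the deltas of the third block. The first two blocks are rejected from their headers alone, which is the kind of saving the description expects for PhraseQuery and AndQuery.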