[
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431324#comment-13431324
]
Adrien Grand commented on LUCENE-3892:
--------------------------------------
I made some changes to the {{BlockPacked}} codec:
- encoding and decoding using int[] instead of long[]
- selection of the format based on a configurable overhead ratio.
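
To illustrate the second point, here is a minimal sketch of how a bit width could be picked from a configurable overhead ratio. It is not code from the patch: the class name, the method name, the set of aligned widths (8, 16, 32) and the cap on the ratio are all assumptions made for illustration. The idea is that a value needing N bits may be stored with a wider, aligned width when the wasted space stays within the accepted ratio.

```java
// Hypothetical sketch, not taken from the LUCENE-3892 patches.
final class OverheadRatioSketch {

    // Assumed upper bound on the ratio, so the arithmetic below cannot overflow.
    static final float MAX_RATIO = 7f;

    /**
     * Returns the number of bits to store each value with.
     * requiredBits is the minimum width that can hold every value in the block (1..32,
     * since the block is an int[]). acceptableOverheadRatio is the fraction of extra
     * space we accept per value (0 means "always use the most compact width").
     */
    static int chooseBitsPerValue(int requiredBits, float acceptableOverheadRatio) {
        if (requiredBits < 1 || requiredBits > 32) {
            throw new IllegalArgumentException("requiredBits must be in 1..32");
        }
        if (acceptableOverheadRatio < 0f) {
            throw new IllegalArgumentException("acceptableOverheadRatio must be >= 0");
        }
        float ratio = Math.min(acceptableOverheadRatio, MAX_RATIO);

        // Widest width we are allowed to use; wasted bits are truncated to a whole bit.
        int maxBits = requiredBits + (int) (requiredBits * ratio);

        // Assumption: these aligned widths are the ones worth paying extra space for.
        int[] alignedWidths = {8, 16, 32};
        for (int width : alignedWidths) {
            if (requiredBits <= width && width <= maxBits) {
                return width;
            }
        }
        // No aligned width fits in the budget: stay fully packed.
        return requiredBits;
    }

    public static void main(String[] args) {
        System.out.println(chooseBitsPerValue(6, 0f));     // no overhead accepted
        System.out.println(chooseBitsPerValue(6, 0.5f));   // up to 9 bits allowed
        System.out.println(chooseBitsPerValue(13, 0.25f)); // up to 16 bits allowed
    }
}
```

With a ratio of 0 the sketch always returns the compact width; raising the ratio trades index size for widths that are presumably cheaper to decode.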
The results are encouraging:
{noformat}
Task                QPS 3892  StdDev 3892  QPS 3892-packed  StdDev 3892-packed      Pct diff
PKLookup              256.93         8.89           256.85                7.47    -6% -   6%
OrHighLow             145.14         9.86           145.14                9.35   -12% -  14%
Respell               110.26         1.84           110.27                2.01    -3% -   3%
AndHighHigh           112.97         0.81           113.19                2.17    -2% -   2%
Fuzzy1                102.15         1.47           102.86                3.13    -3% -   5%
OrHighHigh             94.56         6.56            95.43                6.35   -11% -  15%
Fuzzy2                 42.49         0.77            42.89                1.43    -4% -   6%
OrHighMed             175.30        11.34           177.42               10.83   -10% -  14%
AndHighLow           1925.02        23.92          1952.57               48.68    -2% -   5%
HighPhrase              8.96         0.41             9.11                0.46    -7% -  11%
Wildcard              189.79         2.13           193.12                1.57     0% -   3%
HighSpanNear            6.47         0.15             6.59                0.25    -4% -   8%
Prefix3               256.67         2.58           262.40                2.84     0% -   4%
LowTerm              1746.52        52.80          1789.54               54.30    -3% -   8%
HighTerm              238.70        13.46           245.63               16.60    -9% -  16%
MedTerm               923.64        38.19           951.18               46.85    -5% -  12%
AndHighMed            364.46         3.65           377.09               10.03     0% -   7%
IntNRQ                 56.58         1.02            58.84                0.80     0% -   7%
HighSloppyPhrase       11.73         0.30            12.40                0.62    -2% -  13%
LowSpanNear            29.64         0.96            32.44                0.98     2% -  16%
MedSpanNear            22.96         0.72            25.16                0.85     2% -  16%
MedPhrase              40.99         1.25            45.09                1.24     3% -  16%
LowSloppyPhrase        37.88         0.99            41.98                1.49     4% -  17%
LowPhrase              64.40         2.04            71.84                1.41     5% -  17%
MedSloppyPhrase        42.29         1.16            47.32                1.54     5% -  18%
{noformat}
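
For readers wondering how to interpret the "Pct diff" range: the figures in the table appear consistent with comparing the two runs at one standard deviation apart, in the pessimistic and optimistic directions, and truncating to a whole percent. That formula is inferred from the numbers above, not stated in this comment, so treat the sketch below (with its hypothetical class and method names) as an assumption rather than the benchmark tool's actual code.

```java
// Hypothetical sketch; the formula is inferred from the table, not from the benchmark source.
final class PctDiffSketch {

    /** Pessimistic bound: candidate one StdDev slower vs. baseline one StdDev faster. */
    static int low(double baseQps, double baseStd, double candQps, double candStd) {
        return (int) (100.0 * ((candQps - candStd) / (baseQps + baseStd) - 1.0));
    }

    /** Optimistic bound: candidate one StdDev faster vs. baseline one StdDev slower. */
    static int high(double baseQps, double baseStd, double candQps, double candStd) {
        return (int) (100.0 * ((candQps + candStd) / (baseQps - baseStd) - 1.0));
    }

    static String range(double baseQps, double baseStd, double candQps, double candStd) {
        return String.format("%d%% - %d%%",
                low(baseQps, baseStd, candQps, candStd),
                high(baseQps, baseStd, candQps, candStd));
    }

    public static void main(String[] args) {
        // Values copied from the PKLookup and MedSloppyPhrase rows.
        System.out.println("PKLookup        " + range(256.93, 8.89, 256.85, 7.47));
        System.out.println("MedSloppyPhrase " + range(42.29, 1.16, 47.32, 1.54));
    }
}
```

Under this reading, a range that straddles zero (PKLookup, the Or* and Term tasks) is within noise, while a range that stays positive (the phrase and span tasks at the bottom) suggests a real gain.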
I hope this will be confirmed on your computers this time. :-)
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.