[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-3892:
---------------------------------------
Attachment: LUCENE-3892-non-specialized.patch
I created a non-specialized (i.e., a single method that handles all numBits
cases) packed int decoder that decodes directly from byte[]. The baseline
is the current BlockPF (FOR with a specialized decoder); comp is with the
patch (using the non-specialized decoder):
{noformat}
Task              QPS base  StdDev base  QPS for  StdDev for  Pct diff
AndHighMed           69.04         0.77    36.41        1.91  -50% - -43%
AndHighLow          649.70        17.03   346.71       18.22  -50% - -42%
LowSpanNear           9.88         0.25     5.53        0.06  -45% - -42%
MedPhrase            13.25         0.26     7.74        0.07  -43% - -39%
LowSloppyPhrase       7.59         0.15     4.54        0.13  -43% - -37%
LowPhrase            22.29         0.31    13.77        0.08  -39% - -36%
AndHighHigh          23.55         0.12    15.22        0.63  -38% - -32%
MedSloppyPhrase       6.88         0.12     4.60        0.16  -36% - -29%
HighSloppyPhrase      1.98         0.07     1.38        0.05  -35% - -25%
HighTerm             36.11         0.01    25.31        0.87  -32% - -27%
MedSpanNear           5.02         0.16     3.56        0.03  -31% - -26%
MedTerm             198.76         0.34   142.92        4.34  -30% - -25%
HighPhrase            1.83         0.08     1.32        0.02  -31% - -23%
OrHighLow            27.32         1.10    20.55        0.54  -29% - -19%
OrHighMed            23.65         0.93    17.83        0.44  -29% - -19%
OrHighHigh           11.42         0.46     8.72        0.20  -28% - -18%
HighSpanNear          1.74         0.06     1.38        0.01  -24% - -17%
IntNRQ               11.61         0.01     9.26        0.02  -20% - -20%
LowTerm             513.60         2.26   411.60        7.65  -21% - -18%
Prefix3              82.36         1.05    67.48        1.29  -20% - -15%
Wildcard             52.63         0.44    43.45        0.81  -19% - -15%
Fuzzy1               74.74         1.02    70.03        0.80   -8% -  -3%
PKLookup            192.60         3.94   191.87        2.07   -3% -   2%
Fuzzy2               62.50         1.29    62.74        1.10   -3% -   4%
Respell              61.69         1.04    62.79        0.84   -1% -   4%
{noformat}
So... it's clear that all of our specializing does help!
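To make the specialized vs. non-specialized distinction concrete, here is a minimal Java sketch. It is not the code from the patch: the class and method names (`PackedDecodeSketch`, `decodeGeneric`, `decode4`) are hypothetical, and it assumes values are packed MSB-first, back to back, into a byte[]. The generic method handles any numBits from 1 to 32 with one loop driven by runtime shifts, masks and a refill check; the specialized one hard-codes numBits=4, so each byte yields exactly two values using constant shifts.

```java
import java.util.Arrays;

// Hypothetical sketch: not the decoder from LUCENE-3892-non-specialized.patch.
// Assumes values are packed MSB-first, back to back, with no per-value padding.
public class PackedDecodeSketch {

    // Non-specialized: one method handles every numBits from 1 to 32,
    // using runtime shift amounts and a refill check for each value.
    public static void decodeGeneric(byte[] in, int numBits, int[] out, int count) {
        final long mask = (1L << numBits) - 1;
        long buffer = 0;      // the low bitsInBuffer bits are still unread
        int bitsInBuffer = 0;
        int inPos = 0;
        for (int i = 0; i < count; i++) {
            while (bitsInBuffer < numBits) {  // refill one byte at a time
                buffer = (buffer << 8) | (in[inPos++] & 0xFF);
                bitsInBuffer += 8;
            }
            bitsInBuffer -= numBits;
            out[i] = (int) ((buffer >>> bitsInBuffer) & mask);
        }
    }

    // Specialized for numBits == 4: each byte holds exactly two values,
    // so only constant shifts and masks are needed.
    public static void decode4(byte[] in, int[] out, int count) {
        for (int i = 0, j = 0; i < count; i += 2, j++) {
            int b = in[j] & 0xFF;
            out[i] = b >>> 4;
            if (i + 1 < count) {
                out[i + 1] = b & 0x0F;
            }
        }
    }

    public static void main(String[] args) {
        // Eight 4-bit values packed MSB-first: 1, 2, 3, 4, 5, 6, 7, 15
        byte[] packed = {0x12, 0x34, 0x56, 0x7F};
        int[] generic = new int[8];
        int[] special = new int[8];
        decodeGeneric(packed, 4, generic, 8);
        decode4(packed, special, 8);
        System.out.println(Arrays.toString(generic));        // [1, 2, 3, 4, 5, 6, 7, 15]
        System.out.println(Arrays.equals(generic, special)); // true
    }
}
```

Both methods produce the same values; the difference is that the specialized one has no variable shift amounts and no per-value refill loop, which gives the JIT much simpler code to work with. A specialized FOR decoder would presumably carry one such method per numBits.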
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor&hardcode(base).patch,
> LUCENE-3892-blockFor&packedecoder(comp).patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints-decoder.patch,
> LUCENE-3892-blockFor-with-packedints.patch, LUCENE-3892-bulkVInt.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-handle_open_files.patch, LUCENE-3892-non-specialized.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for_byte[].patch,
> LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
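FOR and the PFOR variants named in the issue title share one core step: split the postings into blocks and store every value in a block with the same bit width. As a minimal sketch (hypothetical class `ForBlockSketch`, MSB-first byte layout assumed, not the layout of any patch attached here), the following delta-encodes ascending doc IDs into gaps, picks numBits from the largest gap, and packs the block. It is plain FOR on gaps; the exception handling that distinguishes PFOR/PFORDelta is deliberately left out.

```java
import java.util.Arrays;

// Hypothetical sketch of plain FOR on doc-ID gaps; PFOR exceptions are omitted.
public class ForBlockSketch {

    // Turn ascending doc IDs into gaps (the "Delta" step).
    public static int[] deltas(int[] docIds) {
        int[] gaps = new int[docIds.length];
        int prev = 0;
        for (int i = 0; i < docIds.length; i++) {
            gaps[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return gaps;
    }

    // Smallest bit width that fits every (non-negative) value in the block.
    // OR-ing the values gives the same highest set bit as their maximum.
    public static int bitsRequired(int[] values) {
        int or = 0;
        for (int v : values) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Pack every value with the same numBits, MSB-first, into a byte[];
    // the final byte is zero-padded on the right.
    public static byte[] pack(int[] values, int numBits) {
        byte[] out = new byte[(int) (((long) values.length * numBits + 7) / 8)];
        final long mask = (1L << numBits) - 1;
        long buffer = 0;      // the low bitsInBuffer bits are not yet written
        int bitsInBuffer = 0;
        int outPos = 0;
        for (int v : values) {
            buffer = (buffer << numBits) | (v & mask);
            bitsInBuffer += numBits;
            while (bitsInBuffer >= 8) {
                bitsInBuffer -= 8;
                out[outPos++] = (byte) (buffer >>> bitsInBuffer);
            }
        }
        if (bitsInBuffer > 0) {
            out[outPos] = (byte) (buffer << (8 - bitsInBuffer));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21, 30};
        int[] gaps = deltas(docIds);        // [3, 4, 1, 12, 1, 9]
        int numBits = bitsRequired(gaps);   // 4: the largest gap, 12, fits in 4 bits
        byte[] block = pack(gaps, numBits); // 6 values * 4 bits = 3 bytes
        System.out.println(Arrays.toString(gaps) + " numBits=" + numBits
            + " bytes=" + block.length);
    }
}
```

Storing gaps instead of absolute doc IDs is what keeps numBits small; a single large gap still forces the whole block to a wider width, which is the case PFOR's exceptions are meant to handle.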