[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419144#comment-13419144 ]
Han Jiang edited comment on LUCENE-3892 at 7/20/12 1:52 PM:
------------------------------------------------------------
An initial try with PackedInts in the current trunk version. I replaced all the
int[] buffers with long[] buffers so that we can use the API directly. I don't
quite understand the Writer part yet, so for now each long value is saved one
by one. However, it is the Reader part we are concerned with:
{noformat}
Task            QPS base  StdDev base  QPS packed  StdDev packed  Pct diff
AndHighHigh        29.60         1.56       23.78           0.51  -25% - -13%
AndHighMed         74.68         3.92       53.15           2.31  -35% - -21%
Fuzzy1             88.23         1.21       87.13           1.41   -4% -   1%
Fuzzy2             30.09         0.45       29.47           0.47   -5% -   1%
IntNRQ             41.96         3.88       38.16           2.48  -22% -   6%
OrHighHigh         17.56         0.34       15.45           0.15  -14% -  -9%
OrHighMed          34.71         0.76       30.77           0.53  -14% -  -7%
PKLookup          111.00         1.90      110.52           1.59   -3% -   2%
Phrase              9.03         0.23        7.62           0.41  -22% -  -8%
Prefix3           123.56         8.42      110.94           5.43  -20% -   1%
Respell           102.37         1.11      101.79           1.38   -2% -   1%
SloppyPhrase        3.97         0.19        3.52           0.07  -17% -  -4%
SpanNear            8.24         0.18        7.22           0.25  -17% -  -7%
Term               45.16         3.15       37.47           2.32  -27% -  -5%
TermBGroup1M       17.19         1.09       15.86           0.77  -17% -   3%
TermBGroup1M1P     23.47         1.66       20.43           1.16  -23% -  -1%
TermGroup1M        19.20         1.14       17.73           0.84  -16% -   2%
Wildcard           42.75         3.27       36.75           1.96  -24% -  -1%
{noformat}
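To make the int[]-to-long[] change described above concrete, here is a minimal, hypothetical sketch (plain Java, not Lucene's PackedInts code) of a PACKED-style layout: every value takes exactly bitsPerValue bits, values are laid out back to back starting from the least significant bit, and a value may straddle two adjacent longs. The class PackedSketch and its methods are illustrative names only, and bitsPerValue is assumed to be between 1 and 64.

```java
import java.util.Arrays;

// Hypothetical illustration, not Lucene's PackedInts implementation.
// Values of bitsPerValue bits (1..64) are stored back to back in a long[],
// least significant bits first, so a value may straddle two longs.
final class PackedSketch {
  private PackedSketch() {}

  static long mask(int bitsPerValue) {
    return bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
  }

  // Number of longs needed for count values (total bits rounded up to 64).
  static int longsNeeded(int count, int bitsPerValue) {
    return (int) (((long) count * bitsPerValue + 63) / 64);
  }

  static long[] pack(long[] values, int bitsPerValue) {
    long[] blocks = new long[longsNeeded(values.length, bitsPerValue)];
    long mask = mask(bitsPerValue);
    for (int i = 0; i < values.length; i++) {
      long v = values[i] & mask;
      long bitPos = (long) i * bitsPerValue;
      int block = (int) (bitPos >>> 6); // bitPos / 64
      int shift = (int) (bitPos & 63);  // bitPos % 64
      blocks[block] |= v << shift;
      // High bits that did not fit in this long spill into the next one.
      if (shift + bitsPerValue > 64) {
        blocks[block + 1] |= v >>> (64 - shift);
      }
    }
    return blocks;
  }

  static long get(long[] blocks, int index, int bitsPerValue) {
    long bitPos = (long) index * bitsPerValue;
    int block = (int) (bitPos >>> 6);
    int shift = (int) (bitPos & 63);
    long v = blocks[block] >>> shift;
    // Second read only when the value straddles a long boundary.
    if (shift + bitsPerValue > 64) {
      v |= blocks[block + 1] << (64 - shift);
    }
    return v & mask(bitsPerValue);
  }

  public static void main(String[] args) {
    long[] values = {5, 17, 3, 30, 1, 0, 31, 12, 8, 9, 27, 4, 19};
    long[] blocks = pack(values, 5); // 13 * 5 = 65 bits -> 2 longs
    long[] decoded = new long[values.length];
    for (int i = 0; i < values.length; i++) {
      decoded[i] = get(blocks, i, 5);
    }
    // prints: 2 longs, round trip ok: true
    System.out.println(blocks.length + " longs, round trip ok: " + Arrays.equals(values, decoded));
  }
}
```

In this example the last value (index 12) starts at bit 60, so it is split across both longs and the reader needs the extra branch and second load for it.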
Maybe we should try PACKED_SINGLE_BLOCK for some special values of numBits,
instead of using PACKED all the time?
Thanks to Adrien, we now have a more direct API in LUCENE-4239; I'm trying that now.
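The PACKED_SINGLE_BLOCK question above is a trade-off between space and decode simplicity. The sketch below assumes a single-block layout in which each long holds floor(64 / bitsPerValue) values and no value crosses a long boundary, so every read is one load, one shift and one mask, at the cost of some unused bits per long. It is a hypothetical illustration, not Lucene's implementation; SingleBlockSketch, its methods, and the block size of 128 used for the comparison are assumptions chosen for the example.

```java
// Hypothetical illustration of a single-block layout (bitsPerValue in 1..64):
// each long holds floor(64 / bitsPerValue) values and no value crosses a
// long boundary, so any leftover high bits in each long are unused.
final class SingleBlockSketch {
  private SingleBlockSketch() {}

  static int valuesPerBlock(int bitsPerValue) {
    return 64 / bitsPerValue;
  }

  static int wastedBitsPerBlock(int bitsPerValue) {
    return 64 - valuesPerBlock(bitsPerValue) * bitsPerValue;
  }

  static int longsNeeded(int count, int bitsPerValue) {
    int vpb = valuesPerBlock(bitsPerValue);
    return (count + vpb - 1) / vpb;
  }

  static long mask(int bitsPerValue) {
    return bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
  }

  static long[] pack(long[] values, int bitsPerValue) {
    int vpb = valuesPerBlock(bitsPerValue);
    long[] blocks = new long[longsNeeded(values.length, bitsPerValue)];
    for (int i = 0; i < values.length; i++) {
      long v = values[i] & mask(bitsPerValue);
      blocks[i / vpb] |= v << ((i % vpb) * bitsPerValue);
    }
    return blocks;
  }

  // Always a single long read, one shift and one mask: no straddling branch.
  static long get(long[] blocks, int index, int bitsPerValue) {
    int vpb = valuesPerBlock(bitsPerValue);
    return (blocks[index / vpb] >>> ((index % vpb) * bitsPerValue)) & mask(bitsPerValue);
  }

  public static void main(String[] args) {
    int count = 128; // hypothetical block size, chosen only for the comparison
    for (int bits : new int[] {5, 7, 8, 16}) {
      int packedLongs = (int) (((long) count * bits + 63) / 64);
      System.out.printf("bits=%d: %d values/long, %d wasted bits/long, %d longs vs %d fully packed%n",
          bits, valuesPerBlock(bits), wastedBitsPerBlock(bits), longsNeeded(count, bits), packedLongs);
    }
  }
}
```

For 128 values at 5 bits per value this layout needs 11 longs instead of 10 (and 15 instead of 14 at 7 bits), while at 8 or 16 bits the two layouts are the same size because 64 is a multiple of the bit width.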
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
> LUCENE-3892-blockFor-with-packedints.patch,
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-for&pfor-with-javadoc.patch,
> LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor.patch,
> LUCENE-3892-handle_open_files.patch,
> LUCENE-3892-pfor-compress-iterate-numbits.patch,
> LUCENE-3892-pfor-compress-slow-estimate.patch, LUCENE-3892_for.patch,
> LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch,
> LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch,
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch,
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of open issues with patches in various
> states.
> Initial results (based on a prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html).
> I think this would make a good GSoC project.
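The issue title lists FOR among the candidate intblock encodings. As a rough, hypothetical illustration (not the code in any of the attached patches), the sketch below applies one simplified reading of FOR to a block of doc IDs: it converts them to gaps, picks a single bit width numBits from the largest gap, and bit-packs all gaps with that width. PFOR-style variants would instead choose a smaller width and store the outliers separately as exceptions; that part is omitted. ForDeltaSketch, Block, encode and decode are illustrative names, and the doc IDs are assumed to be sorted ascending and non-negative.

```java
import java.util.Arrays;

// Hypothetical FOR-style sketch over doc-ID gaps, not the patches attached to
// this issue. Assumes docIds are sorted ascending and non-negative.
final class ForDeltaSketch {
  private ForDeltaSketch() {}

  // One encoded block: the shared bit width plus the packed gaps.
  static final class Block {
    final int numBits;
    final int count;
    final long[] packed;

    Block(int numBits, int count, long[] packed) {
      this.numBits = numBits;
      this.count = count;
      this.packed = packed;
    }
  }

  // Bits needed for the largest value (0 if every value is 0).
  static int bitsRequired(long maxValue) {
    return 64 - Long.numberOfLeadingZeros(maxValue);
  }

  static Block encode(int[] docIds) {
    long[] gaps = new long[docIds.length];
    long max = 0;
    int prev = 0;
    for (int i = 0; i < docIds.length; i++) {
      gaps[i] = docIds[i] - prev; // gap from the previous doc ID
      prev = docIds[i];
      max = Math.max(max, gaps[i]);
    }
    int numBits = bitsRequired(max);
    if (numBits == 0) {
      return new Block(0, docIds.length, new long[0]); // all gaps are zero
    }
    long[] packed = new long[(int) (((long) gaps.length * numBits + 63) / 64)];
    for (int i = 0; i < gaps.length; i++) {
      long bitPos = (long) i * numBits;
      int block = (int) (bitPos >>> 6);
      int shift = (int) (bitPos & 63);
      packed[block] |= gaps[i] << shift;
      if (shift + numBits > 64) {
        packed[block + 1] |= gaps[i] >>> (64 - shift); // spill into next long
      }
    }
    return new Block(numBits, docIds.length, packed);
  }

  static int[] decode(Block b) {
    int[] docIds = new int[b.count];
    long mask = (1L << b.numBits) - 1; // numBits <= 31 for non-negative int gaps
    int prev = 0;
    for (int i = 0; i < b.count; i++) {
      long gap = 0;
      if (b.numBits > 0) {
        long bitPos = (long) i * b.numBits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        gap = b.packed[block] >>> shift;
        if (shift + b.numBits > 64) {
          gap |= b.packed[block + 1] << (64 - shift);
        }
        gap &= mask;
      }
      prev += (int) gap; // prefix sum restores the absolute doc ID
      docIds[i] = prev;
    }
    return docIds;
  }

  public static void main(String[] args) {
    int[] docIds = {3, 7, 8, 20, 21, 40};
    Block b = encode(docIds); // gaps 3,4,1,12,1,19 -> max 19 -> 5 bits
    // prints: numBits=5, longs=1, decoded=[3, 7, 8, 20, 21, 40]
    System.out.println("numBits=" + b.numBits + ", longs=" + b.packed.length
        + ", decoded=" + Arrays.toString(decode(b)));
  }
}
```

Because every gap in a block shares one width, a single large gap forces all of them to that width; that weakness is what the PFOR exception mechanism is meant to address.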