[
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990988#comment-12990988
]
Michael McCandless commented on LUCENE-2903:
--------------------------------------------
Results on Europarl (each "paragraph" is a doc):
||Query||QPS bulkvint||QPS pfor3||Pct diff||
|doctimesecnum:[10000 TO 60000]|16.73|12.91|{color:red}-22.8%{color}|
|spanFirst(unit, 5)|5214.47|4143.21|{color:red}-20.5%{color}|
|spanNear([unit, state], 10, true)|869.71|719.62|{color:red}-17.3%{color}|
|"united states"|320.66|266.50|{color:red}-16.9%{color}|
|"united states"~3|212.07|187.75|{color:red}-11.5%{color}|
|u*d|41.09|36.90|{color:red}-10.2%{color}|
|unit~0.7|94.11|85.34|{color:red}-9.3%{color}|
|un*d|68.38|62.09|{color:red}-9.2%{color}|
|+united +states|440.68|406.08|{color:red}-7.8%{color}|
|united states|272.68|255.73|{color:red}-6.2%{color}|
|states|552.36|532.76|{color:red}-3.5%{color}|
|unit~0.5|18.86|18.67|{color:red}-1.0%{color}|
|uni*|47.96|47.65|{color:red}-0.6%{color}|
|united~0.6|23.82|23.69|{color:red}-0.5%{color}|
|unit*|435.99|437.09|{color:green}0.3%{color}|
|doctitle:.*[Uu]nited.*|24.16|24.31|{color:green}0.6%{color}|
|+nebraska +states|35010.33|36809.36|{color:green}5.1%{color}|
|united~0.75|172.36|195.18|{color:green}13.2%{color}|
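
The "Pct diff" column appears consistent with the relative change of the pfor3 QPS against the bulkvint QPS. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and are not part of Lucene or of the benchmark scripts, and the formula is an assumption inferred from the rows above.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class QpsDiff {

    /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Rows taken from the table above: bulkvint is the baseline, pfor3 the candidate.
        System.out.printf(Locale.ROOT, "%.1f%%%n", pctDiff(16.73, 12.91));   // prints -22.8%
        System.out.printf(Locale.ROOT, "%.1f%%%n", pctDiff(172.36, 195.18)); // prints 13.2%
    }
}
```

A negative value means pfor3 answered fewer queries per second than bulkvint for that query; a positive value means it answered more.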
> Improvement of PForDelta Codec
> ------------------------------
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: hao yan
> Attachments: LUCENE_2903.patch, LUCENE_2903.patch
>
>
> There are 3 PForDelta implementations in the Bulk Branch:
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> FrameOfRef is a very basic one, essentially a plain binary encoding
> (which may result in a huge index size).
> PatchedFrameOfRef is the implementation based on the original version of
> PForDelta in the literature.
> PatchedFrameOfRef2 is my previous implementation, which is improved this
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the
> old PForDelta did not support very large exceptions (since
> Simple16 does not support very large numbers). This is now fixed in
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other
> two PForDelta implementations in the bulk branch (FrameOfRef and
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef
> for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec can result in the smallest index size among all 4
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server"
> instead of "-client".
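
To make the difference between plain frame-of-reference and the patched variant in the description above concrete, here is a minimal sketch of the general idea. It is not the code from the bulk branch or from the attached patches: the class name, fields, and methods are hypothetical, the bit width is passed in rather than chosen per block, and exceptions are stored as plain ints instead of being compressed with Simple16. It assumes non-negative values and a width between 1 and 31 bits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names are hypothetical and do not mirror Lucene classes.
final class PatchedForSketch {
    final int bits;            // width of each packed slot (1..31)
    final int count;           // number of encoded values
    final long[] packed;       // low `bits` bits of every value, bit-packed
    final int[] excPositions;  // indexes of values that did not fit in `bits` bits
    final int[] excHighBits;   // the remaining high bits of those values

    private PatchedForSketch(int bits, int count, long[] packed, int[] pos, int[] high) {
        this.bits = bits;
        this.count = count;
        this.packed = packed;
        this.excPositions = pos;
        this.excHighBits = high;
    }

    /** Packs the low bits of every value and records overflowing values as exceptions. */
    static PatchedForSketch encode(int[] values, int bits) {
        long mask = (1L << bits) - 1;
        long[] packed = new long[(values.length * bits + 63) / 64];
        List<Integer> pos = new ArrayList<>();
        List<Integer> high = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            long low = values[i] & mask;
            int bitIndex = i * bits;
            int word = bitIndex >>> 6;
            int shift = bitIndex & 63;
            packed[word] |= low << shift;
            if (shift + bits > 64) {                 // slot straddles two longs
                packed[word + 1] |= low >>> (64 - shift);
            }
            if ((values[i] >>> bits) != 0) {         // value needs more than `bits` bits
                pos.add(i);
                high.add(values[i] >>> bits);
            }
        }
        return new PatchedForSketch(bits, values.length, packed,
                pos.stream().mapToInt(Integer::intValue).toArray(),
                high.stream().mapToInt(Integer::intValue).toArray());
    }

    /** Unpacks all slots, then patches the exceptions back in. */
    int[] decode() {
        long mask = (1L << bits) - 1;
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            int bitIndex = i * bits;
            int word = bitIndex >>> 6;
            int shift = bitIndex & 63;
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        for (int j = 0; j < excPositions.length; j++) {
            out[excPositions[j]] |= excHighBits[j] << bits;
        }
        return out;
    }
}
```

For example, encoding `{3, 1, 7, 2, 1000, 5, 0, 6}` with 3 bits packs every value into 3-bit slots and records a single exception at index 4. A plain frame-of-reference encoding of the same block would need 10 bits per value just to accommodate 1000, which is the index-size concern raised for FrameOfRef.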