[ 
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2903:
---------------------------------------

    Attachment: LUCENE-2903.patch

Thanks Hao!  The new patch looks great -- much leaner.

I fixed a few things... new patch attached.  To keep the comparison
fair, I cutover BulkVInt back to Sep (it was Fixed (interleaved)).  I
also impl'd skipBlock in PFor4 (though this method is never called by
Sep).  I cutover PFor4 to var gap terms index.

Finally I added back copyright headers (Simple16.java's had been
stripped, and other new sources were missing them...).  Also,
we need to eventually remove the @author tags.

One question: it looks like this PFOR impl can only handle up to 28
bit wide ints?  Which means... could it fail on some cases?
Though I suppose you would never see too many of these immense ints in
one block, and so they'd always be encoded as exceptions and so it's
actually safe...?
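
To make the question above concrete, here is a minimal sketch of the
reasoning, not code from the patch. The class and method names are
hypothetical, and the 28-bit cap is only the limit guessed at above. It
counts how many values in a block are too wide for a given frame width
and so would have to be stored as exceptions:

```java
// Hypothetical illustration only; not taken from LUCENE-2903.patch.
class PForWidthSketch {

    /** Bits needed to represent a non-negative int (at least 1). */
    static int bitsRequired(int v) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }

    /** Counts values in the block that do not fit in frameBits bits. */
    static int countExceptions(int[] block, int frameBits) {
        int exceptions = 0;
        for (int v : block) {
            if (bitsRequired(v) > frameBits) {
                exceptions++;
            }
        }
        return exceptions;
    }

    public static void main(String[] args) {
        // One "immense" value (30 bits wide) among small ones.
        int[] block = {3, 7, 1, 1 << 29, 12, 5};
        System.out.println(countExceptions(block, 4));   // frame of 4 bits
        System.out.println(countExceptions(block, 28));  // assumed 28-bit cap
    }
}
```

In this sketch the single wide value is an exception at either frame
width, which is the "safe" case described above. Whether the exception
encoding itself can hold such a value is a separate question that the
sketch does not answer.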

Here are the results on Linux, MMapDir, 10M docs, unshuffled:

||Query||QPS BulkVInt||QPS PFor4||Pct diff||
|"united states"|13.66|11.63|{color:red}-14.9%{color}|
|u*d|12.75|11.55|{color:red}-9.4%{color}|
|un*d|24.71|22.46|{color:red}-9.1%{color}|
|uni*|24.68|22.85|{color:red}-7.4%{color}|
|unit*|41.22|39.25|{color:red}-4.8%{color}|
|+nebraska +states|128.41|123.73|{color:red}-3.6%{color}|
|spanFirst(unit, 5)|263.41|258.27|{color:red}-1.9%{color}|
|+united +states|21.37|21.09|{color:red}-1.3%{color}|
|title:.*[Uu]nited.*|5.70|5.66|{color:red}-0.6%{color}|
|timesecnum:[10000 TO 60000]|15.01|14.96|{color:red}-0.4%{color}|
|unit~0.7|41.78|43.44|{color:green}4.0%{color}|
|"united states"~3|6.48|6.79|{color:green}4.8%{color}|
|unit~0.5|24.61|25.83|{color:green}4.9%{color}|
|spanNear([unit, state], 10, true)|52.34|55.67|{color:green}6.4%{color}|
|united~0.6|11.36|12.18|{color:green}7.1%{color}|
|united~0.75|15.96|17.58|{color:green}10.2%{color}|
|states|53.41|61.03|{color:green}14.3%{color}|
|united states|16.87|20.62|{color:green}22.2%{color}|
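
For reading the Pct diff column: the figures are consistent with the
relative change of PFor4 QPS against BulkVInt QPS as the baseline. The
sketch below assumes that formula (the benchmark tool's actual code is
not shown here, and the names are hypothetical) and recomputes two rows:

```java
import java.util.Locale;

// Hypothetical helper; assumes pct diff = 100 * (candidate - baseline) / baseline.
class PctDiffSketch {

    /** Relative change of candidate QPS versus baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // "united states" (phrase) row: BulkVInt 13.66, PFor4 11.63
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(13.66, 11.63)));
        // united states (two terms) row: BulkVInt 16.87, PFor4 20.62
        System.out.println(String.format(Locale.ROOT, "%.1f%%", pctDiff(16.87, 20.62)));
    }
}
```

Negative values mean PFor4 is slower than BulkVInt on that query;
positive values mean it is faster.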

Very nice!


> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE-2903.patch, LUCENE-2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch: 
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding 
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of 
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this 
> time. (The Codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the 
> old PForDelta did not support very large exceptions (since
> Simple16 does not support very large numbers). Now this has been fixed in 
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other 
> two PForDelta implementations in the bulk branch (FrameOfRef and 
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the 
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef 
> for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec yields the smallest index size among all 4 
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results are achieved by running with "-server" 
> instead of "-client"
