[
https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995436#comment-12995436
]
hao yan commented on LUCENE-2903:
---------------------------------
Thank you both! And thanks for testing my codec so quickly, Michael!
RE: One question: it looks like this PFOR impl can only handle up to 28
bit wide ints? Which means... could it fail on some cases?
Though I suppose you would never see too many of these immense ints in
one block, and so they'd always be encoded as exceptions and so it's
actually safe...?
Hao: This won't fail. My PFOR impl first calls checkBigNumbers() to see whether
there is any number >= 2^28. If there is, I force the lower 4 bits to be
encoded using the 128 4-bit slots. Only the remaining upper bits are left to
Simple16 as exceptions, so all of them are < 2^28, which it can definitely
handle. So there are no failure cases! :)
BTW, my PFOR impl produces a smaller index than VInt and the other PFOR impls.
So if the use case is real-time search, which requires frequently loading the
index from disk into memory, the smaller index may pay off even more.
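
To make the big-number argument above concrete, here is a minimal sketch. It is
not the code from the patch: the class name, hasBigNumber(), slotBits(),
lowBits() and highBits() are hypothetical helpers, and it assumes non-negative
32-bit ints and a Simple16 limit of 28 bits. It only shows why forcing at least
4 low bits into the fixed-width slots keeps every exception below 2^28.

```java
// Hypothetical illustration only; not the LUCENE-2903 implementation.
final class BigNumberCheckSketch {
    // Assumption: Simple16 can only pack values below 2^28.
    static final int SIMPLE16_MAX_BITS = 28;
    // 32-bit ints minus 28 bits leaves 4 bits that must go into the slots.
    static final int MIN_SLOT_BITS = 32 - SIMPLE16_MAX_BITS;

    // Stand-in for the checkBigNumbers() step: is any value >= 2^28?
    static boolean hasBigNumber(int[] block) {
        for (int v : block) {
            if ((v >>> SIMPLE16_MAX_BITS) != 0) {
                return true;
            }
        }
        return false;
    }

    // Keep the candidate slot width unless a big number forces at least 4 bits.
    static int slotBits(int[] block, int candidateBits) {
        return hasBigNumber(block)
                ? Math.max(candidateBits, MIN_SLOT_BITS)
                : candidateBits;
    }

    // Low part stored in the fixed-width slot (valid for 0 <= bits < 32).
    static int lowBits(int value, int bits) {
        return value & ((1 << bits) - 1);
    }

    // High part left over as the exception value handed to Simple16.
    static int highBits(int value, int bits) {
        return value >>> bits;
    }

    public static void main(String[] args) {
        int[] block = {3, 7, Integer.MAX_VALUE};
        int bits = slotBits(block, 3);
        for (int v : block) {
            System.out.println(v + " -> low=" + lowBits(v, bits)
                    + " high=" + highBits(v, bits));
        }
    }
}
```

With at least 4 slot bits, highBits() of any non-negative int fits in 28 bits,
which is the property the reply relies on.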
> Improvement of PForDelta Codec
> ------------------------------
>
> Key: LUCENE-2903
> URL: https://issues.apache.org/jira/browse/LUCENE-2903
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: hao yan
> Attachments: LUCENE-2903.patch, LUCENE-2903.patch, for_pfor.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch:
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> The FrameOfRef is a very basic one which is essentially a binary encoding
> (may result in huge index size).
> The PatchedFrameOfRef is the implementation based on the original version of
> PForDelta in the literature.
> The PatchedFrameOfRef2 is my previous implementation, which is improved this
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the
> old PForDelta did not support very large exceptions (since
> the Simple16 does not support very large numbers). Now this has been fixed in
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. Now it is faster than the other
> two PForDelta implementations in the bulk branch (FrameOfRef and
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef
> for almost all kinds of queries, and slightly slower than BulkVInt.
> 2) My "NewPForDelta" codec produces the smallest index among all 4
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and itself).
> 3) All performance test results were obtained by running with "-server"
> instead of "-client".