[ https://issues.apache.org/jira/browse/LUCENE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12989754#comment-12989754 ]
hao yan commented on LUCENE-2903:
---------------------------------

Hi Paul,

Thanks for the suggestions. I just uploaded a new patch that renames the codec to PatchedFrameOfRef3.

I have a question. The BulkVInt codec writes its compressed byte stream as a single chunk of bytes. The PForDelta-related codecs, however, produce their compressed results as ints, so I have to either write the ints one at a time in a loop, or first convert the int array to a byte array and then call out.writeBytes(). Do you know of a smarter way to write an int array to an IndexOutput?

I also tried making PForDelta itself produce byte-wise compressed results, but in my experiments that slows PForDelta down significantly. I also do not think the NIO buffer used in FrameOfRef and PatchedFrameOfRef helps, since it essentially amounts to converting the int array to a byte array first and then calling writeBytes().

Do you have any good suggestions? Thanks!

> Improvement of PForDelta Codec
> ------------------------------
>
>                 Key: LUCENE-2903
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2903
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: hao yan
>         Attachments: LUCENE_2903.patch
>
>
> There are 3 versions of PForDelta implementations in the Bulk Branch:
> FrameOfRef, PatchedFrameOfRef, and PatchedFrameOfRef2.
> FrameOfRef is a very basic one, essentially a binary encoding
> (it may result in a huge index size).
> PatchedFrameOfRef is the implementation based on the original version of
> PForDelta in the literature.
> PatchedFrameOfRef2 is my previous implementation, which is improved this
> time. (The codec name is changed to NewPForDelta.)
> In particular, the changes are:
> 1. I fixed the bug in my previous version (in Lucene-1410.patch), where the
> old PForDelta did not support very large exceptions (since
> Simple16 does not support very large numbers). This has now been fixed in
> the new LCPForDelta.
> 2. I changed the PForDeltaFixedIntBlockCodec. It is now faster than the other
> two PForDelta implementations in the bulk branch (FrameOfRef and
> PatchedFrameOfRef). The codec's name is "NewPForDelta", as you can see in the
> CodecProvider and PForDeltaFixedIntBlockCodec.
> 3. The performance test results are:
> 1) My "NewPForDelta" codec is faster than FrameOfRef and PatchedFrameOfRef
> for almost all kinds of queries, and slightly worse than BulkVInt.
> 2) My "NewPForDelta" codec produces the smallest index size of all 4
> methods (FrameOfRef, PatchedFrameOfRef, BulkVInt, and NewPForDelta itself).
> 3) All performance test results were obtained by running with "-server"
> instead of "-client".
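
For reference, the two ways of writing an int array that the comment above describes (one write per int in a loop, versus converting the int array to a byte array and issuing a single bulk write) can be sketched roughly as below. This is only an illustration under stated assumptions, not code from the patch: the class and method names are hypothetical, a java.io.ByteArrayOutputStream stands in for Lucene's IndexOutput, and big-endian byte order (as in java.io.DataOutput.writeInt) is assumed. It makes no claim about which variant is faster.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene or of the LUCENE-2903 patch.
final class IntBlockBytes {

    /** Copies ints[0..count) into dst as big-endian bytes; returns the byte count. */
    static int toBytes(int[] ints, int count, byte[] dst) {
        // The IntBuffer is a view over dst, so the put() writes through to dst.
        // ByteBuffer defaults to big-endian order.
        ByteBuffer.wrap(dst).asIntBuffer().put(ints, 0, count);
        return count * Integer.BYTES;
    }

    /** Variant 1: emit each int separately, high-order byte first. */
    static byte[] writeWithLoop(int[] ints, int count) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for IndexOutput
        for (int i = 0; i < count; i++) {
            int v = ints[i];
            out.write(v >>> 24); // write(int) keeps only the low 8 bits
            out.write(v >>> 16);
            out.write(v >>> 8);
            out.write(v);
        }
        return out.toByteArray();
    }

    /** Variant 2: convert the whole block into a reusable scratch array, then write once. */
    static byte[] writeInBulk(int[] ints, int count, byte[] scratch) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // stand-in for IndexOutput
        int length = toBytes(ints, count, scratch);
        out.write(scratch, 0, length); // with Lucene this would be the single writeBytes call
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] block = {1, 256, -1, 0x12345678};
        byte[] scratch = new byte[block.length * Integer.BYTES];
        byte[] looped = writeWithLoop(block, block.length);
        byte[] bulk = writeInBulk(block, block.length, scratch);
        // Both variants should produce the same byte stream.
        System.out.println(Arrays.equals(looped, bulk));
    }
}
```

Reusing one scratch array per block size avoids allocating a new byte array for every block, but the conversion itself is still a copy, which is the cost the comment is asking how to avoid.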