Re: [jira] Created: (LUCENE-2886) Adaptive Frame Of Reference

Renaud Delbru Tue, 25 Jan 2011 04:27:52 -0800

-- sorry, resending it as I don't know what happened to the layout of the previous one


Hi Paul,

This is a good question. The two methods, VSE and AFOR, are very similar. Both can be considered extensions of FOR that make it less sensitive to outliers by adapting the encoding to the value distribution. To achieve this, both methods encode a list of values by:
- partitioning it into "frames" (sequences of consecutive integers) of variable lengths,
- encoding each frame using a different "bit frame" (the minimum number of bits required to encode any integer in the frame while still being able to distinguish them),
- relying on algorithms to automatically find a good list partitioning.
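
To make the "bit frame" idea concrete, here is a minimal Java sketch (Java only because Lucene is written in it). It computes the bit frame of a frame as the width of the bitwise OR of its values, then bit-packs the frame with that width. The names `BitFrame`, `bitFrame`, `pack` and `unpack` are hypothetical, values are assumed to be non-negative, and the bit-by-bit loops are deliberately naive; real FOR, AFOR or VSE codecs use unrolled routines specialised per bit width.

```java
/** Hypothetical sketch: per-frame bit width ("bit frame") and naive bit packing. */
public final class BitFrame {

    /** Minimum number of bits able to represent every value in v[from, to); assumes non-negative values. */
    static int bitFrame(int[] v, int from, int to) {
        int or = 0;
        for (int i = from; i < to; i++) {
            or |= v[i];
        }
        // The width of the OR equals the width of the largest value in the frame.
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    /** Packs v[from, to) using `bits` bits per value, least significant bit first. */
    static byte[] pack(int[] v, int from, int to, int bits) {
        byte[] out = new byte[((to - from) * bits + 7) / 8];
        int bitPos = 0;
        for (int i = from; i < to; i++) {
            for (int b = 0; b < bits; b++) {
                if (((v[i] >>> b) & 1) != 0) {
                    out[bitPos >>> 3] |= (byte) (1 << (bitPos & 7));
                }
                bitPos++;
            }
        }
        return out;
    }

    /** Reverses pack(): reads n values of `bits` bits each. */
    static int[] unpack(byte[] in, int n, int bits) {
        int[] out = new int[n];
        int bitPos = 0;
        for (int i = 0; i < n; i++) {
            int value = 0;
            for (int b = 0; b < bits; b++) {
                if (((in[bitPos >>> 3] >>> (bitPos & 7)) & 1) != 0) {
                    value |= 1 << b;
                }
                bitPos++;
            }
            out[i] = value;
        }
        return out;
    }
}
```

For example, the frame {3, 1, 7, 2} has a bit frame of 3 (the OR of its values is 7) and packs into 2 bytes. Adding a single outlier such as 1 << 20 to the same frame would raise the bit frame to 21 for every value, which is the sensitivity to outliers that variable-length frames are meant to limit.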

Apart from minor differences in the implementation design (which I will discuss later), the main difference is that VSE is optimised for a high compression rate and fast decompression but disregards compression speed, while AFOR is optimised for a high compression rate and fast decompression, but also for fast compression. VSE uses a dynamic programming method to find the *optimal partitioning* of a list (optimal in terms of compression rate). While this approach provides a higher compression rate than the one proposed in AFOR, the complexity of such a partitioning algorithm is O(n * k), where n is the number of values and k is the size of the largest frame, which might greatly impact compression speed. In AFOR, we instead use a local optimisation algorithm that is less effective in terms of compression rate but faster to compute.
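
The dynamic programming approach attributed to VSE above can be sketched as follows. This is not the VSE authors' code: the cost model (a 1-byte header plus the bit-packed payload rounded up to whole bytes) is an assumption for illustration, and `OptimalPartition` and `partition` are hypothetical names. For each end position i, the inner loop scans back at most k = 32 values while maintaining a running OR, which gives the O(n * k) behaviour mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a VSE-style optimal partitioning by dynamic programming. */
public final class OptimalPartition {

    // Frame lengths allowed by VSE, as listed in this message.
    static final int[] ALLOWED = {1, 2, 4, 6, 8, 12, 16, 32};
    static final int MAX_LEN = 32;

    /** Assumed cost model: 1-byte header + payload rounded up to whole bytes. */
    static int frameCostBytes(int length, int bits) {
        return 1 + (length * bits + 7) / 8;
    }

    static boolean isAllowed(int len) {
        for (int a : ALLOWED) {
            if (a == len) {
                return true;
            }
        }
        return false;
    }

    /** Returns the frame lengths of a minimum-cost partition of values. */
    static List<Integer> partition(int[] values) {
        int n = values.length;
        int[] cost = new int[n + 1];    // cost[i] = minimum bytes to encode values[0, i)
        int[] lastLen = new int[n + 1]; // length of the last frame in that optimum
        Arrays.fill(cost, Integer.MAX_VALUE);
        cost[0] = 0;
        for (int i = 1; i <= n; i++) {
            int or = 0;
            // Scan back at most MAX_LEN values: O(n * k) overall.
            for (int len = 1; len <= Math.min(i, MAX_LEN); len++) {
                or |= values[i - len];
                if (!isAllowed(len) || cost[i - len] == Integer.MAX_VALUE) {
                    continue;
                }
                int bits = 32 - Integer.numberOfLeadingZeros(or);
                int c = cost[i - len] + frameCostBytes(len, bits);
                if (c < cost[i]) {
                    cost[i] = c;
                    lastLen[i] = len;
                }
            }
        }
        // Walk back from the end to recover the chosen frames.
        List<Integer> lengths = new ArrayList<>();
        for (int i = n; i > 0; i -= lastLen[i]) {
            lengths.add(lastLen[i]);
        }
        Collections.reverse(lengths);
        return lengths;
    }
}
```

For instance, with 16 ones followed by the outlier 1 << 20, the cheapest partition under this cost model is [16, 1]: the outlier is isolated in its own frame instead of inflating the bit frame of its neighbours.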


In terms of implementation details, there are a few differences.

1) VSE allows frames of length 1, 2, 4, 6, 8, 12, 16 and 32. The current implementation of AFOR restricts the length of a frame to a multiple of 8, so that frames start and end on a byte boundary (and also to minimise the number of loop-unrolled, highly-optimised routines). More precisely, AFOR-2 uses three frame lengths: 8, 16 and 32.

2) To allow the *optimal partitioning* of a list, the original implementation of VSE needs to operate on the full list. In contrast, AFOR has been developed to operate on small subsets of the list, so that it can be applied during the incremental construction of the compressed list (it does not require the full list, but works on small blocks of 32 or more integers). However, we could also apply VSE to small subsets, as in AFOR. In this case, VSE no longer computes the optimal partition of the whole list, but only the optimal partition of each subset.
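
This message does not describe AFOR's local optimisation algorithm, so the sketch below is a hypothetical stand-in, not AFOR's actual method. It only illustrates the two properties described in points 1) and 2): with AFOR-2's frame lengths (8, 16, 32), `len * bits` is always a whole number of bytes, so frames need no padding; and decisions are taken per block (here a block whose length is a multiple of 8, e.g. 32 values), so they can be made while the list is still being built. The greedy rule (take the candidate frame with the fewest bytes per value, preferring longer frames on ties) and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block-local partitioner restricted to AFOR-2's frame lengths. */
public final class Afor2Partition {

    static final int[] LENGTHS = {32, 16, 8}; // longest first, so ties favour fewer headers

    static int bitFrame(int[] v, int from, int to) {
        int or = 0;
        for (int i = from; i < to; i++) {
            or |= v[i];
        }
        return 32 - Integer.numberOfLeadingZeros(or);
    }

    /** len is a multiple of 8, so len * bits is a whole number of bytes: no padding needed. */
    static int frameBytes(int len, int bits) {
        return 1 + (len / 8) * bits; // 1-byte header + payload
    }

    /** Greedy choice inside v[from, to): at each position, the frame with the lowest bytes per value. */
    static List<Integer> partitionBlock(int[] v, int from, int to) {
        if ((to - from) % 8 != 0) {
            throw new IllegalArgumentException("block length must be a multiple of 8");
        }
        List<Integer> lengths = new ArrayList<>();
        int p = from;
        while (p < to) {
            int bestLen = -1;
            int bestBytes = 0;
            for (int len : LENGTHS) {
                if (p + len > to) {
                    continue;
                }
                int bytes = frameBytes(len, bitFrame(v, p, p + len));
                // Compare bytes/len ratios without floating point: a/b < c/d  <=>  a*d < c*b.
                if (bestLen < 0 || bytes * bestLen < bestBytes * len) {
                    bestLen = len;
                    bestBytes = bytes;
                }
            }
            lengths.add(bestLen); // a frame of 8 always fits, so bestLen > 0 here
            p += bestLen;
        }
        return lengths;
    }
}
```

On a 32-value block made of 31 ones followed by 1 << 20, this heuristic returns [16, 8, 8], confining the outlier to the last 8-value frame.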

VSE and AFOR encode a frame in a similar way: first a header (1 byte), which provides the bit frame and the frame length, then the encoded frame.
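
A 1-byte header has to carry both pieces of information. The layout below is an assumption for illustration only, since this message does not specify the bit layout used by VSE or AFOR: the top 2 bits hold a code for one of AFOR-2's three frame lengths, and the low 6 bits hold the bit frame, because values from 0 to 32 need 6 bits. `FrameHeader` and its methods are hypothetical names.

```java
/** Hypothetical 1-byte frame header: bits 7-6 = frame length code, bits 5-0 = bit frame. */
public final class FrameHeader {

    // Assumed mapping from a 2-bit code to AFOR-2's frame lengths (code 3 is unused).
    private static final int[] LENGTHS = {8, 16, 32};

    static byte encode(int frameLength, int bitFrame) {
        if (bitFrame < 0 || bitFrame > 32) {
            throw new IllegalArgumentException("bit frame out of range: " + bitFrame);
        }
        int code = -1;
        for (int i = 0; i < LENGTHS.length; i++) {
            if (LENGTHS[i] == frameLength) {
                code = i;
            }
        }
        if (code < 0) {
            throw new IllegalArgumentException("unsupported frame length: " + frameLength);
        }
        return (byte) ((code << 6) | bitFrame);
    }

    /** Decodes the frame length; a header with the unused code 3 throws ArrayIndexOutOfBoundsException. */
    static int frameLength(byte header) {
        return LENGTHS[(header & 0xFF) >>> 6];
    }

    static int bitFrame(byte header) {
        return header & 0x3F;
    }
}
```

For example, a 16-value frame with a bit frame of 21 gets the header byte 85, i.e. (1 << 6) | 21. A decoder would read the header, then read `frameLength / 8 * bitFrame` payload bytes.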

So, as you can see, in essence the two models are very similar. For background, I know Fabrizio Silvestri (co-author of VSE) well, and he was my PhD thesis examiner (the AFOR compression scheme is a chapter of my thesis). The funny thing is that we came up with these two models at the same time, this summer, without knowing we were working on something similar ;o). However, he was luckier than me and published his findings first.


I hope this answers your question.
Feel free to ask if you have any other questions,
Regards,
--
Renaud Delbru




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
