On Mon, Dec 20, 2010 at 5:49 AM, Li Li <fancye...@gmail.com> wrote:
>   I think random tests are not sufficient.
>   In normal situations, some branches are never executed. I tested
> http://code.google.com/p/integer-array-compress-kit/ with many random
> int arrays and it worked. But when I used it in real indexing, it
> corrupted data during the optimize stage.
>  Because PForDelta chooses the best numFrameBits, and some bit widths
> such as 31 are hardly ever generated by random arrays, I "forced" the
> encoder to choose every possible numFrameBits to exercise all of
> decode1 ... decode32, and found some bugs that way.

Good point -- we need to make sure we cover all numFrameBits.  And a
series of 128 random ints in a row will heavily bias toward the
high-numBits cases.  Maybe we should do a better job w/ the random
source, so it targets every numBits, w/ varying numbers of exceptions,
etc.  I'll put a nocommit for this.
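
Something along these lines (just a sketch -- none of these class or
method names exist on the branch) would let the test walk every bit
width explicitly:

  import java.util.Random;

  // Sketch of a test-data generator that targets one bit width at a time.
  // All names here are hypothetical -- this is not the test on the branch.
  public class FrameBitsTestData {

    // Builds a block of 128 ints that all fit in numFrameBits bits, with at
    // least one value that actually needs the full width (so the encoder is
    // forced to pick this frame size), plus numExceptions values that
    // overflow the frame and must go down the exception path.
    public static int[] block(Random r, int numFrameBits, int numExceptions) {
      final int blockSize = 128;
      final int[] values = new int[blockSize];
      final long max = numFrameBits == 32 ? 0xFFFFFFFFL : (1L << numFrameBits) - 1;
      for (int i = 0; i < blockSize; i++) {
        values[i] = (int) nextLong(r, max + 1);          // fits in the frame
      }
      values[0] = (int) max;                             // needs all numFrameBits
      // force a few exceptions (skip the widest frames, where nothing overflows)
      for (int i = 0; i < numExceptions && numFrameBits <= 30; i++) {
        values[1 + r.nextInt(blockSize - 1)] = (int) (max + 1 + nextLong(r, 1000));
      }
      return values;
    }

    private static long nextLong(Random r, long bound) {
      return (r.nextLong() & 0x7FFFFFFFFFFFFFFFL) % bound;
    }
  }

The test could then loop numFrameBits over 1..32 and numExceptions over,
say, 0..8, round-trip each block through the encoder/decoder, and assert
the arrays match.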

>    what's pfor2? using s9/s16 to encode exceptions and offsets?

Yeah I just committed pfor2 this morning on the bulk branch.  You can
check it out from
https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

pfor2 came from the patch attached to
https://issues.apache.org/jira/browse/LUCENE-1410 by Hao Yan
(thanks!).  It uses s16 for the exceptions (though there's a bug
somewhere, because it fails the random test), and it takes a different
approach to encoding the exceptions.
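
For anyone following along, the basic shape of the exception handling
is the same in both cases (this is only a rough sketch of the idea, not
Hao Yan's patch or the branch code): pack each value into numFrameBits
bits, and push the values that don't fit into side lists that then get
compressed separately, e.g. w/ s9/s16:

  import java.util.List;

  // Rough sketch of the PFor "exception" split -- not the branch code.
  public class PForSplitSketch {

    // Keeps the low numFrameBits of every value in lowBits (a real encoder
    // would bit-pack these), and records the position and overflow bits of
    // every value that doesn't fit; those two side lists are what s9/s16
    // would then compress.
    public static void split(int[] values, int numFrameBits,
                             int[] lowBits,
                             List<Integer> excPositions,
                             List<Integer> excHighBits) {
      final long mask = (1L << numFrameBits) - 1;
      for (int i = 0; i < values.length; i++) {
        final long v = values[i] & 0xFFFFFFFFL;          // treat as unsigned
        lowBits[i] = (int) (v & mask);
        final long high = v >>> numFrameBits;
        if (high != 0) {                                 // overflow: exception
          excPositions.add(i);
          excHighBits.add((int) high);
        }
      }
    }
  }

Decoding reverses it: unpack the frame, decode the side lists, and patch
excHighBits.get(k) << numFrameBits back into the value at
excPositions.get(k).  Where the approaches differ is mainly in how those
side lists are laid out and compressed.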

>    In http://code.google.com/p/integer-array-compress-kit/ it's s9.
> The NewPForDelta there also had many bugs, and I also needed to test
> each branch to make sure it works well.

OK, we should still have a look at that one.  We need to converge on a
good default codec for 4.0.  Fortunately it's trivial to take any int
block encoder (fixed or variable block) and make a Lucene codec out of
it!
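
To make that concrete in the abstract -- the interface below is only
illustrative, not the actual classes on the branch -- all the codec
plumbing really needs from a fixed-block int encoder is a pair of hooks:

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;

  // Illustrative only: the sort of contract a fixed-block int encoder needs
  // to satisfy so it can be wrapped into a codec.  The actual abstractions
  // on the bulkpostings branch have different names and more detail.
  public interface FixedIntBlockCoder {

    // number of ints per block, e.g. 128
    int blockSize();

    // compress exactly blockSize() values to the output
    void encodeBlock(int[] values, DataOutput out) throws IOException;

    // decompress exactly blockSize() values from the input
    void decodeBlock(DataInput in, int[] values) throws IOException;
  }

Any encoder that can fill in those two methods (pfor, pfor2, s9/s16,
plain vints) can be dropped behind the same codec plumbing and
benchmarked against the others.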

Mike
