On Mon, Dec 20, 2010 at 5:49 AM, Li Li <fancye...@gmail.com> wrote:

> I think a random test is not sufficient.
> For normal input, some branches are never executed. I tested
> http://code.google.com/p/integer-array-compress-kit/ with many random
> int arrays and it works. But when I used it in real indexing, at the
> optimize stage, it corrupted the index.
> Because PForDelta chooses the best numFrameBits, and some bit widths
> such as 31 are hardly ever generated by random arrays. So I "forced"
> the encoder to choose every possible numFrameBits, to test all of
> decode1 ... decode32, and found some bugs that way.
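The forcing trick described above can be turned into a small test-data generator: build each 128-int block so that its "normal" values need exactly a chosen frame width, then sprinkle in deliberate exceptions. A rough sketch in Python (the `encode`/`decode` names in the commented sweep are placeholders for whichever codec is under test, not real APIs):

```python
import random

BLOCK_SIZE = 128  # PForDelta typically compresses fixed blocks of 128 ints

def block_for_bits(num_frame_bits, num_exceptions=0, rng=random):
    """Build a 128-int block that forces a PForDelta encoder to pick
    exactly `num_frame_bits` as the frame width: the normal values fit
    in that many bits (and at least one needs all of them), while
    `num_exceptions` values deliberately overflow the frame."""
    assert 1 <= num_frame_bits <= 32
    hi = (1 << num_frame_bits) - 1
    block = [rng.randint(0, hi) for _ in range(BLOCK_SIZE)]
    block[0] = hi  # guarantee the full frame width is actually needed
    if num_frame_bits < 32:  # a 32-bit frame leaves no room for exceptions
        for i in rng.sample(range(1, BLOCK_SIZE), num_exceptions):
            block[i] = rng.randint(hi + 1, (1 << 32) - 1)
    return block

# Exhaustive sweep: every frame width, several exception counts.
# (Round-trip check is illustrative; encode/decode are hypothetical.)
# for bits in range(1, 33):
#     for n_exc in (0, 1, 7, 31):
#         block = block_for_bits(bits, n_exc if bits < 32 else 0)
#         assert decode(encode(block)) == block
```

Unlike uniform random 32-bit ints, this hits the rarely generated widths (31, 32, and the small ones) on every run.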
Good point -- we need to make sure we cover all numFrameBits. And a
series of 128 random ints in a row will heavily bias for the
high-num-bits cases. Maybe we should do a better job w/ the random
source, to try to target all numBits, w/ varying numbers of
exceptions, etc. I'll put a nocommit for this.

> what's pfor2? using s9/s16 to encode exception and offset?

Yeah, I just committed pfor2 this morning on the bulk branch. You can
check it out from
https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings

pfor2 came from the patch attached to
https://issues.apache.org/jira/browse/LUCENE-1410 by Hao Yan (thanks!).
It uses s16 for the exceptions (though there's a bug somewhere, because
it fails the random test), and it takes a different approach to
encoding exceptions.

> In http://code.google.com/p/integer-array-compress-kit/ it's s9.
> NewPForDelta also has many bugs, and you also need to test each
> branch to ensure it works well.

OK, we should still have a look at that one.

We need to converge on a good default codec for 4.0. Fortunately, it's
trivial to take any int block encoder (fixed or variable block) and
make a Lucene codec out of it!

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org