Re: strange problem of PForDelta decoder

Li Li Wed, 15 Dec 2010 02:55:25 -0800

Hi Michael,
    You posted a patch at https://issues.apache.org/jira/browse/LUCENE-2723
    I am not familiar with applying patches. Do I need to download
LUCENE-2723.patch (there are many attachments with this name; do I need
the latest one?) and LUCENE-2723_termscorer.patch, and then apply them
(patch -p1 <LUCENE-2723.patch)? I have just checked out the latest source
code from http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene



2010/12/14 Michael McCandless <luc...@mikemccandless.com>:
> Likely you are seeing the startup cost of HotSpot compiling the PFOR code?
>
> I.e., does your test first "warm up" the JRE and then do the real test?
>
> I've also found that running with -Xbatch produces more consistent results
> from run to run; however, those results may not be as fast as running
> without -Xbatch.
>
> Also, it's better to test on actual data (i.e., a Lucene index's
> postings), and in the full context of searching, because then we get a
> sense of what speedups a real app will see. Micro-benchmarking is nearly
> impossible in Java since HotSpot behaves very differently there than in
> the "real" test.
>
> Mike
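
A minimal sketch of the warm-up-then-measure pattern suggested above. It is not code from this thread: the class WarmupBench and its decode and time methods are hypothetical names, and decode is only a stand-in (a plain prefix sum), not a PForDelta decoder. The point is just to compare one timed pass on a cold JVM with one timed pass after many discarded passes.

```java
import java.util.Random;

class WarmupBench {
    // Stand-in decoder (assumption, not PForDelta): writes running sums of
    // the input deltas into out and returns a checksum of the written values.
    static long decode(int[] deltas, int[] out) {
        int acc = 0;
        long checksum = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            out[i] = acc;
            checksum += acc;
        }
        return checksum;
    }

    // Decodes every block `rounds` times and returns the elapsed nanoseconds.
    // The checksum is accumulated in sink[0] so the work cannot be optimized away.
    static long time(int[][] blocks, int[] out, int rounds, long[] sink) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int[] block : blocks) {
                sink[0] += decode(block, out);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // 100 arrays of 256 random values between 0 and 100, as in the original test.
        Random rnd = new Random(42);
        int[][] blocks = new int[100][256];
        for (int[] b : blocks) {
            for (int i = 0; i < b.length; i++) {
                b[i] = rnd.nextInt(101);
            }
        }
        int[] out = new int[256];
        long[] sink = new long[1];

        long cold = time(blocks, out, 1, sink);   // first pass, code not yet compiled
        time(blocks, out, 2000, sink);            // warm-up passes, timing discarded
        long warm = time(blocks, out, 1, sink);   // measured pass after warm-up

        System.out.println("cold pass ns: " + cold);
        System.out.println("warm pass ns: " + warm);
        System.out.println("checksum: " + sink[0]);
    }
}
```

Running this with and without -Xbatch may show how much the numbers move from run to run; the absolute values depend on the machine and JVM.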
>
> On Tue, Dec 14, 2010 at 2:50 AM, Li Li <fancye...@gmail.com> wrote:
>> Hi,
>>   I tried to integrate PForDelta into Lucene 2.9 but ran into a problem.
>>   I use the implementation from
>> http://code.google.com/p/integer-array-compress-kit/
>>   It implements a basic PForDelta algorithm and an improved one (called
>> NewPForDelta; it had many bugs, which I have fixed).
>>   But compared with VInt and S9, its speed is very slow when decoding
>> only a small number of integer arrays.
>>   E.g., I decode int[256] arrays whose values are randomly generated
>> between 0 and 100. When decoding just one array, PFor (or NewPFor) is
>> very slow. When it continuously decodes many arrays, such as 10000,
>> it is faster than S9 and VInt.
>>   Another strange phenomenon is that when I call the PFor decoder twice,
>> the second call is faster. Or, if I call PFor first and then NewPFor,
>> NewPFor is faster. If I reverse the call sequence, the decoder called
>> second is again the faster one.
>>   E.g.
>>                ct.testNewPFDCodes(list);
>>                ct.testPFor(list);
>>                ct.testVInt(list);
>>                ct.testS9(list);
>>
>> NewPFD decode: 3614705
>> PForDelta decode: 17320
>> VINT decode: 16483
>> S9 decode: 19835
>> When I call them in the following sequence:
>>
>>                ct.testPFor(list);
>>                ct.testNewPFDCodes(list);
>>                ct.testVInt(list);
>>                ct.testS9(list);
>>
>> PForDelta decode: 3212140
>> NewPFD decode: 19556
>> VINT decode: 16762
>> S9 decode: 16483
>>
>>   My implementation groups docIDs and termDocFreqs into blocks of
>> 128 integers. When SegmentTermDocs's next method (or read/readNoTf)
>> is called, it decodes a block and saves it to a cache.
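
The block-at-a-time scheme described above could look roughly like the sketch below. This is not the poster's code: BlockPostingsSketch and the BlockDecoder interface are hypothetical, the decoder is left abstract (it could be PForDelta or anything else), and storing docIDs as deltas from the previous docID is an assumption.

```java
class BlockPostingsSketch {
    static final int BLOCK_SIZE = 128;

    // Hypothetical stand-in for a block decoder such as PForDelta.
    interface BlockDecoder {
        // Fills docDeltas and freqs with the entries of block blockIndex
        // (at most BLOCK_SIZE) and returns how many entries were written.
        int decodeBlock(int blockIndex, int[] docDeltas, int[] freqs);
    }

    private final BlockDecoder decoder;
    private final int blockCount;
    // Cache holding the most recently decoded block.
    private final int[] docDeltas = new int[BLOCK_SIZE];
    private final int[] freqs = new int[BLOCK_SIZE];
    private int nextBlock = 0; // index of the next block to decode
    private int pos = 0;       // read position inside the cached block
    private int limit = 0;     // number of valid entries in the cached block
    private int doc = 0;
    private int freq = 0;

    BlockPostingsSketch(BlockDecoder decoder, int blockCount) {
        this.decoder = decoder;
        this.blockCount = blockCount;
    }

    // Advances to the next posting; decodes a new block only when the
    // cached one is used up. Returns false when no postings remain.
    boolean next() {
        while (pos == limit) {
            if (nextBlock == blockCount) {
                return false;
            }
            limit = decoder.decodeBlock(nextBlock++, docDeltas, freqs);
            pos = 0;
        }
        doc += docDeltas[pos]; // assumption: docIDs are stored as deltas
        freq = freqs[pos];
        pos++;
        return true;
    }

    int doc() { return doc; }

    int freq() { return freq; }
}
```

With this shape, most next() calls only read from the two cached arrays, and the decoder runs once per 128 postings.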
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>
