[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636211#action_12636211
 ] 

Michael McCandless commented on LUCENE-1410:


Awesome, I'll have a look!

The TestPFor.java didn't make it into the patch (it just references a link).

> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Other
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-1410.patch
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-01 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636292#action_12636292
 ] 

Paul Elschot commented on LUCENE-1410:
--

Sorry about the mistake in the patch, a correction will follow shortly.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636295#action_12636295
 ] 

Paul Elschot commented on LUCENE-1410:
--

Another detail about the decompression performance reported above: these are 
all cases without exceptions, so the expected performance cost of patching 
exceptions is not reflected in the results.
Patching exceptions is supported in the code, but that code is not yet 
optimized.
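
The patching step itself is simple in principle. As an illustration only (the actual patch uses a different exception layout, reportedly a linked chain inside the frame; the explicit position/value arrays here are an assumption for demonstration):

```java
public class PatchSketch {
  // Illustrative sketch: after the bulk frame decode has filled 'out' with
  // the values that fit in the frame width, the values that did not fit
  // (exceptions) are written over their slots from a separate exception
  // list. This array-based layout is hypothetical.
  static void patchExceptions(int[] out, int[] positions, int[] values) {
    for (int k = 0; k < positions.length; k++) {
      out[positions[k]] = values[k]; // overwrite slot with the full value
    }
  }

  public static void main(String[] args) {
    int[] out = {3, 1, 0 /* exception slot */, 2};
    patchExceptions(out, new int[]{2}, new int[]{100000});
    System.out.println(out[2]);
  }
}
```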




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636658#action_12636658
 ] 

Paul Elschot commented on LUCENE-1410:
--

Q: Can you add a method that figures out the right frame size to use
for a given block of ints (so that ~90% of the ints are < N bits)?
A: PFor.getNumFrameBits() does this for a given int[] at an offset and size.
Choosing the right size is a trade-off: too small needs too much header 
decoding, while too large may result in too many frame bits, i.e. too large 
a compressed size.
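
As a rough illustration of the ~90% heuristic (not the patch's actual getNumFrameBits() implementation; the helper name and the percentile cutoff here are assumptions), one way to pick a frame width is:

```java
import java.util.Arrays;

public class FrameBitsSketch {
  // Illustrative sketch only: pick the smallest bit width b such that
  // roughly 90% of the values fit in b bits unsigned; the remaining values
  // become patch exceptions. PFor.getNumFrameBits() may use another rule.
  static int chooseFrameBits(int[] values, int offset, int size) {
    int[] sorted = Arrays.copyOfRange(values, offset, offset + size);
    Arrays.sort(sorted);
    int v = sorted[(int) (0.9 * (size - 1))]; // ~90th percentile value
    int bits = 32 - Integer.numberOfLeadingZeros(v);
    return Math.max(bits, 1); // use at least 1 bit per value
  }

  public static void main(String[] args) {
    int[] block = new int[128];
    for (int i = 0; i < 120; i++) block[i] = i % 30;   // fit in 5 bits
    for (int i = 120; i < 128; i++) block[i] = 100000; // exceptions
    System.out.println(chooseFrameBits(block, 0, block.length));
  }
}
```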

Q: I'm using fixed 6-bit frame size. Can you add bigger bit sizes to your pfor 
decompress?
A: It should work for 1<=numFrameBits<=30.

Q: Is pfor self-punctuating?
A: PFor.bufferByteSize() returns the number of bytes used in the compressed 
buffer; see also the javadocs.
For practical use, the start of each compressed block must be known, either 
from somewhere else or from the size of the previously encoded block.
The number of compressed integers is encoded in the header, but I'm not sure 
whether I made that info available before decompression, to allow allocating 
an int[] large enough to hold the decompressed data.

>: It's really weird how the time gets suddenly faster during readVInt.
A: It's probably the JIT kicking in. That's why I preferred to test in three 
1-second loops, reporting performance each second. The 2nd second always has 
better performance.

It's nice to see that PFor is faster than VInt; I had not tested that yet.
Which block size did you use for PFor? Never mind, I'll take a look at the code 
of TestPFor2.

Btw. after decompressing, the things are ints, not vInts.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636660#action_12636660
 ] 

Paul Elschot commented on LUCENE-1410:
--

Q: I'm using fixed 6-bit frame size. Can you add bigger bit sizes to your pfor 
decompress?
 A: It should work for 1<=numFrameBits<=30.

That answer was too short. There is some fallback code (decodeAnyFrame) that 
will decompress for any frame of reference. The patch contains unrolled 
versions of that for up to 7 bits, iirc.
I'll add higher numbers of bits later; the code is straightforward and could 
actually be generated.
Btw. on my machine the unrolled 9-bit version is actually a bit slower than 
the loop. I don't know why; maybe there are too many buffer references (9) in 
the loop for the JIT to cope with.






[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636685#action_12636685
 ] 

Michael McCandless commented on LUCENE-1410:


bq. A: PFor.getNumFrameBits() does this for a given int[] at offset and size.

Super, I missed that -- I'll use it.

bq. Btw. after decompressing, the things are ints not vInts.

Oh yeah, I'll fix the prints in my test.

bq. There is some fallback code (decodeAnyFrame) that will decompress for any 
frame or reference

Right, I was wanting the unrolled version to see the fastest result we can get 
for pfor, but your next observation is spooky, so maybe this isn't really 
going to help our test:

{quote}
Btw. on my machine the unrolled 9 bit version is actually a bit slower than the 
loop, I don't know why,
maybe there are too many buffer references (9) in the loop for the jit to cope 
with.
{quote}
We should look at the asm that's produced?

bq. it's probably the JIT jumping in.

But strangely with -Xbatch I still see the effect.  And even stranger, on 
another machine (Mac OS X) it gets *slower* when the JIT jumps in.  I'm spooked.

bq. Which block size did you use for PFor?

I used 128 but I'll try different sizes.

bq. For practical use, the start of each compressed block must be known, either 
from somewhere else, or from the size of the previously encoded block.

But can I also get the "bytes consumed" when I ask decompress() to
run?

When we really integrate, things like term infos and skipping data
will know how to jump to the start of blocks.  But for raw sequential
scanning, if the header is self-punctuating we would not need to store
the block size between each block.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636686#action_12636686
 ] 

Otis Gospodnetic commented on LUCENE-1410:
--

For people not intimately familiar with PFOR (like me), I found the following 
to be helpful:
http://cis.poly.edu/cs912/indexcomp.pdf





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636689#action_12636689
 ] 

Michael McCandless commented on LUCENE-1410:


There's also this recent comparison of index compression approaches: 
http://www2008.org/papers/pdf/p387-zhangA.pdf and 
http://homepages.cwi.nl/~heman/downloads/msthesis.pdf.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636705#action_12636705
 ] 

Paul Elschot commented on LUCENE-1410:
--

Q: We should look at the asm that's produced?
A: Maybe. But would that be java bytecodes, or x86, or powerpc? I'd prefer to 
wait with that.
There are some nice machine instructions for this on the various 
architectures, but using them would require native code.

Q: But can I also get the "bytes consumed" when I ask decompress() to run?
A: Yes, but only after decompression or after compression.
The number of exceptions is not explicitly coded in the header, so the size 
needed to encode the exceptions is not known beforehand. That could be 
changed, but then the header would get bigger.
(For compression, it's possible to do a run without a buffer first.)

Block size 128 should be fine; one could also try 64 and 32.

Thanks for posting the links here, they are the ones I used to code this up. 
(Does that count as homework? :) )
I did not put the links in the code because of possible link rot. The article 
titles and authors are in the javadocs.
Currently the easy way to find the links is via Google Scholar.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636715#action_12636715
 ] 

Michael McCandless commented on LUCENE-1410:


{quote}
Q: We should look at the asm that's produced?
A: Maybe. But would that be java bytecodes, or x86, or powerpc? I'd prefer to 
wait with that.
There are some nice machine instructions for this on the various 
architectures, but using them would require native code.
{quote}
I was thinking the local CPU's native asm, so that we could at least see 
whether the benefits of pfor (no hard-for-cpu-to-predict if statement inside 
the inner loop) "survive" the Java stack.  I'll try to look.

{quote}
Q: But can I also get the "bytes consumed" when I ask decompress() to run?
A: Yes, but only after decompression or after compression.
The number of exceptions is not explicitly coded in the header, so the size 
needed to encode the exceptions is not known beforehand. That could be 
changed, but then the header would get bigger.
{quote}

OK, so it is self-punctuating; we wouldn't need to encode the block size in 
bytes into the file.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636730#action_12636730
 ] 

Paul Elschot commented on LUCENE-1410:
--

The number of bits alone is not really informative; it would be better to have 
a distribution of the results of getNumFrameBits(). Then we can see for which 
numbers of bits loop unrolling might be tried.

As for decompression speed, please remember that the patching code (that 
decodes the exceptions into the output) has not yet been optimized at all.

The lucene .frq file contains the docids as deltas and the frequencies but with 
a special encoding in case the frequency is one. I'd rather try compressing the 
real delta docids and the real frequencies than the encoded versions.

The .prx file should be usable as it is, and from the results reported in the 
articles I would expect a real performance advantage for PFor there.
Michael, could you post a distribution of the number of frame bits for the 
.prx file? I'd like to know a reasonable maximum for unrolling the 
corresponding loops.

>: maybe I'm doing something wrong
I don't think so, the code is still young. Try running TestPFor and look at 
the output near the end for the case of 6 frame bits. That should show the 
unrolled decoding speed for the case without exceptions. Then compare that to 
the cases with lower and higher numbers of frame bits.

>: Stepping back a bit: I wonder what percentage of the time in a "typical"
Lucene search is spent decoding vInts? That would bound how much
improvement we could ever expect from this optimization during
searching.

A: There is also the influence of the reduction of data to be fetched (via 
various caches) from the index. As reported in the articles, PFor strikes a 
nice optimum between decompression speed and fetching speed from (fast) disks.

>: I was thinking local CPU's native asm.
A: I'd try a C version first. Iirc gcc has a decent optimizer for bit ops, but 
it's been a while since I used C.

For the record, to show the variation in decompression speeds for different 
numbers of frame bits without exceptions, here is my current output from 
TestPFor:
{noformat}
test901PerfFor1Decompress starting
Compressed 128 bytes into 8, ratio 0.0625
test901PerfFor1Decompress 0 numFrameBits 1 decompressed 104857600 in 1021 
msecs, 102700 ints/msec, (25 iters).
test901PerfFor1Decompress 1 numFrameBits 1 decompressed 171966464 in 1020 
msecs, 168594 ints/msec, (41 iters).
test901PerfFor1Decompress 2 numFrameBits 1 decompressed 171966464 in 1012 
msecs, 169927 ints/msec, (41 iters).

test902PerfFor2Decompress starting
Compressed 128 bytes into 12, ratio 0.09375
test902PerfFor2Decompress 0 numFrameBits 2 decompressed 130023424 in 1017 
msecs, 127849 ints/msec, (31 iters).
test902PerfFor2Decompress 1 numFrameBits 2 decompressed 155189248 in 1000 
msecs, 155189 ints/msec, (37 iters).
test902PerfFor2Decompress 2 numFrameBits 2 decompressed 159383552 in 1022 
msecs, 155952 ints/msec, (38 iters).

test903PerfFor3Decompress starting
Compressed 128 bytes into 16, ratio 0.125
test903PerfFor3Decompress 0 numFrameBits 3 decompressed 188743680 in 1016 
msecs, 185771 ints/msec, (45 iters).
test903PerfFor3Decompress 1 numFrameBits 3 decompressed 205520896 in 1018 
msecs, 201886 ints/msec, (49 iters).
test903PerfFor3Decompress 2 numFrameBits 3 decompressed 201326592 in 1026 
msecs, 196224 ints/msec, (48 iters).

test904PerfFor4Decompress starting
Compressed 128 bytes into 20, ratio 0.15625
test904PerfFor4Decompress 0 numFrameBits 4 decompressed 146800640 in 1014 
msecs, 144773 ints/msec, (35 iters).
test904PerfFor4Decompress 1 numFrameBits 4 decompressed 159383552 in 1007 
msecs, 158275 ints/msec, (38 iters).
test904PerfFor4Decompress 2 numFrameBits 4 decompressed 159383552 in 1011 
msecs, 157649 ints/msec, (38 iters).

test905PerfFor5Decompress starting
Compressed 128 bytes into 24, ratio 0.1875
test905PerfFor5Decompress 0 numFrameBits 5 decompressed 159383552 in 1010 
msecs, 157805 ints/msec, (38 iters).
test905PerfFor5Decompress 1 numFrameBits 5 decompressed 176160768 in 1009 
msecs, 174589 ints/msec, (42 iters).
test905PerfFor5Decompress 2 numFrameBits 5 decompressed 176160768 in 1004 
msecs, 175458 ints/msec, (42 iters).

test906PerfFor6Decompress starting
Compressed 128 bytes into 28, ratio 0.21875
test906PerfFor6Decompress 0 numFrameBits 6 decompressed 117440512 in 1001 
msecs, 117323 ints/msec, (28 iters).
test906PerfFor6Decompress 1 numFrameBits 6 decompressed 125829120 in 1002 
msecs, 125577 ints/msec, (30 iters).
test906PerfFor6Decompress 2 numFrameBits 6 decompressed 125829120 in 1004 
msecs, 125327 ints/msec, (30 iters).

test907PerfFor7Decompress starting
Compressed 128 bytes into 32, ratio 0.25
test907PerfFor7Decompress 0 numFrameBits 7 decompressed 125829120 in 1016 
msecs, 123847 ints/msec, (30 iters).
test907PerfFor7Decompress 1 numFrameBits 7 decompressed 150994944
{noformat}

[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636736#action_12636736
 ] 

Paul Elschot commented on LUCENE-1410:
--

I wrote: The number of bits is not really informative; it would be better to 
have a distribution of the results of getNumFrameBits(). Then we can see for 
which numbers of bits loop unrolling might be tried.

But I misread what was meant by the number of bits: the number of frame bits 
that I requested is actually already listed there, so I'll have a go at 
unrolling for larger numbers of frame bits, especially for the .prx data.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-04 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636799#action_12636799
 ] 

Michael McCandless commented on LUCENE-1410:


bq. As for decompression speed, please remember that the patching code (that 
decodes the exceptions into the output) has not yet been optimized at all.

But this is what I find so weird: for prx, if I fix the encoding at 6 bits, 
which generates a zillion exceptions, we are ~31% faster than decoding vInts, 
and the pfor file size is much bigger (847 MB vs 621 MB) than vInt.  But if 
instead I use the bit size returned by getNumFrameBits(), which produces far 
fewer exceptions, we are 9.0% slower and the file size is a bit smaller than 
vInt.  Either exception processing is very fast, or it's the non-unrolled path 
(ForDecompress.decodeAnyFrame) that's slow, which could very well be.

bq. The lucene .frq file contains the docids as deltas and the frequencies but 
with a special encoding in case the frequency is one. I'd rather try 
compressing the real delta docids and the real frequencies than the encoded 
versions.

I'll try that.  I bet if we had two separate streams (one for the docIDs and 
one for the corresponding freqs) we'd get even better pFor compression.  If we 
did that "for real" in Lucene that'd also make it fast to use a 
normally-indexed field for pure boolean searching (you wouldn't have to index 2 
different fields as you do today to gain the performance at search time).  In 
fact, for AND queries on a normal Lucene index this should also result in 
faster searching since when searching for the common docIDs you at first don't 
need the freq data.

Marvin has been doing some performance testing recently and seems to have 
concluded that keeping prx and frq as separate files (like Lucene does today 
but KinoSearch does not) gives better performance.  Pushing that same trend 
further, I think it may very well make sense to separate docID and frq as 
well.

bq. A: there is also the influence of the reduction of data to be fetched (via 
various caches) from the index. As reported in the articles, PFor strikes a 
nice optimum between decompression speed and fetching speed from (fast) disks.

True, but we are seeing just a bit of compression vs Lucene's current encoding. 
 I think splitting out frq from docID may show better compression.

{quote}
>: I was thinking local CPU's native asm.
A: I'd try a C version first. Iirc gcc has a decent optimizer for bit ops, but 
it's been a while since I used C.
{quote}

Well... that would just make me depressed (if from Javaland the CPU-level
benefits of pFor don't really "survive", but from C-land they do) ;)  But yes, 
I agree.

bq. For the record, to show the variation in decompression speeds for different 
numbers of frame bits without exceptions, here is my current output from 
TestPFor:

I have similar results (up to 7 bits -- can you post your new TestPFor.java?).

The even-byte sizes (16, 24) have very sizable leads over the others. The 
9-bit size is fantastically slow; it's insane that unrolling it isn't helping. 
Seems like we will need to peek at the asm to understand what's going on at 
the "bare metal" level.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636803#action_12636803
 ] 

Paul Elschot commented on LUCENE-1410:
--

I'm working on the 10-bit case right now, but I'm hitting another bug; it 
will take a while.

The performance of TestPFor2 should be better after running TestPFor for all 
bit sizes; this could be a good reason to combine them.

The correctness tests in TestPFor are not really exhaustive, so I'd like to 
have a simple correctness test in TestPFor2, for example a running (long) 
sum of all ints that should be equal before compression and after 
decompression.
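
A minimal version of such a check (with an identity round trip standing in for the PFor compress/decompress pair, whose exact API is not shown here) could look like:

```java
public class SumCheck {
  // Running long sum used as a cheap correctness check: the sum of the
  // original ints must equal the sum of the round-tripped ints. A long
  // accumulator avoids int overflow over large inputs.
  static long runningSum(int[] a) {
    long s = 0;
    for (int v : a) s += v;
    return s;
  }

  public static void main(String[] args) {
    int[] original = {3, 141, 59, 26, 5};
    // Stand-in for compress+decompress; in TestPFor2 this would be the
    // PFor round trip.
    int[] roundTripped = original.clone();
    System.out.println(runningSum(original) == runningSum(roundTripped));
  }
}
```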

To avoid having to peek at the asm level, there is also the possibility of 
using a slightly higher number of frame bits when we know a given number of 
bits will be slow to decompress.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637119#action_12637119
 ] 

Michael McCandless commented on LUCENE-1410:



(Attached autogen.tgz).

I started peeking at the ASM generated by different tweaks to the
bit decoder methods.

It's rather depressing: small things, like using relative not absolute
getInt from ByteBuffer, declaring things final, reading into array
first instead of getInt per int, make sizable differences in the
resulting asm and decode speed.

To try out these various changes to the Java sources, I created a
Python script to generate the 31 different decode methods.  I then
made a new standalone perf test that reads in a prx file as a series
of packed N-bit ints, and reports the best of 5 runs.

These results are NOT testing the full pfor -- just different possible
methods for doing the N-bit int unpacking part of pfor.  Paul, I
haven't integrated these into the pfor code, but I think we probably
eventually should.

Finally, I switched the autogen to C++/gcc to see how much faster
simple C code is.

In both the java and C tests, by far the fastest way was to mmap the
file and read ints from it, sequentially (relative gets) using
ByteBuffer, so all 3 take that approach.

(To run these, extract autogen.tgz, then open *.py and see the comment
at the top; you'll have to edit the sources to fix the hardwired path
to the prx file).

Java 1 code calls getInt() one at a time, like this:

{code}
  static void decode4(final ByteBuffer in, final int[] out) {
final int i1 = in.getInt();
out[0] = i1 & 15;
out[1] = (i1 >>> 4) & 15;
out[2] = (i1 >>> 8) & 15;
out[3] = (i1 >>> 12) & 15;
out[4] = (i1 >>> 16) & 15;
out[5] = (i1 >>> 20) & 15;
out[6] = (i1 >>> 24) & 15;
out[7] = (i1 >>> 28) & 15;
final int i2 = in.getInt();
out[8] = i2 & 15;
out[9] = (i2 >>> 4) & 15;
out[10] = (i2 >>> 8) & 15;
...
  }
{code}

Java 2 code gets all N ints up front, like this:

{code}
  static void decode4(IntBuffer in, int[] inbuf, int[] out) {
in.get(inbuf, 0, 16);
out[0] = inbuf[0] & 15;
out[1] = (inbuf[0] >>> 4) & 15;
out[2] = (inbuf[0] >>> 8) & 15;
out[3] = (inbuf[0] >>> 12) & 15;
out[4] = (inbuf[0] >>> 16) & 15;
out[5] = (inbuf[0] >>> 20) & 15;
out[6] = (inbuf[0] >>> 24) & 15;
out[7] = (inbuf[0] >>> 28) & 15;
out[8] = inbuf[1] & 15;
out[9] = (inbuf[1] >>> 4) & 15;
out[10] = (inbuf[1] >>> 8) & 15;
...
  }
{code}

C++ code is analogous to Java 1 (data is mmap'd):

{code}
static bool decode4(int* out) {
  int i1 = *(data++);
  *(out++) = i1 & 15;
  *(out++) = (i1 >> 4) & 15;
  *(out++) = (i1 >> 8) & 15;
  *(out++) = (i1 >> 12) & 15;
  *(out++) = (i1 >> 16) & 15;
  *(out++) = (i1 >> 20) & 15;
  *(out++) = (i1 >> 24) & 15;
  *(out++) = (i1 >> 28) & 15;
  int i2 = *(data++);
  *(out++) = i2 & 15;
  ...
}
{code}

Here's the performance for each bit size:

||bits||Java 1 (kints/msec)||Java 2 (kints/msec)||C++ (kints/msec)||C advantage||
|1|*916.6*|648.5|1445.3|1.6x|
|2|*793.8*|593.4|1118.3|1.4x|
|3|*616.7*|541.9|1068.8|1.7x|
|4|*656.6*|512.1|1161.6|1.8x|
|5|*499.8*|469.0|897.3|1.8x|
|6|410.6|*444.9*|899.5|2.0x|
|7|367.4|*409.0*|801.7|2.0x|
|8|*414.0*|386.7|816.6|2.0x|
|9|306.3|*366.9*|710.8|1.9x|
|10|278.8|*341.9*|665.8|1.9x|
|11|258.1|*307.8*|623.6|2.0x|
|12|245.9|*311.7*|592.7|1.9x|
|13|223.9|*285.0*|574.5|2.0x|
|14|204.2|*271.8*|538.1|2.0x|
|15|190.3|*260.0*|522.6|2.0x|
|16|229.9|*250.4*|519.7|2.1x|
|17|190.8|*223.7*|488.3|2.2x|
|18|171.6|*198.1*|461.2|2.3x|
|19|158.3|*180.5*|436.2|2.4x|
|20|163.1|*207.4*|416.4|2.0x|
|21|156.3|*166.3*|403.0|2.4x|
|22|147.0|*163.5*|387.8|2.4x|
|23|141.6|*174.1*|357.5|2.1x|
|24|141.9|*162.0*|362.6|2.2x|
|25|133.2|*177.6*|335.3|1.9x|
|26|125.8|*153.5*|334.7|2.2x|
|27|121.6|*139.6*|314.0|2.2x|
|28|122.7|*130.1*|316.7|2.4x|
|29|107.3|*123.9*|296.7|2.4x|
|30|111.0|*127.6*|300.7|2.4x|
|31|*108.1*|94.0|290.5|2.7x|

C code is between 1.4x and 2.7x faster than the best Java run.  Reading
one int at a time is faster when the number of bits is small (<=5), while
reading all N ints up front is generally faster for larger bit sizes.
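
For reference, here is a self-contained, runnable miniature of the Java 1 style above: it unpacks the eight 4-bit values held in one int read with a relative get (the generated decode4 does the same for 16 ints, i.e. 128 values).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class UnpackNibbles {
  // Unrolled shift/mask unpacking of eight 4-bit values from one int,
  // in the Java 1 style: one relative getInt() followed by fixed shifts.
  static void decode4One(ByteBuffer in, int[] out) {
    final int i1 = in.getInt();
    out[0] = i1 & 15;
    out[1] = (i1 >>> 4) & 15;
    out[2] = (i1 >>> 8) & 15;
    out[3] = (i1 >>> 12) & 15;
    out[4] = (i1 >>> 16) & 15;
    out[5] = (i1 >>> 20) & 15;
    out[6] = (i1 >>> 24) & 15;
    out[7] = (i1 >>> 28) & 15;
  }

  public static void main(String[] args) {
    ByteBuffer in = ByteBuffer.allocate(4);
    in.putInt(0x87654321).flip();
    int[] out = new int[8];
    decode4One(in, out);
    System.out.println(Arrays.toString(out)); // low nibble first
  }
}
```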


> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Other
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-1410b.patch, TestPFor2.java, TestPFor2.java, 
> TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637149#action_12637149
 ] 

Paul Elschot commented on LUCENE-1410:
--

I can't see the .tgz attachment at the moment.

Very nice table, this Java/C++ comparison. Meanwhile I also have a Java program 
generator for decompression, and I got very similar results.

I found a bug in the 17-bit case in decodeAnyFrame(); this might explain the 
sum test failure that you saw.

One detail on the generated code: the final mask can be omitted, for example in 
the last bit extraction for the 5-bit case:
bq. (i4 >>> 27)
On the Java side leaving this last mask out did not make a difference, but it 
might be noticeable in C++.
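To illustrate the point, here is a generic bit-unpacker (hypothetical, not the generated code from the patch) in which the final mask is skipped exactly when a value ends at bit 32, as in (i4 >>> 27) for the 5-bit case:

```java
class LastMask {
    // Pack n b-bit values into ints, low bits first.
    static int[] pack(int[] values, int b) {
        int[] out = new int[(values.length * b + 31) / 32];
        int bitPos = 0;
        for (int v : values) {
            int word = bitPos >>> 5;
            int shift = bitPos & 31;
            out[word] |= v << shift;
            if (shift + b > 32) {            // value crosses an int boundary
                out[word + 1] |= v >>> (32 - shift);
            }
            bitPos += b;
        }
        return out;
    }

    // Unpack n b-bit values. When shift + b == 32, the unsigned shift alone
    // isolates the value, so the final "& mask" can be omitted -- this is the
    // optimization suggested above, e.g. (i4 >>> 27) for b == 5.
    static void unpack(int[] in, int[] out, int n, int b) {
        int mask = (1 << b) - 1;
        int bitPos = 0;
        for (int i = 0; i < n; i++) {
            int word = bitPos >>> 5;
            int shift = bitPos & 31;
            int v;
            if (shift + b <= 32) {
                v = in[word] >>> shift;
                if (shift + b < 32) {
                    v &= mask;               // mask only when bits remain above
                }
            } else {                          // value split across two ints
                v = ((in[word] >>> shift) | (in[word + 1] << (32 - shift))) & mask;
            }
            out[i] = v;
            bitPos += b;
        }
    }
}
```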





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637154#action_12637154
 ] 

Michael McCandless commented on LUCENE-1410:


bq. One detail on the generated code: the final mask can be omitted, for 
example the last bit extraction for the 5 bit case:

Ahh good catch.  I'll fix in my autogen.

bq. Meanwhile I also have a java program generator for decompression, and I got 
very similar results.

Excellent!  Did you also switch to relative getInts?  This way, I could change 
my TestPFor2 to rely on self-punctuation when reading.  And, I can pass down a 
ByteBuffer derived directly from the file, instead of copying bytes into a 
byte[] first.  This would make the test fairer to pfor.

bq. I found a bug for the 17 bits case in decodeAnyFrame(), this might explain 
the sum test failure that you saw.

OK I can re-test when you post this, to see if the checksums match.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637163#action_12637163
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. Did you also switch to relative getInts?

No. I thought about that, but I wanted to make it work correctly first.

bq. This way, I could change my TestPFor2 to rely on self-punctuation when 
reading. And, I can pass down a ByteBuffer derived directly from the file, 
instead of copying bytes into byte[] first.

The only concern I have there is that when using a byte[] taken directly from 
the file, getting the int values may result in non-4-byte-aligned fetches. Can 
current hardware do this well?

It's tempting to move to a full C implementation directly now. Should we do 
that?
A similar move was made in the past by letting gcc deal with vInts, but the 
JVMs have since caught up there.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637191#action_12637191
 ] 

Michael McCandless commented on LUCENE-1410:



bq. The only concern I have there is that when using a byte[] directly from the 
file getting the int values may result in non 4 byte aligned fetches. Can 
current hardware do this well?

Good question, though if that's the case we could presumably work
around it by ensuring the header of the file is 0 mod 4?

Or... is this because a full block (header + bits + exception data)
may not be 0 mod 4?  (Though we too could pad if necessary)

I think the API we want to reach (eventually) is an IntBlockInput in
Directory where you call read(int[]) and it returns the next 128 (say)
ints in an array and moves itself to the end of that block (start of
the next block).
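As a sketch of that API shape, here is a hypothetical IntBlockInput with a toy array-backed implementation; only the read(int[]) contract is taken from the comment above, everything else is invented for illustration:

```java
// Hypothetical block-oriented read API: read(int[]) fills the array with the
// next block of ints and leaves the input positioned at the start of the
// following block.
interface IntBlockInput {
    /** Fills buffer with the next block; returns the number of ints read. */
    int read(int[] buffer);
}

// Toy in-memory implementation, standing in for a Directory-backed one.
class ArrayIntBlockInput implements IntBlockInput {
    private final int[] data;
    private int pos = 0;

    ArrayIntBlockInput(int[] data) {
        this.data = data;
    }

    @Override
    public int read(int[] buffer) {
        int n = Math.min(buffer.length, data.length - pos);
        System.arraycopy(data, pos, buffer, 0, n);
        pos += n; // now positioned at the start of the next block
        return n;
    }
}
```

A caller would repeatedly call read with a 128-int array until fewer than 128 ints come back.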

{quote}
It's tempting to move to a full C implementation directly now. Should we do 
that?
A similar move was made in the past by letting gcc deal with vInts, but 
meanwhile the jvms caught up there.
{quote}

Maybe we should explore this eventually, but I think for now we should
first try to get the Java version online?

I'd really love to get a prototype integration working so that we can
then do an end-to-end performance test (ie, actual searches).  I'm still
wondering how much speedups at this layer will actually affect overall
search time.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637272#action_12637272
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. Or... is this because a full block (header + bits + exception data)  may 
not be 0 mod 4? (Though we too could pad if necessary)

The exceptions can be 1, 2 or 4 bytes, and there is one more coding possible; 
we might reserve that for longs.
I'd like to use some padding space to get the exceptions aligned to their 
natural border: 2-byte exceptions at even byte positions, and 4-byte 
exceptions at a natural int border (... mod 4 == 0). That way the rest of the 
padding space (if any) could be used to align to a natural int border.

bq. I'd really love to get a prototype integration working so that we can then 
do an end-to-end performance test (ie, actual searches). I'm still wondering 
how much speedups at this layer will actually affect overall search time.

I'm not entirely sure about this, but the (artificially high) decoding speeds 
reported here so far are almost 2000 (!) times higher than the decoding speeds 
reported in the articles for actual query processing. That means there is 
quite a bit of room for disk speeds to catch up with CPU speed. In other 
words, adding this for proximities (and maybe docids and freqs) should make 
Lucene a natural fit for SSDs.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-07 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637427#action_12637427
 ] 

Michael McCandless commented on LUCENE-1410:


bq. That means there is quite a bit of room for disk speeds to catch up with 
CPU speed.

Exactly!

Search optimization is tricky because for an index that can't fit
entirely in the OS's IO cache, reducing the CPU cost of searching
(which we are doing here) is basically useless: the end user won't see
much benefit.

For an index entirely in the IO cache, I think these optimizations
might make a big difference.

In some sense, we have been allowed to hide behind the slow
performance of magnetic hard drives and not worry much about reducing
the CPU cost of searching.

However: relatively soon most computers will use SSDs, and then
suddenly it's as if every index is in the IO cache (albeit a somewhat
slower one, but still far faster than magnetic media).  So now is
the time for us to reduce the CPU cost of searching for Lucene.

And this means for this issue and other sources of optimizing search
performance, we should largely focus only on indices entirely cached
in the IO cache.





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638543#action_12638543
 ] 

Michael McCandless commented on LUCENE-1410:


Paul, in decompress I added "inputSize = -1" at the top, so that the header is 
re-read.  I need this so I can re-use a single PFor instance during decompress.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-10 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638564#action_12638564
 ] 

Paul Elschot commented on LUCENE-1410:
--

Did you also move to relative addressing in the buffer?

Another question: I suppose the place to add this initially would be in 
IndexOutput and IndexInput?
In that case it would make sense to reserve (some bits of) the first byte in 
the compressed buffer
for the compression method, and use these bits there to call PFor or another 
(de)compression method.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638573#action_12638573
 ] 

Michael McCandless commented on LUCENE-1410:



Another thing that bit me was bufferByteSize(): if this returns
something that's not 0 mod 4, you must increase it to the next
multiple of 4, otherwise you will lose data since ByteBuffer is big
endian by default.  We should test little endian to see if performance
changes (on different CPUs).
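Rounding a byte count up to the next multiple of 4 can be done branch-free; a small sketch (the method name is hypothetical):

```java
class Pad4 {
    // Round a byte count up to the next multiple of 4, so that whole ints
    // survive the ByteBuffer round trip without truncation. Adding 3 and
    // clearing the low two bits is equivalent to ceiling division by 4
    // followed by multiplication by 4.
    static int roundUpTo4(int numBytes) {
        return (numBytes + 3) & ~3;
    }
}
```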

bq. Did you also move to relative addressing in the buffer? 

No I haven't done that, but I think we should.  I believe it's faster.  I'm 
trying now to get a rudimentary test working for TermQuery using pfor.

{quote}
Another question: I suppose the place to add this initially would be in 
IndexOutput and IndexInput?
In that case it would make sense to reserve (some bits of) the first byte in 
the compressed buffer
for the compression method, and use these bits there to call PFor or another 
(de)compression method.
{quote}

This gets into flexible indexing...

Ideally we do this in a pluggable way, so that PFor is just one such
plugin, simple vInts is another, etc.

I could see a compression layer living "above" IndexInput/Output,
since logically how you encode an int block into bytes is independent
from the means of storage.

But: such an abstraction may hurt performance too much since during
read it would entail an extra buffer copy.  So maybe we should just
add methods to IndexInput/Output, or, make a new
IntBlockInput/Output.

Also, some things you now store in the header of each block should
presumably move to the start of the file instead (eg the compression
method), or if we move to a separate "schema" file that can record
which compressor was used per file, we'd put this there.

So I'm not yet exactly sure how we should tie this in "for real"...





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638744#action_12638744
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. I'm trying now to get a rudimentary test working for TermQuery using pfor.

It should really make a difference for stop words and disjunction queries 
depending on DocIdSetIterator.next().

Conjunctions that depend on skipTo(docNum) will probably make it necessary to 
impose an upper bound on the size of the compressed arrays. This upper bound 
could be the same as the skip distance in the index.

I'm wondering whether it would make sense to add skip info to the term 
positions of very large documents. Any ideas on that?




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638788#action_12638788
 ] 

Paul Elschot commented on LUCENE-1410:
--

The strange behaviour at 9 bits I reported earlier was due to a mistake in the 
test data. It contained only 31 integers (124 bytes, see posted listing), so 
the unrolled loop would never be called.

And another point to be done:
- change the decompression code generator to use an array get() from the buffer 
for 6 or more frame bits (see the Java 1/2 performance numbers listed above).




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-12 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1263#action_1263
 ] 

Michael McCandless commented on LUCENE-1410:


bq. It should really make a difference for stop words and disjunction queries 
depending on DocIdSetIterator.next().

Yes.

bq. Conjunctions that depend on skipTo(docNum) will probably make it necessary 
to impose an upper bound on the size of the compressed arrays.

Yes.  Though, I think when optimizing search performance we should
focus entirely on the high-latency queries.  TermQuery on very
frequent terms, disjunction queries involving common terms,
phrase/span queries that have many matches, etc.

EG if PFOR speeds up high-latency queries say by 20% (say 10 sec -> 8
sec), but causes queries that are already fast (say 30 msec) to get a
bit slower (say 40 msec) I think that's fine.  It's the high-latency
queries that kill us because those ones limit how large a collection
you can put on one box before you're forced to shard your index.

At some point we should make use of concurrency when iterating over
large result sets.  EG if the estimated total hit count is > X docs, use
multiple threads where each thread skips to its own "chunk" and
iterates over it, and then merge the results.  Then we should be able
to cut down on the max-latency query and handle more documents on a
single machine.  Computers are quickly becoming very concurrent.
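That chunked concurrency can be sketched as follows, under stated assumptions: the match test is a stand-in predicate rather than a real scorer, and chunk boundaries are fixed rather than aligned to skip entries:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChunkedSearch {
    // Split the doc id range into one chunk per thread, let each thread
    // iterate its own chunk, then merge the partial results.
    static long countMatches(int maxDoc, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int chunk = (maxDoc + numThreads - 1) / numThreads;
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                final int from = t * chunk;
                final int to = Math.min(maxDoc, from + chunk);
                parts.add(pool.submit(() -> {
                    long n = 0;
                    for (int doc = from; doc < to; doc++) {
                        if (doc % 3 == 0) n++;   // stand-in for a real match test
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) {
                total += f.get();                 // merge partial results
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The result is independent of the thread count, which is the property the merge step relies on.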

bq. I'm wondering whether it would make sense to add skip info to the term 
positions of very large documents. Any ideas on that?

Probably we should -- yet another issue :)





[jira] Commented: (LUCENE-1410) PFOR implementation

2008-10-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638900#action_12638900
 ] 

Paul Elschot commented on LUCENE-1410:
--

So there's roughly a factor of 2 between PFOR and BITS (which is actually just 
FOR, i.e. PFOR without patches/exceptions). Meanwhile I've started working on 
speeding up the patching, combined with always using a multiple of 4 bytes. 
This also involves changing the layout of the exceptions slightly, but it 
should allow faster patching and avoid the byte-ordering problems mentioned 
before. It will take another while to finish.




[jira] Commented: (LUCENE-1410) PFOR implementation

2008-11-08 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645973#action_12645973
 ] 

Paul Elschot commented on LUCENE-1410:
--

I've been at this quite irregularly. I'm trying to give the PFor class a more 
OO interface and to get exception patching working at more decent speeds. In 
case someone else wants to move this forward faster than it is moving now, 
please holler.

After rereading this, and also after reading up a bit on MonetDB performance 
improvement techniques, I have a few more rants:

Taking another look at the decompression performance figures, and especially 
the differences between native C++ and Java, it could become worthwhile to also 
implement TermQuery in native code.

With the high decompression speeds of FOR/BITS at lower numbers of frame bits 
it might also become worthwhile to compress character data, for example numbers 
with a low number of different characters.
Adding a dictionary as in PDICT might help compression even further.
This was probably one of the reasons for the column storage discussed earlier; 
I'm now sorry I ignored that discussion.
In the index itself, column storage is also useful. One example is the 
splitting of document numbers and frequency into separate streams, another 
example is various offsets for seeking in the index.

I think it would be worthwhile to add a compressed integer array to the basic 
types used in IndexInput and IndexOutput. I'm still struggling with the 
addition of skip info into a tree of such compressed integer arrays (skip 
offsets don't seem to fit naturally into a column, and I don't know whether 
the skip size should be the same as the decompressed array size).
Placement of such compressed arrays in the index should also be aware of CPU 
cache lines and of VM page (disk block) boundaries.
In higher levels of a tree of such compressed arrays, frame exceptions would be 
best avoided to allow direct addressing, but the leaves could use frame 
exceptions for better compression.

For terms that will occur at most once in one document more compression is 
possible, so it might be worthwhile to add these as a key. At the moment I 
have no idea how to enforce the at-most-once restriction, though.






[jira] Commented: (LUCENE-1410) PFOR implementation

2008-12-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656988#action_12656988
 ] 

Paul Elschot commented on LUCENE-1410:
--

I'd like to split off the performance test cases from the functional test cases 
for this, but there does not seem to be a nice way to do that using the current 
test methods via ant.

I'm thinking of using a special class name prefix like PerfTest... (or maybe 
something shorter like Perf...) and adding a test target for that in the build 
scripts.

Are there other places where splitting performance tests and functional tests 
could be useful?
If so, what would be a good way to implement it?




[jira] Commented: (LUCENE-1410) PFOR implementation

2009-03-23 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688284#action_12688284
 ] 

Eks Dev commented on LUCENE-1410:
-

It looks like Google went there as well (block encoding); see:
http://blogs.sun.com/searchguy/entry/google_s_postings_format
http://research.google.com/people/jeff/WSDM09-keynote.pdf (slides 47-63)






[jira] Commented: (LUCENE-1410) PFOR implementation

2009-03-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688409#action_12688409
 ] 

Paul Elschot commented on LUCENE-1410:
--

The encoding in the Google research slides is a different one.
They use 2 bits prefixing the first byte to indicate the number of bytes used 
for the encoded number (1-4), and then they group 4 of those prefixes together 
to get a single byte of 4 prefixes followed by the non-prefixed bytes of the 4 
encoded numbers.
This requires a 256-way switch (indexed jump) for every 4 encoded numbers, and 
I would expect that jump to limit performance somewhat when compared to PFOR, 
which has a 32-way switch for 32/64/128 encoded numbers.
But since the prefixes only indicate the number of bytes used for the encoded 
numbers, no shifts and masks are needed, only byte moves.
So it could well be worthwhile to give this encoding a try, too, especially for 
lists of numbers shorter than 16 or 32.
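A sketch of that grouped-prefix encoding, assuming little-endian byte order within each value (the slides don't pin this down) and using a simple per-value loop rather than the 256-way switch:

```java
class GroupVarint {
    // Encode 4 ints as one tag byte (2 bits per value giving its byte length
    // minus one) followed by the values' bytes. Returns the new write position.
    static int encodeGroup(int[] values, int off, byte[] buf, int pos) {
        int tagPos = pos++;
        int tag = 0;
        for (int i = 0; i < 4; i++) {
            int v = values[off + i];
            int len = 1;
            if ((v >>> 8) != 0) len = 2;
            if ((v >>> 16) != 0) len = 3;
            if ((v >>> 24) != 0) len = 4;
            tag |= (len - 1) << (i * 2);
            for (int j = 0; j < len; j++) {
                buf[pos++] = (byte) (v >>> (8 * j)); // little-endian assumption
            }
        }
        buf[tagPos] = (byte) tag;
        return pos;
    }

    // Decode 4 ints; returns the new read position.
    static int decodeGroup(byte[] buf, int pos, int[] out, int off) {
        int tag = buf[pos++] & 0xFF;
        for (int i = 0; i < 4; i++) {
            int len = ((tag >>> (i * 2)) & 3) + 1;   // 1..4 bytes
            int v = 0;
            for (int j = 0; j < len; j++) {
                v |= (buf[pos++] & 0xFF) << (8 * j);
            }
            out[off + i] = v;
        }
        return pos;
    }
}
```

A production decoder would replace the inner loops with the table-driven 256-way dispatch discussed above; this per-value loop only shows the format.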




[jira] Commented: (LUCENE-1410) PFOR implementation

2009-05-12 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708590#action_12708590
 ] 

Paul Elschot commented on LUCENE-1410:
--

A very recent paper with some improvements to PFOR:
Yan, Ding, Suel,
Inverted Index Compression and Query Processing with Optimized Document 
Ordering,
WWW 2009, April 20-24 2009, Madrid, Spain

Roughly quoting par. 4.2, Optimizing PForDelta compression:
For an exception, we store its lower b bits instead of the offset to the next 
exception in its corresponding slot, while we store the higher overflow bits 
and the offset in two separate arrays. These two arrays are compressed using 
the Simple16 method.
Also b is chosen to optimize decompression speed. This makes the dependence of 
b on the data quite simple, (in the PFOR above here this dependence is more 
complex) and this improves compression speed.

Btw, the document ordering there is by URL. For many terms this gives shorter 
deltas between doc ids, allowing higher decompression speed for the doc ids.
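The exception layout quoted above can be sketched as follows. The two side arrays are left uncompressed here for clarity (the paper compresses them with Simple16), and the class and field names are illustrative, not Lucene APIs:

```java
import java.util.ArrayList;
import java.util.List;

class PForDeltaLayout {
    final int b;              // frame bits
    final int[] slots;        // lower b bits of every value, one per slot
    final int[] highBits;     // overflow (higher) bits of the exceptions
    final int[] positions;    // slot index of each exception

    PForDeltaLayout(int[] values, int b) {
        this.b = b;
        slots = new int[values.length];
        List<Integer> high = new ArrayList<>();
        List<Integer> pos = new ArrayList<>();
        int mask = (1 << b) - 1;
        for (int i = 0; i < values.length; i++) {
            slots[i] = values[i] & mask;       // lower b bits stay in the slot
            if ((values[i] >>> b) != 0) {      // value needs more than b bits
                high.add(values[i] >>> b);     // overflow bits to a side array
                pos.add(i);                    // offset to a second side array
            }
        }
        highBits = high.stream().mapToInt(Integer::intValue).toArray();
        positions = pos.stream().mapToInt(Integer::intValue).toArray();
    }

    int[] decode() {
        int[] out = slots.clone();
        for (int e = 0; e < positions.length; e++) {
            out[positions[e]] |= highBits[e] << b;  // patch exceptions back in
        }
        return out;
    }
}
```

Compared with the original PFOR, the slot of an exception carries real data bits rather than a link to the next exception, which is what simplifies the choice of b.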





[jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742
 ] 

Eks Dev commented on LUCENE-1410:
-

Mike, 
That is definitely the way to go: distribution-dependent encoding, where every 
term gets individual treatment.
  
Take, for example, the simple but not all that rare case where the index gets 
sorted on some of the indexed fields (we use this really extensively, e.g. a 
presorted doc collection on user_rights/zip/city, all indexed). There you get 
perfectly "compressible" postings by simply managing intervals of set bits. 
Updates distort this picture, but we rebuild the index periodically and all 
gets good again. At the moment we load them into RAM as Filters in 
IntervalSets. If that were possible in lucene, we wouldn't bother with Filters 
(VInt decoding on such super-dense fields was killing us, even in 
RAMDirectory) ...
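The "intervals of set bits" idea can be sketched as follows: with a presorted index, a term's sorted doc ids collapse into a few [start, end] runs, which is what an interval-based Filter exploits. Names here are illustrative, not the actual IntervalSets code:

```java
import java.util.ArrayList;
import java.util.List;

class Intervals {
    // Collapse a sorted list of doc ids into maximal runs of consecutive ids,
    // each represented as a {start, end} pair (end inclusive).
    static List<int[]> fromSortedDocIds(int[] docs) {
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < docs.length) {
            int start = docs[i];
            int end = start;
            while (i + 1 < docs.length && docs[i + 1] == end + 1) {
                end = docs[++i];               // extend the current run
            }
            out.add(new int[]{start, end});
            i++;
        }
        return out;
    }
}
```

For a super-dense field, the interval list is tiny regardless of how many documents match, which is why it beats per-doc VInt decoding there.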

Thinking about your comments, isn't pulsing somewhat orthogonal to the packing 
method? For example, if you load the index into a RAMDirectory, one could 
avoid one indirection level and inline all postings.

Flex indexing rocks; that is going to be the most important addition to lucene 
since it started (imo)... I would even bet on double search speed in a first 
attempt for average queries :)

Cheers, 
eks 




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796107#action_12796107
 ] 

Paul Elschot commented on LUCENE-1410:
--

To encode the exceptions as suggested by Yan, Ding & Suel, Simple9 or Simple16 can 
be used. A Simple9 implementation is at LUCENE-2189.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-18 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801813#action_12801813
 ] 

Renaud Delbru commented on LUCENE-1410:
---

Hi,

I have performed some benchmarks using the PFOR index I/O interface in order to 
check whether the index reader and block reader were adding too much overhead 
(I was afraid that the block reading interface was adding too much overhead 
and, as a consequence, losing the decompression speed benefits of block-based 
compression).

In this benchmark, I have just tested FOR (not PFOR) using various block 
sizes. The benchmark is set up as follows:
- I generate an integer array of size 33554432, containing uniformly 
distributed integers (0 <= x < 65535)
- I compress the data using PFORDeltaIndexOutput (in fact, a similar class 
that I modified for this benchmark in order to support various block sizes)
- I measure the time to decompress the full list of integers using 
PFORDeltaIndexInput#next()
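The setup above, as a self-contained sketch. The ForCodec interface is a hypothetical stand-in for the patch's PFORDeltaIndexOutput/Input pair, and run() is parameterized on n (the benchmark used n = 33554432); it returns kints/msec, which is the same as ints per microsecond:

```java
import java.util.Random;

interface ForCodec {
    int[] compress(int[] data);
    int[] decompress(int[] compressed, int count);
}

class ForBench {
    // Compress once (untimed), then time one full decompression pass
    // and report throughput in kints/msec.
    static long run(ForCodec codec, int n) {
        int[] data = new int[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) {
            data[i] = rnd.nextInt(65535);        // uniform, 0 <= x < 65535
        }
        int[] compressed = codec.compress(data); // not timed
        long t0 = System.nanoTime();
        codec.decompress(compressed, n);         // timed: full decompression
        long usecs = Math.max(1, (System.nanoTime() - t0) / 1000);
        return (long) n / usecs;                 // kints/msec == ints/usec
    }
}
```

A single pass like this measures cold code, which is why the JIT warm-up discussion below matters.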

I have performed a similar test using a basic IndexInput/Output with VInt 
encoding. The performance was 39 kints/msec.

For FrameOfRef, the best block size seems to be 4096 (256-270 kints/msec); 
larger block sizes do not provide significant performance improvement.

We can observe that FOR provides a ~7 times read performance increase. To 
conclude, it looks like the reader and block reader interfaces do not add too 
much overhead ;o).

P.S.: It is possible that these results are not totally correct. I'll try to 
double check the code, and upload it here. 

Results:
{code}
BlockSize = 32
FrameOfRef 0 decompressed 33554432 in 368 msecs, 91 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 314 msecs, 106 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 294 msecs, 114 kints/msec, (1 iters).
BlockSize = 64
FrameOfRef 0 decompressed 33554432 in 242 msecs, 138 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 239 msecs, 140 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 237 msecs, 141 kints/msec, (1 iters).
BlockSize = 128
FrameOfRef 0 decompressed 33554432 in 223 msecs, 150 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 228 msecs, 147 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 224 msecs, 149 kints/msec, (1 iters).
BlockSize = 256
FrameOfRef 0 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 218 msecs, 153 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 219 msecs, 153 kints/msec, (1 iters).
BlockSize = 512
FrameOfRef 0 decompressed 33554432 in 170 msecs, 197 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 176 msecs, 190 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 173 msecs, 193 kints/msec, (1 iters).
BlockSize = 1024
FrameOfRef 0 decompressed 33554432 in 136 msecs, 246 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 147 msecs, 228 kints/msec, (1 iters).
BlockSize = 2048
FrameOfRef 0 decompressed 33554432 in 133 msecs, 252 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 139 msecs, 241 kints/msec, (1 iters).
BlockSize = 4096
FrameOfRef 0 decompressed 33554432 in 124 msecs, 270 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 131 msecs, 256 kints/msec, (1 iters).
BlockSize = 8192
FrameOfRef 0 decompressed 33554432 in 126 msecs, 266 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 128 msecs, 262 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
BlockSize = 16384
FrameOfRef 0 decompressed 33554432 in 127 msecs, 264 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 125 msecs, 268 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 129 msecs, 260 kints/msec, (1 iters).
BlockSize = 32768
FrameOfRef 0 decompressed 33554432 in 123 msecs, 272 kints/msec, (1 iters).
FrameOfRef 1 decompressed 33554432 in 132 msecs, 254 kints/msec, (1 iters).
FrameOfRef 2 decompressed 33554432 in 135 msecs, 248 kints/msec, (1 iters).
{code}


[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801905#action_12801905
 ] 

Paul Elschot commented on LUCENE-1410:
--

Did you see any effect of the JIT? You can check by making a loop that runs 
for about 1 second and using that loop 3 times; see the output posted on 3 Oct 
2008. When the 2nd and 3rd times are just about equal, one can assume that the 
JIT is done.
Perhaps a shorter time than 1 second could be used nowadays.
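The warm-up scheme described above can be sketched like this; the time budget and the agreement tolerance are arbitrary choices, not values from the patch:

```java
class WarmupTimer {
    // Run the work repeatedly for ~budgetMillis, three times over,
    // recording how many iterations each pass completed.
    static long[] timeThreePasses(Runnable work, long budgetMillis) {
        long[] opsPerPass = new long[3];
        for (int pass = 0; pass < 3; pass++) {
            long deadline = System.currentTimeMillis() + budgetMillis;
            long ops = 0;
            while (System.currentTimeMillis() < deadline) {
                work.run();
                ops++;
            }
            opsPerPass[pass] = ops;
        }
        return opsPerPass;
    }

    // Treat the JIT as settled once the 2nd and 3rd passes agree
    // within a relative tolerance.
    static boolean jitSettled(long[] passes, double tolerance) {
        long hi = Math.max(passes[1], passes[2]);
        return Math.abs(passes[1] - passes[2]) <= tolerance * hi;
    }
}
```

Only the later passes are reported; the first pass absorbs interpretation and compilation cost.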

For the block sizes that you've been measuring one might also see an effect of 
the L1/L2 data cache size.
This effect is probably not present in my previous postings here because I 
have not yet used this on larger blocks.





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-18 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801918#action_12801918
 ] 

Renaud Delbru commented on LUCENE-1410:
---

{quote}
Did you see any effect of the JIT? You can check by making a loop that runs 
for about 1 second and using that loop 3 times; see the output posted on 3 Oct 
2008. When the 2nd and 3rd times are just about equal, one can assume that the 
JIT is done.
{quote}

I haven't seen the effect of the JIT on FOR.
I have re-run the benchmark, repeating the decompression until the test time 
exceeds 300 ms or 1 s (as is done in the 
TestFrameOfRef#doDecompPerfTestUsing() method), but I haven't observed a 
difference from the previous benchmark.
In fact, it seems the effect appears only for a block size of 32; in the other 
cases it seems imperceptible (in the previous benchmark the numbers look 
unstable; in the new benchmark they look more stable, but there is no 
significant increase between the first loop and the following ones).

In the first benchmark, I was not repeating the loop until 1 second was 
reached since there is no easy way to "reset" the reader. In the new 
benchmark, I am recreating the reader in the loop (which causes some 
overhead). Do you think this can have an effect on the JIT?

{quote}
For the block sizes that you've been measuring one might also see an effect of 
the L1/L2 data cache size.
{quote}

Yes, it is probably an effect of the cache size. The goal was to check the 
increase in performance and find the optimal block size. This optimal block 
size may depend on my hardware (Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz, 
cache size: 4096 KB).




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801934#action_12801934
 ] 

Paul Elschot commented on LUCENE-1410:
--

{quote}
In the first benchmark, I was not repeating the loop until 1 second was 
reached since there is no easy way to "reset" the reader. In the new 
benchmark, I am recreating the reader in the loop (which causes some 
overhead). Do you think this can have an effect on the JIT?
{quote}
I don't know whether the JIT deals with the code for individual objects 
(other than classes and their code). There could be an indirect effect via the 
garbage collector.

The uniform random range of about 65000 almost always generates 14-16 bits per 
number, so these results are roughly comparable to the 14-16 bit cases posted 
on 6 October 2008 by Michael. As it happens, they are quite close.





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-19 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802235#action_12802235
 ] 

Renaud Delbru commented on LUCENE-1410:
---

On another aspect, why does PFOR/FOR encode the number of compressed integers 
into the block header, since this information is already stored in the stream 
header (the block size information written in 
FixedIntBlockIndexOutput#init())? Is there a particular use case for that?




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-19 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802335#action_12802335
 ] 

Paul Elschot commented on LUCENE-1410:
--

The only reason why the number of compressed integers is encoded in the block 
header here is that when I coded it I did not know that this was unnecessary 
in lucene indexes.

That also means that the header can be used for different compression methods, 
for example in the following way, with the cases encoded in the 1st byte:
32 FrameOfRef cases (#frameBits), followed by 3 bytes for #exceptions (0 for 
BITS, > 0 for PFOR)
16-64 cases for a SimpleNN variant
1-8 cases for run-length encoding (for example followed by 3 bytes for length 
and value)
Total #cases is 49-104, or 6-7 bits.

Run-length encoding is good for terms that occur in every document and for 
the frequencies of primary keys.

The only concern I have is that the instruction cache might get filled up with 
the code for all these decoding cases.
At the moment I don't know how to deal with that other than by adding such 
cases slowly while doing performance tests all the time.
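For illustration only, the 1st-byte dispatch proposed above might look like the following, taking the upper end of each range (64 SimpleNN cases, 8 run-length cases) for concreteness; these boundaries are hypothetical, not a committed format:

```java
// The three method families a header byte could select between.
enum Method { FRAME_OF_REF, SIMPLE_NN, RUN_LENGTH }

class HeaderByte {
    static Method methodFor(int header) {
        if (header < 32) {
            // header value == #frameBits; a following 3-byte #exceptions
            // field would distinguish BITS (0) from PFOR (> 0)
            return Method.FRAME_OF_REF;
        }
        if (header < 96) {
            return Method.SIMPLE_NN;    // 64 SimpleNN variant cases
        }
        if (header < 104) {
            return Method.RUN_LENGTH;   // 8 RLE cases (length/value follow)
        }
        throw new IllegalArgumentException("unused header value: " + header);
    }
}
```

The dispatch itself is tiny; the instruction-cache concern is about the decoding bodies behind each case, not this switch.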





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-21 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803272#action_12803272
 ] 

Renaud Delbru commented on LUCENE-1410:
---

{quote}
The only concern I have is that the instruction cache might get filled up with 
the code for all these decoding cases.
At the moment I don't know how to deal with that other than by adding such 
cases slowly while doing performance tests all the time.
{quote}

I am not very familiar with such technologies, and it is new to me to think 
about this kind of problem.
However, from what I understand of the WSDM slides about Google's Group Varint 
encoding, they use a 256-entry table, which is much larger than the 49-104 
cases you're proposing. From what I understand, the instruction cache will be 
filled with such a table, so it seems you still have some margin compared to 
Group Varint encoding. Or is there other information that risks being added to 
the instruction cache?




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803376#action_12803376
 ] 

Paul Elschot commented on LUCENE-1410:
--

There is a margin indeed, but it depends on the cache size. For the decoding 
of all 32 FrameOfRef cases, a rough estimate would be 32 cases * (on average 
16 indexed reads, 32 shifts, 32 masks, 32 indexed writes) = 32 * 112 = ~3500 
instructions. With 3 bytes per instruction on average, that would be about 10k 
bytes in a normal instruction cache of 16k or 32k bytes.
For Simple9 there is a lot less code (fewer cases, less code per case), but 
when a lot of variations on this are added, perhaps 3k would be needed in 
total.
So it's not only the number of cases, but also the amount of code per case.
Fortunately not all cases are used frequently, so those won't appear in the 
instruction cache.

Compare that to the code needed for VByte, which is just about nothing.

During query execution there are other things in the instruction cache, too, 
for example the code of the Scorers used by the query and the code to get the 
data from the index.
Initially the byte code interpreter and the JIT compiler of the JVM are also 
in the instruction cache, but they normally disappear when the JIT is 
finished. But with more code, the JIT will take longer to finish.





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804127#action_12804127
 ] 

Paul Elschot commented on LUCENE-1410:
--

Zhang, 2008, see above, reports this:
{quote}
The poor speed of variable-byte on position data is primarily due
to the fact that position values are larger and more often require
2 bytes under variable-byte; this case tends to be much slower due
to a branch mispredict.
{quote}

Taking another look at the position data above (3 October 2008), 11.6% of prx 
values take 7 bits or less, and the rest fit in 15 bits. So why not encode the 
position data as VShort (1 bit as in VByte and 15 bits of data)?
That would enlarge a typical prx file by about 6% and increase position 
decoding speed a lot, probably about 3 times (see Table 1 in the same paper).
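One possible reading of the VShort suggestion, sketched as a 16-bit-unit variable-length code: the high bit of each short is a continuation flag, the low 15 bits are data, so positions that fit in 15 bits always decode with a single getShort and one rarely-taken branch. This is an illustrative sketch under that assumption, not the actual wire format:

```java
import java.nio.ByteBuffer;

class VShort {
    static void write(ByteBuffer out, int v) {
        while ((v & ~0x7FFF) != 0) {           // more than 15 bits remain
            out.putShort((short) ((v & 0x7FFF) | 0x8000)); // continuation set
            v >>>= 15;
        }
        out.putShort((short) v);               // final chunk, high bit clear
    }

    static int read(ByteBuffer in) {
        short s = in.getShort();
        int v = s & 0x7FFF;
        int shift = 15;
        while ((s & 0x8000) != 0) {            // rarely taken for prx data
            s = in.getShort();
            v |= (s & 0x7FFF) << shift;
            shift += 15;
        }
        return v;
    }
}
```

Because almost all prx values fit in one short, the continuation branch is predictable, which is the whole point versus VByte's frequent 1-vs-2-byte mispredicts.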




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804209#action_12804209
 ] 

Michael McCandless commented on LUCENE-1410:


bq. So why not encode the position data as VShort (1 bit as in VByte and 15 
bits of data)?

That sounds interesting... we should give it a shot and see what gains Lucene 
actually sees on real data!




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-01-24 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804227#action_12804227
 ] 

Paul Elschot commented on LUCENE-1410:
--

I opened LUCENE-2232 to use VShort for positions.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-15 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833872#action_12833872
 ] 

Renaud Delbru commented on LUCENE-1410:
---

When performing some tests on PFOR, I have noticed that your algorithm was 
favoring very small numFrameBits with a very large number of exceptions. For 
example, on blocks of size 512, I have noticed that many of the blocks ended 
up with bestFrameBits = 1 and the number of exceptions reaching > 450.

I found that this was due to the setting of the allowedNumExceptions variable 
(in the last part of the PFOR#frameBitsForCompression() method), which was set 
to the number of current exceptions + the maximum allowed (which at the end is 
generally extremely large). Is it a bug, or is it something I don't understand 
in the current PFOR algorithm?

P.S.: btw, the previous benchmark results I posted are wrong due to a bug 
caused by the hardcoded byte buffer size (1024) in PForIndexInput/Output. I'll 
post updated results soon, with a comparison with GroupVarInt (from the WSDM09 
Jeff Dean talk).




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833895#action_12833895
 ] 

Paul Elschot commented on LUCENE-1410:
--

That might indeed be a bug.
In general the frameBitsForCompression() method should try to find the number 
of frame bits that gives the quickest decompression.
The trade-off is between exceptions taking extra time and a smaller number of 
frame bits taking less time.
I have not timed the exception decoding yet, so I would not expect the current 
implementation of frameBitsForCompression() to be right on the spot.

Please note that I'd still like to change the exception encoding, see the 
comment of 12 May 2009:
"For an exception, we store its lower b bits instead of the offset to the next 
exception in its corresponding slot, while we store the higher overflow bits 
and the offset in two separate arrays. These two arrays are compressed using 
the Simple16 method."
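For illustration, a size-only selection loop in the spirit of frameBitsForCompression() might look like this; the per-exception cost is a stand-in constant, and (as discussed above) a real chooser should also weigh decoding time, so this is a sketch rather than the patch's actual code:

```java
class FrameBitsChooser {
    // Pick the frame bit width b that minimizes the estimated encoded size:
    // b bits per value, plus a fixed byte cost per exception (values that
    // do not fit in b bits).
    static int choose(int[] values, int bytesPerException) {
        int bestB = 32;
        long bestBytes = Long.MAX_VALUE;
        for (int b = 1; b <= 32; b++) {
            long exceptions = 0;
            for (int v : values) {
                // v >>> 32 would be a no-op in Java, so guard b == 32
                if (b < 32 && (v >>> b) != 0) {
                    exceptions++;
                }
            }
            long bytes = ((long) values.length * b + 7) / 8
                       + exceptions * bytesPerException;
            if (bytes < bestBytes) {
                bestBytes = bytes;
                bestB = b;
            }
        }
        return bestB;
    }
}
```

Capping the exception ratio (or adding a per-exception time penalty) would prevent the degenerate bestFrameBits = 1 outcome reported above.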





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-15 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833922#action_12833922
 ] 

Renaud Delbru commented on LUCENE-1410:
---

{quote}
In general the frameBitsForCompression() method should try to find the number 
of frame bits that gives the quickest decompression.
{quote}

In fact,  the method is trying to find the configuration that has the smaller 
size in term of bytes (with the test totalBytes <= bestBytes). But it seems 
that it is not always the best case for quick decompression. I haven't test the 
decompression speed of PFOR yet, but I imagine that with 89% of exceptions 
(while in the original algorithm exception should occurs only a few times), it 
should not be the best performance.

Why not simply compute the index (in the sorted copy of the array) 
corresponding to the percentage of values that should be packed, and encode 
the rest of the array as exceptions?
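A sketch of that percentile-based selection (a hypothetical method, not the patch's frameBitsForCompression): sort a copy of the block, pick the value at the chosen percentile, and derive the frame bit width from it.

```java
import java.util.Arrays;

// Sketch of the selection suggested above: pick b so that a fixed
// fraction of the block packs into b bits and the rest becomes
// exceptions. Hypothetical helper, not the patch's implementation.
public class FrameBitsByPercentile {
    public static int frameBits(int[] block, double packedFraction) {
        int[] sorted = block.clone();
        Arrays.sort(sorted);
        // index of the largest value that must still fit in b bits
        int idx = (int) Math.ceil(packedFraction * sorted.length) - 1;
        if (idx < 0) idx = 0;
        int v = sorted[idx];
        // number of bits needed to represent v (at least 1)
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(v));
    }
}
```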




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833952#action_12833952
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. Why not just computing the index (in the sorted copy array) that will 
represent the percentage of values that should be packed, and the remaining of 
the array encoded as exceptions ?

Because there are cases in which no exceptions are needed at all, for example 
when 90% of the values use the same maximum number of bits.

Anyway, the articles on PFor do not completely specify how this should be 
done, and they are based on C, not on Java.
That means there will be room for improvement.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-23 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837231#action_12837231
 ] 

Renaud Delbru commented on LUCENE-1410:
---

I am reporting here some experiments I performed over the past weeks while 
trying to improve the FOR implementation.
I re-implemented the FOR algorithms, getting rid of the IntBuffer and working 
directly with the byte array. I implemented multiple variants: one that 
performs the transformation directly on bytes to produce the uncompressed 
data (byte-level below), and one that first converts bytes into integers and 
then transforms the integers to produce the uncompressed data (integer-level 
below). The latter is very similar to your original FOR implementation, but 
without the IntBuffer.

I think these results may be of interest to you, especially for optimising 
certain cases (byte-level manipulation seems more suitable for certain cases, 
such as bit frames of 2, 4 or 8). I have attached a file containing a summary 
of the results to save space. I can provide the raw results, and the code, if 
you would like to test it on your side.
However, I get very different results on a 64-bit OS and a 32-bit OS (on the 
same computer, an IBM T61; the only difference is that one runs Ubuntu 9.10 
32-bit and the other Ubuntu 9.10 64-bit).

I am a little puzzled by these results. I thought that removing the IntBuffer 
and working directly with the byte array would be faster (as I noticed in 
other compression algorithms, such as GroupVarInt). The IntBuffer you are 
currently using is a view on a byte buffer: it performs the conversion from 
bytes to ints, plus several checks (whether the index is in the range of the 
buffer) and function calls.
But with FOR this does not seem to make much of a difference for large 
integers (> 8 bits). Moreover, I observe a decrease in performance on the 
64-bit OS.
Maybe you have an idea about the difference in behaviour.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-02-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839509#action_12839509
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. I thought that removing the IntBuffer and working directly with the byte 
array will be faster ...

When the int values are in processor byte order, a call to  IntBuffer.get() may 
be reduced by the JIT to a single hardware instruction. This is why the initial 
implementation uses IntBuffer.
Also, the index bound checks need only be done once for the first and last 
index used.

I have no idea why a 64 bit OS would be slower than a 32 bit OS.
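For reference, a minimal sketch of such an IntBuffer view using the standard java.nio API: when the buffer's byte order matches the processor's, IntBuffer.get can JIT down to a plain 32-bit load, and a bulk get bounds-checks the range once rather than per element.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Sketch of the IntBuffer view described above: the byte array is
// viewed as ints in processor byte order, and a single bulk get
// replaces per-element index checks.
public class IntBufferView {
    public static int[] readInts(byte[] raw, int numInts) {
        IntBuffer view = ByteBuffer.wrap(raw)
                .order(ByteOrder.nativeOrder())   // processor byte order
                .asIntBuffer();
        int[] out = new int[numInts];
        view.get(out, 0, numInts);                // one bounds check for the range
        return out;
    }
}
```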




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852855#action_12852855
 ] 

Renaud Delbru commented on LUCENE-1410:
---

Here are some results on the performance of compression and decompression of 
various algorithms over the Wikipedia dataset (English articles).

First, the performance of compressing/decompressing the full .doc posting 
list, using a block size of 1024.
Compression and decompression are in KInt/ms, and size is in bytes.

||Codec||Compression||Decompression||Size||
|FOR (Orig)|20|106|448856|
|PFOR (Orig)|8|107| 383596|
|VINT|37|74|421764|
|FOR (Opt)|39|102|447369|
|PFOR (Opt)|10|108|382379|
|RICE|8|31|350687|
|S9|16|65|408218|

* VInt is a block-based version of variable-length integer coding (unlike the 
simple int block, I create blocks using VInt, then read the full block into 
memory and decompress it one integer at a time).
* FOR (Orig) and PFOR (Orig) are your implementations of FOR and PFOR.
* FOR (Opt) and PFOR (Opt) are my implementations of FOR and PFOR (using an 
array of functors for each compression/decompression case).
* Rice is an implementation of the Rice algorithm, but block-based.
* S9 is your implementation of Simple9 with some bug fixes, but block-based.
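The block-based VInt decode in the first bullet can be sketched as follows (assuming Lucene's VInt wire format: 7 payload bits per byte, high bit set on all but the last byte):

```java
// Sketch of block-based VInt decoding as described above: the whole
// block's bytes are already in memory, and each integer is decoded
// from 7-bit groups, low group first, continuation flagged by the
// high bit of each byte.
public class VIntBlock {
    public static int[] decodeBlock(byte[] buf, int numInts) {
        int[] out = new int[numInts];
        int pos = 0;
        for (int i = 0; i < numInts; i++) {
            byte b = buf[pos++];
            int value = b & 0x7F;
            int shift = 7;
            while ((b & 0x80) != 0) {          // continuation bit set
                b = buf[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            }
            out[i] = value;
        }
        return out;
    }
}
```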

I created an *IndexInput and *IndexOutput for each algorithm, and used them 
to compress and decompress the posting file in this experiment.

We can see that using an array of functors for compression in FOR provides a 
good speed increase. However, for PFOR the compression speed increase is 
limited; from my tests, this is due to the sort step that is necessary for 
compressing each block.
PFOR provides a good balance between decompression speed and size, but it is 
very slow in compression (as slow as Rice). I don't think the compression 
speed can be improved much because of the sort step, which seems to impose a 
hard upper limit on performance (I have tried various heuristics to avoid 
sorting, but without much success so far).

Following this first benchmark, I created various Lucene codecs that use the 
*IndexInput and *IndexOutput of the different compression algorithms, and 
indexed the full Wikipedia dataset. Each block-based codec is configured to 
use blocks of 1024.
I recorded the average commit time and the optimise time. Then I executed 
random keyword queries and recorded the average query time and the number of 
bytes read to answer each query.

h5. Compression

Time is in ms, Size in bytes.

||Codec||Avg Commit Time||Total Commit Time||Optimise Time||Index Size||
|STD|1276|359978|113386|1423251|
|SEP|1437|405286|117163|1666870|
|VINT|1572|426551|122557|1742083|
|RICE|1791|505312|230636|1339703|
|S9|1575|444157|146216|1530666|
|FOR|1520|428847|134502|1754578|
|PFOR|1852|522401|189262|1511154|

* STD is the Standard lucene codec.
* SEP the Sep codec.
* All the other algorithms are based on the Sep codec, but block-based.
* FOR and PFOR are my implementations.

We can see that the Sep codec and the block-based codecs have a certain 
overhead (in indexing time and index size) compared to the standard Lucene 
codec, but I imagine this overhead can be reduced with some code 
optimisation. For the index size, the overhead in the block-based codecs is 
due to the skip list, which is much bigger (127MB for SEP against 189MB for 
block-based) since we need to encode both the block offset and the document 
offset inside the block.

In terms of compression speed and index size, these results follow the 
previous ones. We can also observe that PFOR is the slowest.

h5. Decompression

For each codec, I performed five runs, each run consisting of 200 random 
keyword queries, and averaged the time over the 5 runs and 200 queries 
(i.e., 200*5 = 1000 query executions).
For each run, I open a new IndexReader and IndexSearcher and reuse them 
across the 200 random queries.

To generate the queries, I first grouped the terms into three groups (HIGH, 
MEDIUM and LOW) based on their frequency, in order to generate random keyword 
queries from these three frequency groups. Then I benchmarked random queries 
of one, two or more keywords, using all possible combinations (e.g., HIGH & 
HIGH, HIGH & MEDIUM, etc.). I executed the queries using a single thread, and 
also using multiple threads.

I am reporting below a few of the query results; for most of the other 
queries, the results follow the same pattern. I don't report the results of 
the Standard and Sep codecs, but they always outperform the block-based 
codecs, whatever compression is used.

However, the results for query time completely contradict the results of the 
first benchmark. The slowest algorithms (Rice, VInt) generally provide the 
fastest query execution times, and the fastest algorithms (FOR, PFOR)
pr

[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852868#action_12852868
 ] 

Michael McCandless commented on LUCENE-1410:


Thanks for sharing these thorough results Renaud!

Curious that PFOR/FOR don't do well during searching...  have you tried 
profiling?  Maybe something silly is going on.

One issue is MUST mixed with other clauses -- the scoring for such a query 
will do a lot of seeking, which for block-based codecs will be costly.  But 
it's still strange you don't see speedups for single-term queries.  Have you 
tried only SHOULD clauses?




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852885#action_12852885
 ] 

Renaud Delbru commented on LUCENE-1410:
---

{quote}
Curious that PFOR/FOR don't do well during searching... have you tried 
profiling? Maybe something silly is going one.
{quote}

I will try profiling. But I am surprised, because I am using the same 
*IndexInput and *IndexOutput as for the first benchmark. So, if there is a 
problem, it should be "outside" the IndexInput.
But I'll double check.

{quote}
One issue is MUST mixed with other clauses - the scoring for such a query will 
do alot of seeking, which for block based codecs will be costly. But it's still 
strange you don't see speedups for single term query. Have you tried only 
SHOULD clauses?
{quote}

Here is the results with only SHOULD clause.

h6. HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD HIGH:SHOULD - 1 thread - 200 random 
queries

||Codec||Avg Query Time||
|Rice|6 ms|
|VInt|5 ms|
|PFOR|6.5 ms|
|FOR|6.8 ms|
|S9|4.7 ms|




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852924#action_12852924
 ] 

Paul Elschot commented on LUCENE-1410:
--

I think the mixed performance results for decompression and query times may 
be caused by the use of only a single method. For very short sequences (1 to 
2 or 3 integers), I would expect VInt (actually VByte) to perform best. For 
long sequences (from about 25 integers), (P)FOR should do best. In between 
the two, (a variant of) S9.
The problem will be to find the optimal boundary sequence sizes at which to 
change the compression method.
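A minimal sketch of this per-length method selection; the thresholds come from the comment above (VByte up to ~3 ints, S9 in between, (P)FOR from ~25 ints up) and would need tuning, and the Codec names are hypothetical:

```java
// Sketch of choosing a compression method by sequence length, as
// suggested above. The thresholds and names are illustrative only.
public class CodecByLength {
    public enum Codec { VBYTE, SIMPLE9, PFOR }

    public static Codec choose(int sequenceLength) {
        if (sequenceLength <= 3) return Codec.VBYTE;    // very short sequences
        if (sequenceLength < 25) return Codec.SIMPLE9;  // medium sequences
        return Codec.PFOR;                              // long sequences
    }
}
```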

The fact that S9 is already doing better than VInt is encouraging. Since (P)FOR 
can do even better than S9, when using (P)FOR only for longer sequences, I'd 
expect a real performance boost for queries using frequently occurring terms in 
the index.

Also, I'd recommend verifying the query results for each method. S9 as I 
implemented it is only tested by its own test cases.
When the query results are incorrect, measuring performance is not really 
useful, and this has already happened for the PFOR implementation here; see 
above, early October 2008.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852930#action_12852930
 ] 

Renaud Delbru commented on LUCENE-1410:
---

{quote}
The fact that S9 is already doing better than VInt is encouraging. Since (P)FOR 
can do even better than S9, when using (P)FOR only for longer sequences, I'd 
expect a real performance boost for queries using frequently occurring terms in 
the index.
{quote}

So you're suggesting that, in my benchmark, the decoded sequences may not be 
long enough to show the performance improvements of FOR/PFOR?

Also, in my benchmark, VINT means block-based VInt. So it has the same 
overhead (decompressing the full block even if only a partial sequence is 
needed) as FOR, PFOR, S9 and the others. But even with these settings, as you 
can see, the average query time is smaller with VInt than with FOR/PFOR.

{quote}
Also, I'd recommend to verify query results for each method. S9 as I 
implemented it is only tested by its own test cases.
{quote}

We implemented more unit tests for each codec. Also, in the benchmark, I 
output the number of hits for each query per codec, and the number of hits is 
the same for every codec (but I haven't checked whether they all return the 
same document ids).




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-04-02 Thread Renaud Delbru (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852933#action_12852933
 ] 

Renaud Delbru commented on LUCENE-1410:
---

Just an update:

I have executed the same benchmark, but with 1000 queries per run instead of 
200, and the results of FOR and PFOR are better. Maybe this is an effect of 
the JIT: 200 queries per run were not enough for the JIT to warm up, so FOR 
and PFOR were penalized.

I'll rerun the full benchmarks with a larger number of random queries and 
report the results (hopefully good ones).





Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread Paul Elschot
Eks,

> 
> [ 
> https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762742#action_12762742
>  ] 
> 
> Eks Dev commented on LUCENE-1410:
> -
> 
> Mike, 
> That is definitely the way to go, distribution dependent encoding, where 
> every Term gets individual treatment.
>   
> Take for an example simple, but not all that rare case where Index gets 
> sorted on some of the indexed fields (we use it really extensively, e.g. 
> presorted doc collection on user_rights/zip/city, all indexed). There you get 
> perfectly "compressible"  postings by simply managing intervals of set bits. 
> Updates distort this picture, but we rebuild index periodically and all gets 
> good again.  At the moment we load them into RAM as Filters in IntervalSets. 
> if that would be possible in lucene, we wouldn't bother with Filters (VInt 
> decoding on such super dense fields was killing us, even in RAMDirectory) ... 
>  

You could try switching the Filter to OpenBitSet when that takes fewer bytes 
than SortedVIntList.
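The trade-off behind that suggestion can be sketched with rough size arithmetic (this is not Lucene's actual memory accounting): a bit set costs about maxDoc/8 bytes regardless of density, while a VInt-of-gaps list like SortedVIntList grows with the number of set docs.

```java
// Rough size comparison behind the OpenBitSet vs SortedVIntList
// suggestion. Illustrative estimates only, not Lucene's accounting.
public class FilterSizeEstimate {
    static int vByteLen(int v) {               // bytes to VByte-encode one int
        int len = 1;
        while ((v >>>= 7) != 0) len++;
        return len;
    }

    public static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7) / 8;               // one bit per doc, set or not
    }

    public static long vIntListBytes(int[] sortedDocs) {
        long bytes = 0;
        int prev = 0;
        for (int doc : sortedDocs) {
            bytes += vByteLen(doc - prev);     // encode the gap to the next doc
            prev = doc;
        }
        return bytes;
    }

    public static boolean preferBitSet(int maxDoc, int[] sortedDocs) {
        return bitSetBytes(maxDoc) < vIntListBytes(sortedDocs);
    }
}
```

For the super-dense postings in the example above, the gaps are mostly 1, so the gap list costs about a byte per document and the bit set wins well before half the documents are set.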

Regards,
Paul Elschot


Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
Paul,
the point I was trying to make with this example was extreme, but realistic. 
Imagine 100M docs, sorted on the field user_rights, and a term user_rights:XX 
that selects 40M of them (user rights...). To encode this, you need a format 
with just two integers (for more such intervals you would need slightly more, 
but still much less than OpenBitSet, VInts, PFor...). Strictly speaking this 
term is dense, but it is highly compressible and could be inlined with the 
pulsing trick...
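The interval idea can be sketched as follows (a hypothetical helper, not Lucene code): on an index sorted by the field, a term's postings collapse to a few runs of consecutive docids, each stored as a (start, length) pair, so two ints can cover tens of millions of docs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of interval (run-length) encoding for dense postings on a
// sorted index: each maximal run of consecutive docids becomes one
// (start, length) pair.
public class IntervalPostings {
    public static List<int[]> toIntervals(int[] sortedDocs) {
        List<int[]> runs = new ArrayList<>();
        int i = 0;
        while (i < sortedDocs.length) {
            int start = sortedDocs[i];
            int len = 1;
            while (i + len < sortedDocs.length
                    && sortedDocs[i + len] == start + len) {
                len++;                          // extend the consecutive run
            }
            runs.add(new int[]{start, len});
            i += len;
        }
        return runs;
    }
}
```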

cheers, eks  





Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread eks dev
If you drove this example further, in combination with flex indexing 
permitting a per-term postings format, I could imagine some nice tools for an 
optimizeHard(): normal index construction works with the defaults, as planned 
for the solid mixed-performance case, and at the end you run optimizeHard(), 
where postings on such fields get re-sorted (basically enabling RLE encoding 
to work) and, at the same time, all other terms get the optimal encoding 
format for their postings... perfect for read-only indexes where you want 
maximum performance and a reduced index size.



Re: [jira] Commented: (LUCENE-1410) PFOR implementation

2009-10-06 Thread Paul Elschot
On Tuesday 06 October 2009 23:59:12 eks dev wrote:
> Paul,
> the point I was trying to make with this example was extreme,  but realistic. 
> Imagine 100Mio docs, sorted on field user_rights,  a term user_rights:XX 
> selects 40Mio of them (user rights...). To encode this, you need format with  
> two integers (for more of such intervals you would need slightly more, but 
> nevertheless, much less than for OpenBitSet, VInts, PFor...  ). Strictly 
> speaking this term is dense, but highly compressible and could be inlined 
> with pulsing trick...

Well, I've been considering adding compressed consecutive ranges to 
SortedVIntList, but I never got further than considering. This sounds like 
the perfect use case for that.

Regards,
Paul Elschot

