[jira] [Commented] (LUCENE-1410) PFOR implementation

2012-03-25 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237893#comment-13237893
 ] 

Michael McCandless commented on LUCENE-1410:


bq. Out of curiosity, is the PFOR effort dead? 

Nothing in open source is ever dead!  (Well, rarely...).  It's just that nobody 
has picked this up again and pushed it to a committable state.

I think now that we have no more bulk API in trunk, it may not be that much 
work to finish... though there could easily be surprises.

I opened LUCENE-3892 to do exactly this, as a Google Summer of Code project.

> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: Bulk Postings branch
>
> Attachments: LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, 
> LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, 
> LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, 
> TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java, 
> autogen.tgz, for-summary.txt
>
>   Original Estimate: 21,840h
>  Remaining Estimate: 21,840h
>
> Implementation of Patched Frame of Reference.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-1410) PFOR implementation

2012-03-24 Thread The Alchemist (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237604#comment-13237604
 ] 

The Alchemist commented on LUCENE-1410:
---

Out of curiosity, is the PFOR effort dead?  I was thinking about running some 
newer benchmarks using Java 7, to see whether that makes a difference.  

Do you guys think that's worthwhile?




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-24 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974901#action_12974901
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. .. we might be able to have some gains by allowing a directory to return an 
IntBufferIndexInput of some sort (separate from DataInput/IndexInput) that 
basically just positions an IntBuffer view (the default implementation would 
fill from an indexinput into a bytebuffer like we do now),

Since things are moving on the nio buffer front (see also LUCENE-2292), how 
about trying to be independent from the buffer implementation?
That might be done by allowing an IntBuffer wrapping a byte array, or as a view 
alongside a ByteBuffer, or a temporary IntBuffer as above.

To be independent from the buffer implementation we could add some methods to 
IndexInput:

void startAlignToInt() // basically a seek to the next multiple of 4 bytes when 
not already there. Could also start using an IntBuffer somehow.
int readAlignedInt() // get the next int, default to readInt(), use an 
IntBuffer when available.
void endAlignToInt() // switch back to byte reading, set the byte buffer to 
the position corresponding to the int buffer.

(Adding this to DataInput seems to be a more natural place, but DataInput 
cannot seek.)

Would that work, and could it work fast?
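
To make the three proposed methods concrete, here is a minimal sketch over a plain byte array. The class name AlignedIntReader and the use of java.nio views are assumptions for illustration only; this is not Lucene's IndexInput and it says nothing about how fast a real implementation would be.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

/** Hypothetical sketch of the proposed aligned-int methods; not Lucene's IndexInput. */
class AlignedIntReader {
    private final ByteBuffer bytes; // big-endian by default
    private IntBuffer ints;         // non-null only between startAlignToInt() and endAlignToInt()

    AlignedIntReader(byte[] data) {
        this.bytes = ByteBuffer.wrap(data);
    }

    byte readByte() {
        return bytes.get();
    }

    /** Skips to the next multiple of 4 bytes (if not already there) and opens an int view. */
    void startAlignToInt() {
        int aligned = (bytes.position() + 3) & ~3;
        bytes.position(aligned);
        ints = bytes.asIntBuffer(); // view starts at the aligned byte position
    }

    /** Reads the next int from the int view. */
    int readAlignedInt() {
        return ints.get();
    }

    /** Switches back to byte reading, moving the byte position past the ints consumed. */
    void endAlignToInt() {
        bytes.position(bytes.position() + ints.position() * 4);
        ints = null;
    }
}
```

In this sketch the byte position stays at the aligned start while ints are being read, and endAlignToInt() advances it by four bytes per int consumed.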

> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: Bulk Postings branch
>
> Attachments: autogen.tgz, for-summary.txt, 
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, 
> LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, 
> LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, 
> TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-23 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974765#action_12974765
 ] 

Paul Elschot commented on LUCENE-1410:
--

I tried to revive the tests from the 1410e patch, but it does not make much 
sense because they test rather short sequences of input to be compressed, and 
the decompressor is now hardcoded to always decompress 128 values.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974315#action_12974315
 ] 

Robert Muir commented on LUCENE-1410:
-

{quote}
Did you also test without a copy (without the readbytes() call) into the 
underlying byte array for the IntBuffer? That might be even faster,
and it could be possible when using for example a BufferedIndexInput or an 
MMapDirectory.
For decent buffer.get() speed the starting byte would need to be aligned at an 
int border.
{quote}

Yes, for the mmap case I tried the original dangerous hack, exposing an 
IntBuffer view of its internal mapped byte buffer.
I also tried MMapIndexInput keeping track of its own IntBuffer view.

we might be able to have some gains by allowing a directory to return an 
IntBufferIndexInput of some sort (separate from DataInput/IndexInput)
that basically just positions an IntBuffer view (the default implementation 
would fill from an indexinput into a bytebuffer like we do now),
but I haven't tested this across all the directories yet... it might help NIOFS 
though as it would bypass the double-buffering of BufferedIndexInput.
For SimpleFS it would be the same, and for MMap i'm not very hopeful it would 
be better, but maybe not worse.

if that worked maybe we could do the same with Long, for things like simple-8b 
(http://onlinelibrary.wiley.com/doi/10.1002/spe.948/abstract)





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-22 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974307#action_12974307
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. I've tested everything I can think of and it seems this nio 
ByteBuffer/IntBuffer approach is always the fastest ...

Did you also test without a copy (without the readbytes() call) into the 
underlying byte array for the IntBuffer? That might be even faster,
and it could be possible when using for example a BufferedIndexInput or an 
MMapDirectory.
For decent buffer.get() speed the starting byte would need to be aligned at an 
int border.
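
The copy-free variant asked about here can be sketched with a java.nio view. The helper name intView and the choice to reject unaligned offsets are assumptions for illustration; the sketch shows only that the view shares storage with the byte buffer, not that it is faster.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

/** Hypothetical helper; not part of Lucene. */
class IntViewSketch {
    /** Returns a view of count ints starting at byteOffset, sharing storage with buffer. */
    static IntBuffer intView(ByteBuffer buffer, int byteOffset, int count) {
        if ((byteOffset & 3) != 0) {
            throw new IllegalArgumentException("offset not int-aligned: " + byteOffset);
        }
        // duplicate() shares content but has its own position and limit;
        // the duplicate is big-endian, which matches a freshly wrapped array.
        ByteBuffer dup = buffer.duplicate();
        dup.limit(byteOffset + count * 4);
        dup.position(byteOffset);
        return dup.asIntBuffer();
    }
}
```

Because no bytes are copied, a later change to the underlying array is visible through the returned IntBuffer.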






[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-22 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974303#action_12974303
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. ... is it possible to encode # of exception bytes in header?

In the first implementation the start index of the exception chain is in the 
header (5 or 6 bits iirc). In the second implementation (by Hoa Yan) there is 
no exception chain, so the number of exceptions must somehow be encoded in the 
header.
That means encoding the # exception bytes in the header would be easier in the 
second implementation, but it is also possible in the first one.

I would expect that a few bits for the number of encoded integers would also be 
added in the header (think 32, 64, 128...).
The number of frame bits takes 5 bits.
That means that there are about 2 bytes unused in the header now, and I'd 
expect 1 byte to be enough to encode the number of bytes for the exceptions. 
For example, a bad case in the first implementation of 10 exceptions of 4 bytes 
means 40 bytes of data, which fits in 6 bits. The same bad case in the second 
implementation would also need to store the indexes of the exceptions in 10*5 
bits, totalling 90 bytes that can be encoded in 7 bits. However, I don't know 
what the worst case # exceptions is. (This gets into vsencoding...)

For the moment I'll just leave this unchanged and get the tests working on the 
current first implementation.
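
The field widths quoted above (6 bits for a count of 40, 7 bits for a count of 90) can be checked with a small sketch. The header layout below is purely hypothetical: only the 5-bit frame-bits field at bit 8 matches the FOR snippet posted in this thread, and the 8-bit exception-byte field at bit 16 is an assumption made up for illustration.

```java
/** Hypothetical header layout; only the frame-bits field follows the FOR code in this thread. */
class PforHeaderSketch {
    /** Number of bits needed to store any value in [0, maxValue]. */
    static int bitsRequired(int maxValue) {
        return maxValue == 0 ? 1 : 32 - Integer.numberOfLeadingZeros(maxValue);
    }

    /** Packs numFrameBits - 1 into bits 8..12 and the exception byte count into bits 16..23. */
    static int pack(int numFrameBits, int numExceptionBytes) {
        if (numFrameBits < 1 || numFrameBits > 32) {
            throw new IllegalArgumentException("numFrameBits: " + numFrameBits);
        }
        if (numExceptionBytes < 0 || numExceptionBytes > 255) {
            throw new IllegalArgumentException("numExceptionBytes: " + numExceptionBytes);
        }
        return ((numFrameBits - 1) << 8) | (numExceptionBytes << 16);
    }

    static int numFrameBits(int header) {
        return ((header >>> 8) & 31) + 1;
    }

    static int numExceptionBytes(int header) {
        return (header >>> 16) & 255;
    }
}
```

With such a layout a single int would carry both values, which is the property the question about the exception bytes is after.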




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973927#action_12973927
 ] 

Robert Muir commented on LUCENE-1410:
-

Sorry, correction (i meant the length in bytes or ints *compressed*, to tell us 
how many bytes to read)

In the FOR case we now do:

{noformat}
int header = in.readInt();
final int numFrameBits = ((header >>> 8) & 31) + 1;
in.readBytes(input, 0, numFrameBits << 4);
{noformat}

But in PFOR we still have "two headers"

{noformat}
int numBytes = in.readInt(); // nocommit: is it possible to encode # of exception bytes in header?
in.readBytes(input, 0, numBytes);
compressedBuffer.rewind();
int header = compressedBuffer.get();
{noformat}
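
As a worked example of why the FOR case needs no separate length: with a fixed block of 128 values, numFrameBits bits per value occupy 128 * numFrameBits / 8 = numFrameBits * 16 bytes, which is the `numFrameBits << 4` in the snippet above. The class and method names below are assumptions for illustration.

```java
/** Hypothetical helper mirroring the FOR header arithmetic quoted in this thread. */
class ForHeaderSketch {
    static final int BLOCK_SIZE = 128; // values per block, as hardcoded on the branch

    /** Bits 8..12 of the header hold numFrameBits - 1. */
    static int numFrameBits(int header) {
        return ((header >>> 8) & 31) + 1;
    }

    /** Compressed payload size in bytes: 128 values * numFrameBits bits / 8. */
    static int compressedBytes(int header) {
        return numFrameBits(header) << 4;
    }
}
```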





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973914#action_12973914
 ] 

Robert Muir commented on LUCENE-1410:
-

{quote}
I'm running into a nocommit for the nio byte buffer allocation in 
ForDecompress.java.
Shall I try and move the buffer handling from there into FORIndexInput and 
PForDeltaIndexInput at the codecs?
{quote}

I am to blame for this I think! Actually I think the buffer handling could stay 
and we could just remove the nocommit?
I've tested everything I can think of and it seems this nio 
ByteBuffer/IntBuffer approach is always the fastest:
it's only slower to do it other ways, and it doesn't help to do trickier things 
like IntBuffer views of MMap even.

One thing that would be good, is it possible to encode the length in 
decompressed bytes (or the length in bytes of exceptions)
into PFOR's int header? this would allow us to remove the wasted per-block int 
that we currently encode now.

Then we could "put FOR and PFOR back together" again... sorry i split apart the 
decompressors to remove the wasted int
in the FOR case since we can get it from its header already.





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973897#action_12973897
 ] 

Paul Elschot commented on LUCENE-1410:
--

I'm running into a nocommit for the nio byte buffer allocation in 
ForDecompress.java.
Shall I try and move the buffer handling from there into FORIndexInput and 
PForDeltaIndexInput at the codecs?
I could leave it as it is, but then the test cases from the 1410e patch would 
have to be adapted again when the nocommit is fixed.

Also the package/directory naming o.a.l.util.pfor and 
o.a.l.index.codecs.pfordelta may be confusing.
Probably pfordelta could be renamed to pfor, since delta refers to 
differences (in docids and positions) that are treated elsewhere.
But I'd rather not change that now. 




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973830#action_12973830
 ] 

Michael McCandless commented on LUCENE-1410:


Yes positions are bulk coded too, but we haven't cutover any positional queries 
yet to use the bulk enum API... we should cutover at least one (I think exact 
PhraseQuery is probably easiest!).




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973827#action_12973827
 ] 

Paul Elschot commented on LUCENE-1410:
--

I had a quick look at the codecs for this, but I couldn't find the answer to 
this question easily:
Are the positions here also encoded by the bulk int encoders (VInt and FOR)?




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973790#action_12973790
 ] 

Robert Muir commented on LUCENE-1410:
-

On LUCENE-2723, I uploaded a "bulk vint" codec that shares most of the same 
codepath as FOR/PFOR,
except it writes blocks of 128 vint-encoded integers.

There are performance numbers there compared to our Standard vint-based codec, 
as you can see
it differs dramatically due to other reasons.

So I thought it would be useful to then compare FOR to this, since it's a good 
measure of just the compression
algorithm, but everything else is the same (comparing two 128-block size 
FixedIntBlock codecs with the same 
index layout, etc etc). This way we compare apples to apples.
 
||Query||QPS BulkVInt||QPS FOR||Pct diff||
|united~1.0|9.43|9.39|{color:red}-0.5%{color}|
|united~2.0|2.02|2.02|{color:red}-0.3%{color}|
|unit~1.0|6.37|6.36|{color:red}-0.1%{color}|
|unit~2.0|6.13|6.21|{color:green}1.2%{color}|
|"unit state"~3|3.45|3.51|{color:green}2.0%{color}|
|spanNear([unit, state], 10, true)|2.89|2.99|{color:green}3.3%{color}|
|unit*|30.04|31.42|{color:green}4.6%{color}|
|unit state|8.00|8.40|{color:green}5.0%{color}|
|"unit state"|5.97|6.37|{color:green}6.7%{color}|
|spanFirst(unit, 5)|11.29|12.10|{color:green}7.2%{color}|
|uni*|17.36|18.69|{color:green}7.6%{color}|
|+unit +state|10.99|12.18|{color:green}10.8%{color}|
|+nebraska +state|65.74|73.06|{color:green}11.1%{color}|
|state|28.90|32.37|{color:green}12.0%{color}|
|u*d|10.54|12.45|{color:green}18.1%{color}|
|un*d|40.06|47.61|{color:green}18.9%{color}|
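
For readers unfamiliar with the baseline in this comparison, the per-value encoding behind a "bulk vint" block can be sketched as follows: 7 payload bits per byte, low-order bits first, with the high bit set on every byte except the last. The class and method names are assumptions, and the real codec on LUCENE-2723 may differ in how it frames a block.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of vInt-encoding a block of ints; not the codec from LUCENE-2723. */
class BulkVIntSketch {
    static byte[] encodeBlock(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // more bytes follow
                v >>>= 7;
            }
            out.write(v); // last byte, high bit clear
        }
        return out.toByteArray();
    }

    static int[] decodeBlock(byte[] bytes, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            byte b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }
}
```

Unlike FOR, the decoder here has to test a continuation bit on every byte.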





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973733#action_12973733
 ] 

Paul Elschot commented on LUCENE-1410:
--

No need to be sorry. Thanks for taking this on.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973593#action_12973593
 ] 

Michael McCandless commented on LUCENE-1410:


bq. In the 1410e patch there are test cases that have not made it into the 
bulkpostings branch. I'll try and revive these first.

Ugh, sorry :(  Thanks!




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-20 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973286#action_12973286
 ] 

Paul Elschot commented on LUCENE-1410:
--

In the 1410e patch there are test cases that have not made it into the 
bulkpostings branch. I'll try and revive these first.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973161#action_12973161
 ] 

Michael McCandless commented on LUCENE-1410:


OK I committed "pfor2" onto the branch.

I also added a new low-level "encode random ints" test. pfor1 passes the test 
but pfor2 fails it (I'm guessing this is the 2^28 limitation of Simple16, but 
I'm really not sure), so I left the pfor2 random ints test @Ignore for now...




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973159#action_12973159
 ] 

Michael McCandless commented on LUCENE-1410:


bq. A simple solution is to treat the high bit of the 28 bit value just the 
same as in vByte, and allow a vByte to follow in the 28 bit case. The high bit 
can also be added to the selector easily to avoid testing for it.

That sounds great!  Any chance you could fix this up on one of the Simple16 
impls?  I'd really like to have a Simple9/16 codec to better test our variable 
int block codec infrastructure...




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-19 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973021#action_12973021
 ] 

Paul Elschot commented on LUCENE-1410:
--

bq. ... separately encode exc positions & values (high bits), leaving low bits 
in the slot. Does this give better perf than linking them together? (I think 
pfor1 links).

This saves forced exceptions for low numbers of frame bits, so the treatment of 
exceptions is cleaner.
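
To make the distinction concrete, here is a minimal sketch of the "separate" layout, written for illustration only and not taken from any of the attached patches. The class and method names (PforSplitSketch, split, merge) are hypothetical. Every slot keeps the low frameBits bits of its value, and the positions and high bits of the values that do not fit go into two side lists. In a linked layout each exception slot instead holds the distance to the next exception, so with few frame bits that distance may not fit and extra "forced" exceptions have to be inserted; the separate layout never needs them.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: PFOR-style block with separately stored exceptions. */
public final class PforSplitSketch {

    /** One encoded block, still unpacked: low bits per slot plus exception lists. */
    public static final class Block {
        public final int[] lowBits;       // one entry per value, each < 2^frameBits
        public final int[] excPositions;  // slot indexes of values that did not fit
        public final int[] excHighBits;   // value >>> frameBits for each exception

        Block(int[] lowBits, int[] excPositions, int[] excHighBits) {
            this.lowBits = lowBits;
            this.excPositions = excPositions;
            this.excHighBits = excHighBits;
        }
    }

    /** Splits non-negative values into low bits and separate exception lists. */
    public static Block split(int[] values, int frameBits) {
        if (frameBits < 1 || frameBits > 31) {
            throw new IllegalArgumentException("frameBits must be in 1..31");
        }
        int mask = (1 << frameBits) - 1;
        int[] low = new int[values.length];
        List<Integer> positions = new ArrayList<>();
        List<Integer> highBits = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < 0) {
                throw new IllegalArgumentException("negative value at " + i);
            }
            low[i] = v & mask;            // always stays in the slot
            int high = v >>> frameBits;
            if (high != 0) {              // value needs more than frameBits bits
                positions.add(i);
                highBits.add(high);
            }
        }
        return new Block(low, toArray(positions), toArray(highBits));
    }

    /** Rebuilds the original values by patching the high bits back in. */
    public static int[] merge(Block block, int frameBits) {
        int[] out = block.lowBits.clone();
        for (int k = 0; k < block.excPositions.length; k++) {
            out[block.excPositions[k]] |= block.excHighBits[k] << frameBits;
        }
        return out;
    }

    private static int[] toArray(List<Integer> list) {
        int[] a = new int[list.size()];
        for (int i = 0; i < a.length; i++) {
            a[i] = list.get(i);
        }
        return a;
    }
}
```

The two exception lists could then be compressed by any small-integer coder; the sketch leaves that choice open.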

bq. Simple16 cannot represent values >= 2^28?

A simple solution is to treat the high bit of the 28 bit value just the same as 
in vByte, and allow a vByte to follow in the 28 bit case. The high bit can also 
be added to the selector easily to avoid testing for it.
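
As a rough illustration of this idea (a sketch under assumptions, not code from any Simple16 implementation): the class name, the selector number 15 for the "one 28 bit value" case, and the big-endian byte layout below are all hypothetical. Bit 27 of the payload acts as a continuation flag, the low 27 bits carry data, and the remaining high bits follow as a vByte only when the flag is set.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only: a 28 bit slot whose top bit escapes to a trailing vByte. */
public final class Simple16EscapeSketch {
    // Assumed selector number for "one value of 28 bits"; real implementations may differ.
    private static final int SELECTOR_ONE_28BIT = 15;
    private static final int CONT_FLAG = 1 << 27;       // high bit of the 28 bit payload
    private static final int LOW_MASK = CONT_FLAG - 1;  // low 27 data bits

    /** Encodes one non-negative int as a 4 byte word, plus a vByte if it needs more than 27 bits. */
    public static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative value");
        }
        int high = value >>> 27;
        int payload = value & LOW_MASK;
        if (high != 0) {
            payload |= CONT_FLAG;  // signal that a vByte follows
        }
        int word = (SELECTOR_ONE_28BIT << 28) | payload;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(word >>> 24);
        out.write(word >>> 16);
        out.write(word >>> 8);
        out.write(word);
        // Standard vByte: 7 data bits per byte, high bit set on all but the last byte.
        while (high != 0) {
            int b = high & 0x7F;
            high >>>= 7;
            out.write(high != 0 ? (b | 0x80) : b);
        }
        return out.toByteArray();
    }

    /** Decodes a value written by encode. */
    public static int decode(byte[] buf) {
        int word = ((buf[0] & 0xFF) << 24) | ((buf[1] & 0xFF) << 16)
                 | ((buf[2] & 0xFF) << 8) | (buf[3] & 0xFF);
        if ((word >>> 28) != SELECTOR_ONE_28BIT) {
            throw new IllegalArgumentException("unexpected selector");
        }
        int value = word & LOW_MASK;
        if ((word & CONT_FLAG) != 0) {
            int pos = 4;
            int shift = 27;
            int b;
            do {
                b = buf[pos++] & 0xFF;
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
        }
        return value;
    }
}
```

Values below 2^27 cost exactly one word; larger ones cost one extra byte. Moving the flag into the selector, as suggested, would amount to reserving a second selector value for the "vByte follows" case so the decoder can dispatch on the selector alone.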




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972991#action_12972991
 ] 

Michael McCandless commented on LUCENE-1410:


bq. Seems like we need to solve this with simple9/simple16 too?

Yes!

{quote}
Like a random test that encodes/decodes a ton of integers (including things 
that would be rare deltas)
via the codec API?
{quote}

I completely agree: we need heavy low-level tests for the int encoders... I'll 
stick a nocommit in when I commit!




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972990#action_12972990
 ] 

Robert Muir commented on LUCENE-1410:
-

bq. How come bits cannot go up to 31? Or maybe you just use a full int if it's 
over 28? Seems like a good idea...

Seems like we need to solve this with simple9/simple16 too?

bq. Although all tests pass if I run w/ -Dtests.codec=PatchedFrameOfRef2,
if I try to build a big wikipedia index I hit this:

Mike, I've encountered this problem myself while messing with for/pfor.
I know for these things we need low-level unit tests, but can we cheat in some 
way?

Like a random test that encodes/decodes a ton of integers (including things 
that would be rare deltas)
via the codec API?
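
A minimal sketch of what such a randomized round-trip check could look like. Since the codec API itself is not shown in this thread, a plain vByte encoder stands in for the int encoder under test; all names here (RandomIntRoundTripSketch, encode, decode, roundTrips) are hypothetical. The generator picks a random bit width per value so that large, rare deltas show up alongside small ones, and it seeds the array with boundary values around 2^28.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Illustrative sketch only: random encode/decode round trip with vByte as a stand-in codec. */
public final class RandomIntRoundTripSketch {

    /** Stand-in encoder: vByte, 7 data bits per byte, high bit marks continuation. */
    static byte[] encode(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    /** Stand-in decoder matching encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                v |= (b & 0x7F) << shift;
            }
            out[i] = v;
        }
        return out;
    }

    /** Non-negative values with a uniformly random bit width, plus fixed boundary cases. */
    static int[] randomValues(Random random, int count) {
        int[] values = new int[count];
        int[] edges = {0, (1 << 28) - 1, 1 << 28, Integer.MAX_VALUE};
        for (int i = 0; i < count; i++) {
            if (i < edges.length) {
                values[i] = edges[i];
            } else {
                int bits = random.nextInt(32);  // 0..31 significant bits
                values[i] = bits == 0 ? 0 : random.nextInt() >>> (32 - bits);
            }
        }
        return values;
    }

    /** Returns true if every value survives encode followed by decode. */
    public static boolean roundTrips(long seed, int count) {
        int[] values = randomValues(new Random(seed), count);
        return Arrays.equals(values, decode(encode(values), count));
    }
}
```

Taking the seed as a parameter keeps any failure reproducible; swapping the stand-in encode/decode pair for the real encoder is the only change needed to point the check at another codec.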




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972988#action_12972988
 ] 

Michael McCandless commented on LUCENE-1410:


Also VSEncoding
(http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html)
looks very interesting -- faster than PForDelta!





[jira] Commented: (LUCENE-1410) PFOR implementation

2010-12-15 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971616#action_12971616
 ] 

Michael McCandless commented on LUCENE-1410:


OK I committed the prototype impl onto the bulk postings branch.

> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: Bulk Postings branch
>
> Attachments: autogen.tgz, for-summary.txt, 
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, 
> LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, 
> LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, 
> TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-08-02 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894827#action_12894827
 ] 

Michael Busch commented on LUCENE-1410:
---

Nice blog post, Mike!

> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: autogen.tgz, for-summary.txt, 
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, 
> LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, 
> LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, 
> TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.




[jira] Commented: (LUCENE-1410) PFOR implementation

2010-07-29 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893831#action_12893831
 ] 

Paul Elschot commented on LUCENE-1410:
--

I'm sorry that there is no code yet for a better patching implementation; see 
my remark of 12 May 2009. This would need some version of Simple9, and I'm still 
pondering a generalization of that, but I have no timeline for finishing it.
A rough implementation might just use vByte for such patches.


> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: autogen.tgz, for-summary.txt, 
> LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410b.patch, 
> LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, 
> TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java
>
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
>
> Implementation of Patched Frame of Reference.
