[jira] [Commented] (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237893#comment-13237893 ] Michael McCandless commented on LUCENE-1410: bq. Out of curiosity, is the PFOR effort dead? Nothing in open source is ever dead! (Well, rarely...). It's just that nobody has picked this up again and pushed it to a committable state. I think now that we have no more bulk API in trunk, it may not be that much work to finish... though there could easily be surprises. I opened LUCENE-3892 to do exactly this, as a Google Summer of Code project.
> PFOR implementation
> ---
>
> Key: LUCENE-1410
> URL: https://issues.apache.org/jira/browse/LUCENE-1410
> Project: Lucene - Java
> Issue Type: New Feature
> Components: core/index
> Reporter: Paul Elschot
> Priority: Minor
> Fix For: Bulk Postings branch
>
> Attachments: LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java, autogen.tgz, for-summary.txt
>
> Original Estimate: 21,840h
> Remaining Estimate: 21,840h
>
> Implementation of Patched Frame of Reference.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237604#comment-13237604 ] The Alchemist commented on LUCENE-1410: Out of curiosity, is the PFOR effort dead? I was thinking about running some newer benchmarks using Java 7, to see if that makes a difference. Do you guys think that's worthwhile?
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974901#action_12974901 ] Paul Elschot commented on LUCENE-1410: bq. .. we might be able to have some gains by allowing a directory to return an IntBufferIndexInput of some sort (separate from DataInput/IndexInput) that basically just positions an IntBuffer view (the default implementation would fill from an indexinput into a bytebuffer like we do now),
Since things are moving on the nio buffer front (see also LUCENE-2292), how about trying to be independent from the buffer implementation? That might be done by allowing an IntBuffer wrapping a byte array, or as a view alongside a ByteBuffer, or a temporary IntBuffer as above. To be independent from the buffer implementation we could add some methods to IndexInput:
void startAlignToInt() // basically a seek to the next multiple of 4 bytes when not already there. Could also start using an IntBuffer somehow.
int readAlignedInt() // get the next int, default to readInt(), use an IntBuffer when available.
void endAlignToInt() // switch back to byte reading, set the byte buffer to the position corresponding to the int buffer.
(Adding this to DataInput seems to be a more natural place, but DataInput cannot seek.) Would that work, and could it work fast?
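The three proposed methods can be sketched over a plain `java.nio.ByteBuffer`. This is only an illustration of the idea in the comment above, not Lucene code: the class `AlignedIntInput` and its fields are hypothetical, and a real `IndexInput` would also have to deal with buffer refills and file positions.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

/** Hypothetical sketch of startAlignToInt/readAlignedInt/endAlignToInt over a ByteBuffer. */
final class AlignedIntInput {
  private final ByteBuffer bytes;
  private IntBuffer ints; // non-null only between startAlignToInt() and endAlignToInt()
  private int intStart;   // byte position at which the int view begins

  AlignedIntInput(ByteBuffer bytes) {
    this.bytes = bytes;
  }

  byte readByte() {
    return bytes.get();
  }

  /** Moves to the next multiple of 4 bytes (if not already there) and opens an int view. */
  void startAlignToInt() {
    intStart = (bytes.position() + 3) & ~3;
    bytes.position(intStart);
    ints = bytes.asIntBuffer(); // the view starts at the current byte position
  }

  /** Reads the next int from the int view. */
  int readAlignedInt() {
    return ints.get();
  }

  /** Switches back to byte reading at the byte position matching the int view. */
  void endAlignToInt() {
    bytes.position(intStart + 4 * ints.position());
    ints = null;
  }
}
```

The int view shares the byte buffer's content, so no copy is made; whether that is actually faster than a copy is exactly the open question in the thread.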
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974765#action_12974765 ] Paul Elschot commented on LUCENE-1410: I tried to revive the tests from the 1410e patch, but it does not make much sense because they test rather short sequences of input to be compressed, and the decompressor is now hardcoded to always decompress 128 values.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974315#action_12974315 ] Robert Muir commented on LUCENE-1410: {quote} Did you also test without a copy (without the readbytes() call) into the underlying byte array for the IntBuffer? That might be even faster, and it could be possible when using for example a BufferedIndexInput or an MMapDirectory. For decent buffer.get() speed the starting byte would need to be aligned at an int border. {quote} Yes, for the mmap case I tried the original dangerous hack, exposing an IntBuffer view of its internal mapped byte buffer. I also tried having MMapIndexInput keep track of its own IntBuffer view. We might be able to have some gains by allowing a directory to return an IntBufferIndexInput of some sort (separate from DataInput/IndexInput) that basically just positions an IntBuffer view (the default implementation would fill from an IndexInput into a ByteBuffer like we do now), but I haven't tested this across all the directories yet... it might help NIOFS though, as it would bypass the double-buffering of BufferedIndexInput. For SimpleFS it would be the same, and for MMap I'm not very hopeful it would be better, but maybe not worse.
If that worked, maybe we could do the same with Long, for things like simple-8b (http://onlinelibrary.wiley.com/doi/10.1002/spe.948/abstract)
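The "fill into a byte buffer, then read through an IntBuffer view" approach described above can be sketched with standard `java.nio` calls. The class name `IntBlockReader` and the use of `System.arraycopy` as a stand-in for `IndexInput.readBytes` are assumptions made for the example; the 128-value block size is the one mentioned elsewhere in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

/** Hypothetical sketch: copy one block of bytes, then decode it through an IntBuffer view. */
final class IntBlockReader {
  static final int BLOCK_INTS = 128;

  private final byte[] raw = new byte[BLOCK_INTS * 4];
  // The view shares the backing array, so bytes copied into raw are visible through it.
  private final IntBuffer view = ByteBuffer.wrap(raw).asIntBuffer();

  /** Copies one block out of src (stand-in for readBytes) and reads it as big-endian ints. */
  void readBlock(byte[] src, int offset, int[] dest) {
    System.arraycopy(src, offset, raw, 0, raw.length);
    view.rewind();
    view.get(dest, 0, BLOCK_INTS);
  }
}
```

The copy in `readBlock` is the step the quoted question asks about; the variants discussed above differ mainly in whether that copy can be avoided.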
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974307#action_12974307 ] Paul Elschot commented on LUCENE-1410: bq. I've tested everything I can think of and it seems this nio ByteBuffer/IntBuffer approach is always the fastest ... Did you also test without a copy (without the readBytes() call) into the underlying byte array for the IntBuffer? That might be even faster, and it could be possible when using for example a BufferedIndexInput or an MMapDirectory. For decent buffer.get() speed the starting byte would need to be aligned at an int boundary.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974303#action_12974303 ] Paul Elschot commented on LUCENE-1410: bq. ... is it possible to encode # of exception bytes in header? In the first implementation the start index of the exception chain is in the header (5 or 6 bits iirc). In the second implementation (by Hao Yan) there is no exception chain, so the number of exceptions must somehow be encoded in the header. That means encoding the # exception bytes in the header would be easier in the second implementation, but it is also possible in the first one. I would expect that a few bits for the number of encoded integers would also be added in the header (think 32, 64, 128...). The number of frame bits takes 5 bits. That means that there are about 2 bytes unused in the header now, and I'd expect 1 byte to be enough to encode the number of bytes for the exceptions. For example a bad case in the first implementation of 10 exceptions of 4 bytes means 40 bytes data, that fits in 6 bits, the same bad case in the second implementation would also need to store the indexes of the exceptions in 10*5 bits, totalling 90 bytes that can be encoded in 7 bits. However, I don't know what the worst case # exceptions is. (This gets into vsencoding...) For the moment I'll just leave this unchanged and get the tests working on the current first implementation.
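The bit-count arithmetic in the comment above can be checked with a small helper. This is a sketch under assumptions: the class `HeaderBits` is hypothetical, the frame-bits field position (bits 8 to 12) is taken from the FOR decoding snippet quoted elsewhere in this thread, and placing an exception byte count in bits 16 to 23 is purely an illustrative layout, not the layout of any actual patch.

```java
/** Hypothetical sketch of a single int header carrying frame bits and an exception byte count. */
final class HeaderBits {
  /** Number of bits needed to store any value in 0..maxValue. */
  static int bitsRequired(int maxValue) {
    return maxValue == 0 ? 1 : 32 - Integer.numberOfLeadingZeros(maxValue);
  }

  /** Packs numFrameBits (1..32) into bits 8..12 and exceptionBytes (0..255) into bits 16..23. */
  static int pack(int numFrameBits, int exceptionBytes) {
    if (numFrameBits < 1 || numFrameBits > 32) {
      throw new IllegalArgumentException("numFrameBits out of range: " + numFrameBits);
    }
    if (exceptionBytes < 0 || exceptionBytes > 255) {
      throw new IllegalArgumentException("exceptionBytes out of range: " + exceptionBytes);
    }
    return ((numFrameBits - 1) << 8) | (exceptionBytes << 16);
  }

  static int numFrameBits(int header) {
    return ((header >>> 8) & 31) + 1;
  }

  static int exceptionBytes(int header) {
    return (header >>> 16) & 255;
  }
}
```

With this helper, `bitsRequired(40)` is 6 and `bitsRequired(90)` is 7, matching the bit counts given in the comment for those two byte counts, and both fit in the one spare byte the sketch reserves.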
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973927#action_12973927 ] Robert Muir commented on LUCENE-1410: Sorry, correction (I meant the length in bytes or ints *compressed*, to tell us how many bytes to read). In the FOR case we now do:
{noformat}
int header = in.readInt();
final int numFrameBits = ((header >>> 8) & 31) + 1;
in.readBytes(input, 0, numFrameBits << 4);
{noformat}
But in PFOR we still have "two headers":
{noformat}
int numBytes = in.readInt(); // nocommit: is it possible to encode # of exception bytes in header?
in.readBytes(input, 0, numBytes);
compressedBuffer.rewind();
int header = compressedBuffer.get();
{noformat}
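Why `numFrameBits << 4` gives the number of bytes to read in the FOR case: assuming blocks of 128 values (the block size mentioned elsewhere in this thread), a block packed at b bits per value occupies 128 * b / 8 = 16 * b bytes. The sketch below restates the header decoding from the snippet above in a self-contained form; the class name `ForHeader` is hypothetical.

```java
/** Hypothetical sketch of the FOR header decoding, assuming 128 values per block. */
final class ForHeader {
  static final int BLOCK_VALUES = 128;

  /** Bits 8..12 of the header hold numFrameBits - 1, so the result is in 1..32. */
  static int numFrameBits(int header) {
    return ((header >>> 8) & 31) + 1;
  }

  /** Payload size in bytes: 128 values * numFrameBits bits / 8 bits per byte. */
  static int payloadBytes(int header) {
    return numFrameBits(header) << 4;
  }
}
```

Because the payload length follows from the header alone, the FOR decoder needs no separate length int, which is the saving the comment describes.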
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973914#action_12973914 ] Robert Muir commented on LUCENE-1410: {quote} I'm running into a nocommit for the nio byte buffer allocation in ForDecompress.java. Shall I try and move the buffer handling from there into FORIndexInput and PForDeltaIndexInput at the codecs? {quote} I am to blame for this I think! Actually I think the buffer handling could stay and we could just remove the nocommit? I've tested everything I can think of and it seems this nio ByteBuffer/IntBuffer approach is always the fastest: it's only slower to do it other ways, and it doesn't help to do trickier things like IntBuffer views of MMap even. One thing that would be good: is it possible to encode the length in decompressed bytes (or the length in bytes of exceptions) into PFOR's int header? This would allow us to remove the wasted per-block int that we currently encode now. Then we could "put FOR and PFOR back together" again... sorry I split apart the decompressors to remove the wasted int in the FOR case since we can get it from its header already.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973897#action_12973897 ] Paul Elschot commented on LUCENE-1410: I'm running into a nocommit for the nio byte buffer allocation in ForDecompress.java. Shall I try and move the buffer handling from there into FORIndexInput and PForDeltaIndexInput at the codecs? I could leave it as it is, but then the test cases from the 1410e patch would have to be adapted again when the nocommit is fixed. Also the package/directory naming o.a.l.util.pfor and o.a.l.index.codecs.pfordelta may be confusing. Probably pfordelta could be renamed to pfor, since delta refers to differences (in docids and positions) that are treated elsewhere. But I'd rather not change that now.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973830#action_12973830 ] Michael McCandless commented on LUCENE-1410: Yes positions are bulk coded too, but we haven't cutover any positional queries yet to use the bulk enum API... we should cutover at least one (I think exact PhraseQuery is probably easiest!).
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973827#action_12973827 ] Paul Elschot commented on LUCENE-1410: I had a quick look at the codecs for this, but I couldn't find the answer to this question easily: Are the positions here also encoded by the bulk int encoders (VInt and FOR)?
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973790#action_12973790 ] Robert Muir commented on LUCENE-1410: On LUCENE-2723, I uploaded a "bulk vint" codec that shares most of the same codepath as FOR/PFOR, except it writes blocks of 128 vint-encoded integers. There are performance numbers there compared to our Standard vint-based codec; as you can see, it differs dramatically due to other reasons. So I thought it would be useful to then compare FOR to this, since it's a good measure of just the compression algorithm, but everything else is the same (comparing two 128-block size FixedIntBlock codecs with the same index layout, etc etc). This way we compare apples to apples.
||Query||QPS BulkVInt||QPS FOR||Pct diff||
|united~1.0|9.43|9.39|{color:red}-0.5%{color}|
|united~2.0|2.02|2.02|{color:red}-0.3%{color}|
|unit~1.0|6.37|6.36|{color:red}-0.1%{color}|
|unit~2.0|6.13|6.21|{color:green}1.2%{color}|
|"unit state"~3|3.45|3.51|{color:green}2.0%{color}|
|spanNear([unit, state], 10, true)|2.89|2.99|{color:green}3.3%{color}|
|unit*|30.04|31.42|{color:green}4.6%{color}|
|unit state|8.00|8.40|{color:green}5.0%{color}|
|"unit state"|5.97|6.37|{color:green}6.7%{color}|
|spanFirst(unit, 5)|11.29|12.10|{color:green}7.2%{color}|
|uni*|17.36|18.69|{color:green}7.6%{color}|
|+unit +state|10.99|12.18|{color:green}10.8%{color}|
|+nebraska +state|65.74|73.06|{color:green}11.1%{color}|
|state|28.90|32.37|{color:green}12.0%{color}|
|u*d|10.54|12.45|{color:green}18.1%{color}|
|un*d|40.06|47.61|{color:green}18.9%{color}|
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973733#action_12973733 ] Paul Elschot commented on LUCENE-1410: No need to be sorry. Thanks for taking this on.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973593#action_12973593 ] Michael McCandless commented on LUCENE-1410: bq. In the 1410e patch there are test cases that have not made it into the bulkpostings branch. I'll try and revive these first. Ugh, sorry :( Thanks!
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973286#action_12973286 ] Paul Elschot commented on LUCENE-1410: In the 1410e patch there are test cases that have not made it into the bulkpostings branch. I'll try and revive these first.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973161#action_12973161 ] Michael McCandless commented on LUCENE-1410: OK I committed "pfor2" onto the branch. I also added a new low-level "encode random ints" test. pfor1 passes the test but pfor2 fails it (I'm guessing this is the 2^28 limitation of Simple16, but I'm really not sure), so I left the pfor2 random ints test @Ignore for now...
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973159#action_12973159 ] Michael McCandless commented on LUCENE-1410: bq. A simple solution is to treat the high bit of the 28 bit value just the same as in vByte, and allow a vByte to follow in the 28 bit case. The high bit can also be added to the selector easily to avoid testing for it. That sounds great! Any chance you could fix this up on one of the Simple16 impls? I'd really like to have a Simple9/16 codec to better test our variable int block codec infrastructure...
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973021#action_12973021 ] Paul Elschot commented on LUCENE-1410: bq. ... separately encode exc positions & values (high bits), leaving low bits in the slot. Does this give better perf than linking them together? (I think pfor1 links). This saves forced exceptions for low numbers of frame bits, so the treatment of exceptions is cleaner. bq. Simple16 cannot represent values >= 2^28? A simple solution is to treat the high bit of the 28 bit value just the same as in vByte, and allow a vByte to follow in the 28 bit case. The high bit can also be added to the selector easily to avoid testing for it.
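One possible reading of the suggested fix for the 28-bit case, sketched in isolation: the top bit of the 28-bit slot acts as a continuation flag, the low 27 bits hold the low part of the value, and the remaining high bits follow as a vByte. The class `Slot28` and the `int[]` return type are assumptions made to keep the example small; this is not taken from any Simple16 implementation, and it leaves out the selector variant mentioned in the comment.

```java
/** Hypothetical sketch of a 28-bit slot whose high bit signals a following vByte. */
final class Slot28 {
  static final int FLAG = 1 << 27;      // high bit of the 28-bit slot
  static final int LOW_MASK = FLAG - 1; // low 27 bits

  /** Returns {slot} for values below 2^27, otherwise {slot, vByte}. */
  static int[] encode(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("negative value: " + value);
    }
    if (value < FLAG) {
      return new int[] {value};
    }
    // For non-negative ints the high part is 1..15, so it fits in a single vByte.
    int high = value >>> 27;
    return new int[] {(value & LOW_MASK) | FLAG, high};
  }

  static int decode(int[] encoded) {
    int slot = encoded[0];
    if ((slot & FLAG) == 0) {
      return slot;
    }
    return (slot & LOW_MASK) | (encoded[1] << 27);
  }
}
```

Under this reading the slot alone covers values below 2^27, and anything larger costs one extra byte, so values at or above 2^28 become representable.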
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972991#action_12972991 ] Michael McCandless commented on LUCENE-1410: bq. Seems like we need to solve this with simple9/simple16 too? Yes! {quote} Like a random test that encodes/decodes a ton of integers (including things that would be rare deltas) via the codec API? {quote} I completely agree: we need heavy low-level tests for the int encoders... I'll stick a nocommit in when I commit! > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Fix For: Bulk Postings branch > > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, > LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, > LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, > TestPFor2.java, TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972990#action_12972990 ] Robert Muir commented on LUCENE-1410: - bq. How come bits cannot go up to 31? Or maybe you just use a full int if it's over 28? Seems like a good idea... Seems like we need to solve this with simple9/simple16 too? bq. Although all tests pass if I run w/ -Dtests.codec=PatchedFrameOfRef2, if I try to build a big wikipedia index I hit this: Mike, I've encountered this problem myself while messing with for/pfor. I know for these things we need low-level unit tests, but can we cheat in some way? Like a random test that encodes/decodes a ton of integers (including things that would be rare deltas) via the codec API? > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Fix For: Bulk Postings branch > > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, > LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, > LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, > TestPFor2.java, TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
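A randomized round-trip test of the kind Robert suggests might be sketched as follows. It does not use the real Lucene codec API: the IntCodec interface, the vByte stand-in encoder, and the value distribution (mostly small values with an occasional wide one, to mimic rare large deltas) are all hypothetical and chosen only to show the shape of such a test.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of a random encode/decode round-trip test for an int encoder. */
public class RandomIntRoundTrip {
    /** Stand-in for whatever block codec is under test (not a Lucene interface). */
    public interface IntCodec {
        byte[] encode(int[] values);
        int[] decode(byte[] data, int count);
    }

    /** Simple vByte codec for non-negative ints, used here only as an example subject. */
    public static final IntCodec VBYTE = new IntCodec() {
        public byte[] encode(int[] values) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (int v : values) {
                while ((v & ~0x7F) != 0) {
                    out.write((v & 0x7F) | 0x80);
                    v >>>= 7;
                }
                out.write(v);
            }
            return out.toByteArray();
        }

        public int[] decode(byte[] data, int count) {
            int[] result = new int[count];
            int pos = 0;
            for (int i = 0; i < count; i++) {
                int value = 0;
                int shift = 0;
                int b;
                do {
                    b = data[pos++] & 0xFF;
                    value |= (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                result[i] = value;
            }
            return result;
        }
    };

    /** Mostly values of up to 8 bits; roughly one in ten uses up to 31 bits. */
    public static int[] randomValues(Random random, int n) {
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            int bits = random.nextInt(10) == 0 ? 1 + random.nextInt(31) : 1 + random.nextInt(8);
            values[i] = random.nextInt() >>> (32 - bits);
        }
        return values;
    }

    /** Returns true if every random block survives encode followed by decode. */
    public static boolean roundTrips(IntCodec codec, long seed, int iterations) {
        Random random = new Random(seed);
        for (int iter = 0; iter < iterations; iter++) {
            int[] input = randomValues(random, 1 + random.nextInt(256));
            int[] output = codec.decode(codec.encode(input), input.length);
            if (!Arrays.equals(input, output)) {
                return false;
            }
        }
        return true;
    }
}
```

Seeding the Random makes a failure reproducible, which matters for bugs that only show up on rare large deltas, such as the one hit when building the big wikipedia index.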
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972988#action_12972988 ] Michael McCandless commented on LUCENE-1410: Also VSEncoding (http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html) looks very interesting -- faster than PForDelta! > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Fix For: Bulk Postings branch > > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, > LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, > LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, > TestPFor2.java, TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971616#action_12971616 ] Michael McCandless commented on LUCENE-1410: OK I committed the prototype impl onto the bulk postings branch. > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Fix For: Bulk Postings branch > > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, > LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, > LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, > TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894827#action_12894827 ] Michael Busch commented on LUCENE-1410: --- Nice blog post, Mike! > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, > LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, > LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, > TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893831#action_12893831 ] Paul Elschot commented on LUCENE-1410: -- I'm sorry that there is no code yet for a better patching implementation, see my remark of 12 May 2009. This would need some version of Simple9 and I'm still pondering a generalization of that, but I have no time plan for finishing it. A rough implementation might just use vByte for such patches. > PFOR implementation > --- > > Key: LUCENE-1410 > URL: https://issues.apache.org/jira/browse/LUCENE-1410 > Project: Lucene - Java > Issue Type: New Feature > Components: Index >Reporter: Paul Elschot >Priority: Minor > Attachments: autogen.tgz, for-summary.txt, > LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410b.patch, > LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, > TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java > > Original Estimate: 21840h > Remaining Estimate: 21840h > > Implementation of Patched Frame of Reference.
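The "rough implementation" with vByte patches mentioned above could be sketched as below. This is an illustration under stated assumptions, not the approach of any attached patch: the class VBytePatchSketch and its Block layout are hypothetical, the low bits are kept in a plain int[] rather than being bit-packed, and each exception is stored as a vByte position delta followed by a vByte holding the bits above the frame.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch: frame-of-reference block with exceptions patched from vBytes. */
public class VBytePatchSketch {
    static class Block {
        int frameBits;   // number of low bits kept in each slot
        int[] lowBits;   // one slot per value (bit-packing omitted for brevity)
        byte[] patches;  // per exception: vByte position delta, then vByte high bits
        int numPatches;
    }

    /** Encodes non-negative values; frameBits is assumed to be in the range 1..31. */
    static Block encode(int[] values, int frameBits) {
        Block block = new Block();
        block.frameBits = frameBits;
        block.lowBits = new int[values.length];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int mask = (1 << frameBits) - 1;
        int previous = 0;
        for (int i = 0; i < values.length; i++) {
            block.lowBits[i] = values[i] & mask;
            int high = values[i] >>> frameBits;
            if (high != 0) {                  // value does not fit the frame: record a patch
                writeVByte(i - previous, out);
                writeVByte(high, out);
                previous = i;
                block.numPatches++;
            }
        }
        block.patches = out.toByteArray();
        return block;
    }

    /** Copies the low bits, then ORs the high bits of each exception back in. */
    static int[] decode(Block block) {
        int[] values = block.lowBits.clone();
        int[] pos = {0};
        int index = 0;
        for (int k = 0; k < block.numPatches; k++) {
            index += readVByte(block.patches, pos);
            values[index] |= readVByte(block.patches, pos) << block.frameBits;
        }
        return values;
    }

    static void writeVByte(int v, ByteArrayOutputStream out) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static int readVByte(byte[] data, int[] pos) {
        int value = 0;
        int shift = 0;
        int b;
        do {
            b = data[pos[0]++] & 0xFF;
            value |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }
}
```

Because positions and high bits live outside the slots, no exception has to be forced just to keep a linked chain reachable when the frame is narrow. The cost is a second, byte-aligned pass over the patches at decode time.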