[ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431193#comment-13431193 ]
Robert Muir commented on LUCENE-3892:
-------------------------------------

{quote}
So ... most of the gains come from BlockPF cutover. This is sort of ... surprising/disappointing, ie, our bottlenecks are the abstraction layers, not the actual decode cost. Still it's good to make progress on removing the abstractions.
{quote}

I don't think it's that disappointing. This isn't a very interesting benchmark for a compression algorithm like FOR: instead imagine the very common case of apps today indexing small fields like product names, restaurant names, or something like that. Freqs are nearly always 1, and positions are tiny, but often people still want the ability to use things like phrase queries.

And imagine cases where people are indexing data from a database and there are only a few unique values (e.g. product type = tshirt, pants, shoes) in a field.

I think the Wikipedia benchmark doesn't do a very good job of illustrating performance on use-cases like this, which I think are common and also where I'm fairly positive FOR will be a win.

It's nice that it's not slower or too much bigger in the "worst case" of large docs where the numbers aren't so tiny?

{quote}
Also, it looks like the only query that is slower than Lucene40 is AndHighLow ... however, it's also an extremely fast query to begin with so I think it's a fine tradeoff that it gets slower while the hard/slower queries get faster.
{quote}

+1, let's not even think twice about that one.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta,
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
> Key: LUCENE-3892
> URL: https://issues.apache.org/jira/browse/LUCENE-3892
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Labels: gsoc2012, lucene-gsoc-12
> Fix For: 4.1
>
> Attachments: LUCENE-3892-BlockTermScorer.patch,
>   LUCENE-3892-blockFor&hardcode(base).patch,
>   LUCENE-3892-blockFor&packedecoder(comp).patch,
>   LUCENE-3892-blockFor-with-packedints-decoder.patch,
>   LUCENE-3892-blockFor-with-packedints-decoder.patch,
>   LUCENE-3892-blockFor-with-packedints.patch,
>   LUCENE-3892-bulkVInt.patch,
>   LUCENE-3892-direct-IntBuffer.patch,
>   LUCENE-3892-for&pfor-with-javadoc.patch,
>   LUCENE-3892-handle_open_files.patch,
>   LUCENE-3892-pfor-compress-iterate-numbits.patch,
>   LUCENE-3892-pfor-compress-slow-estimate.patch,
>   LUCENE-3892_for_byte[].patch,
>   LUCENE-3892_for_int[].patch,
>   LUCENE-3892_for_unfold_method.patch,
>   LUCENE-3892_pfor_unfold_method.patch,
>   LUCENE-3892_pulsing_support.patch,
>   LUCENE-3892_settings.patch,
>   LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.
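
For readers following the comment above about small fields, here is a rough sketch of why fixed-width block packing tends to help when freqs are nearly always 1 and positions are tiny. It is not the code from any of the attached patches: the class name `ForSketch`, the block size of 128, and the plain "pack every value at the width of the largest one" scheme are all assumptions made for illustration, and values are assumed to be non-negative.

```java
// Illustrative sketch only; not Lucene's BlockPF implementation.
public class ForSketch {
    // Hypothetical block size chosen for the example.
    static final int BLOCK_SIZE = 128;

    // Number of bits needed for the largest value in the block (at least 1).
    static int bitsRequired(int[] block) {
        int or = 0;
        for (int v : block) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Pack each non-negative value into `bits` bits, back to back, in a long[].
    static long[] pack(int[] block, int bits) {
        long[] out = new long[(block.length * bits + 63) / 64];
        for (int i = 0; i < block.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            out[word] |= ((long) block[i]) << shift;
            if (shift + bits > 64) {
                // The value straddles two words; store the high part in the next one.
                out[word + 1] |= ((long) block[i]) >>> (64 - shift);
            }
        }
        return out;
    }

    // Inverse of pack(): read `count` values of `bits` bits each.
    static int[] unpack(long[] packed, int bits, int count) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    // Bytes the same values would take as variable-length ints (7 bits per byte).
    static int vIntBytes(int[] block) {
        int total = 0;
        for (int v : block) {
            total++;
            while ((v & ~0x7F) != 0) {
                total++;
                v >>>= 7;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] freqs = new int[BLOCK_SIZE];
        java.util.Arrays.fill(freqs, 1); // "freqs are nearly always 1"
        int bits = bitsRequired(freqs);
        long[] packed = pack(freqs, bits);
        System.out.println("bits per value: " + bits);
        System.out.println("packed bytes:   " + packed.length * 8);
        System.out.println("vInt bytes:     " + vIntBytes(freqs));
    }
}
```

Under these assumptions a block of 128 freqs that are all 1 packs into 16 bytes at 1 bit per value, where a vInt encoding spends one byte per value (128 bytes). On large documents with bigger numbers the width grows and the gap narrows, which matches the "worst case" remark in the comment.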