[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API

Simon Willnauer (JIRA) Wed, 26 Jan 2011 00:31:12 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986891#action_12986891
 ]


Simon Willnauer commented on LUCENE-2723:
-----------------------------------------

{quote}
I really wish we could debug whatever this performance problem is, just in case 
the bulk APIs
themselves need changing... a little concerned about them at the moment, that's 
all...
not sure it should stand in the way of your patch, just saying I don't like the 
performance
regression
{quote}

Yeah, I agree. I think we should open a separate issue to figure out what the 
problem here is. Unrelated to this, the wrapper lets me easily test the bulk 
APIs together with the enum API, which is valuable in any case. I am opening a 
new issue and will commit the latest patch to the branch with the wrapper 
moved to /src/test. We can still move it to /src/java later.
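
For readers who want a picture of what such a wrapper does, here is a minimal sketch. It is not the code from LUCENE-2723-BulkEnumWrapper.patch: the names BulkDocsSource, Wrapper and fromArray are hypothetical, and the only thing taken from the issue is that the bulk buffer holds doc deltas. The wrapper exposes a doc-at-a-time nextDoc() on top of a bulk source, so the same test can run against either API.

```java
final class BulkToEnumSketch {
    // Sentinel chosen for this sketch to signal exhaustion.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical bulk source: writes doc deltas into buffer, returns how many (0 when exhausted). */
    interface BulkDocsSource {
        int fill(int[] buffer);
    }

    /** Doc-at-a-time view over a bulk source. */
    static final class Wrapper {
        private final BulkDocsSource source;
        private final int[] buffer = new int[4]; // deliberately small so refills are exercised
        private int count; // number of valid deltas in buffer
        private int upto;  // next delta to consume
        private int docID; // running absolute docID

        Wrapper(BulkDocsSource source) {
            this.source = source;
        }

        int nextDoc() {
            if (upto == count) {
                // Buffer consumed: ask the bulk source for the next chunk.
                count = source.fill(buffer);
                upto = 0;
                if (count == 0) {
                    return NO_MORE_DOCS;
                }
            }
            // Deltas accumulate into absolute docIDs.
            docID += buffer[upto++];
            return docID;
        }
    }

    /** Test helper: serves the given deltas chunk by chunk. */
    static BulkDocsSource fromArray(final int[] deltas) {
        return new BulkDocsSource() {
            private int pos;

            @Override
            public int fill(int[] buffer) {
                int n = Math.min(buffer.length, deltas.length - pos);
                System.arraycopy(deltas, pos, buffer, 0, n);
                pos += n;
                return n;
            }
        };
    }
}
```

A test can then walk the wrapper and a reference enumeration side by side and compare docIDs one at a time.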



> Speed up Lucene's low level bulk postings read API
> --------------------------------------------------
>
>                 Key: LUCENE-2723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2723
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2723-BulkEnumWrapper.patch, 
> LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, 
> LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
> LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, 
> LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, 
> LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, 
> LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch
>
>
> Spinoff from LUCENE-1410.
> The flex DocsEnum has a simple bulk-read API that reads the next chunk
> of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
> (from LUCENE-1410).  This is not unlike sucking coffee through those
> tiny plastic coffee stirrers they hand out on airplanes that,
> surprisingly, also happen to function as a straw.
> As a result we see no perf gain from using FOR/PFOR.
> I had hacked up a fix for this, described in my blog post at
> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
> I'm opening this issue to get that work to a committable point.
> So... I've worked out a new bulk-read API to address this performance
> bottleneck.  It has some big changes over the current bulk-read API:
>   * You can now also bulk-read positions (but not payloads), but I
>     have yet to cut over positional queries.
>   * The buffer contains doc deltas, not absolute values, for docIDs
>     and positions (freqs are absolute).
>   * Deleted docs are not filtered out.
>   * The doc & freq buffers need not be "aligned".  For fixed intblock
>     codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
>     Group varint, etc.) they won't be.
> It's still a work in progress...
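
The list above implies some work for whoever consumes the buffers: doc deltas have to be summed into absolute docIDs, and deleted docs have to be skipped by the caller. The following is a rough sketch of that consumer side only. The class and method names are hypothetical, and the deleted-docs check is modeled as a plain java.util.function.IntPredicate rather than any Lucene type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class BulkDeltaDecode {
    /**
     * Turns the first count entries of a doc-delta buffer into absolute docIDs,
     * starting from lastDocID, and drops docs the predicate reports as deleted.
     * A null predicate means "no deletions".
     */
    static List<Integer> decodeLiveDocs(int[] docDeltas, int count, int lastDocID,
                                        IntPredicate isDeleted) {
        List<Integer> live = new ArrayList<>();
        int docID = lastDocID;
        for (int i = 0; i < count; i++) {
            // The buffer holds deltas, so accumulate to recover the docID.
            docID += docDeltas[i];
            // The bulk API does not filter deletions, so the caller must.
            if (isDeleted == null || !isDeleted.test(docID)) {
                live.add(docID);
            }
        }
        return live;
    }
}
```

Passing count separately from the array length reflects that a buffer may be only partly filled. Freqs would need no such accumulation, since the description says they are stored as absolute values.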

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


