[ 
https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413698#comment-13413698
 ] 

Michael McCandless commented on LUCENE-3892:
--------------------------------------------

bq. But why? Does lucene store freq() when it is 0 as well, so a whole block 
with v==1 will be more possible?

A whole block of 1s can easily happen: if all freqs are one (the term occurs 
exactly once in each document), or if the term occurs in every document, in 
which case the delta between docIDs is always 1.
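
To make the docID case concrete, here is a minimal sketch (not the actual 
codec code; the class and method names are made up for illustration) that 
delta-encodes one block of sorted docIDs against the last docID of the 
previous block and checks whether every delta is the same value:

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
class DeltaBlockSketch {

    /**
     * Returns the gaps between consecutive docIDs in a block.
     * prevDocID is the last docID of the preceding block (an assumption
     * made here so that the first gap is computed like all the others).
     */
    static int[] deltas(int[] docIDs, int prevDocID) {
        int[] out = new int[docIDs.length];
        int prev = prevDocID;
        for (int i = 0; i < docIDs.length; i++) {
            out[i] = docIDs[i] - prev;
            prev = docIDs[i];
        }
        return out;
    }

    /** True if every entry of the block equals value. */
    static boolean allEqual(int[] block, int value) {
        for (int v : block) {
            if (v != value) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A term that occurs in every document: docIDs are consecutive.
        int[] docIDs = {40, 41, 42, 43, 44, 45, 46, 47};
        int[] d = deltas(docIDs, 39);
        System.out.println(Arrays.toString(d));
        System.out.println("all ones: " + allEqual(d, 1));
    }
}
```

A block of freqs behaves the same way without the delta step: if the term 
occurs once per document, the freq block itself is all 1s.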

I don't think we should ever hit an all 0s block today (hmm: except for 
positions, if the given term always occurred at the first position in each doc).

We could in theory subtract 1 from all these deltas (except the first one! so 
maybe we add one to the docID to begin with...) so that these turn into all-0s 
blocks, but then at decode time we'd have to add 1 back, and I'm not sure 
that'd be a net win.
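
A rough sketch of that subtract-1 idea, assuming docIDs are strictly 
increasing and start at 0 (the class and method names are hypothetical, and 
this is not how any existing codec is written). Starting the running docID at 
-1 plays the role of "adding one to the docID to begin with", so the first 
entry needs no special case:

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Lucene.
class GapMinusOneSketch {

    /** Encodes strictly increasing docIDs as (delta - 1) values. */
    static int[] encode(int[] docIDs) {
        int[] gaps = new int[docIDs.length];
        int prev = -1; // same effect as adding 1 to the first docID
        for (int i = 0; i < docIDs.length; i++) {
            gaps[i] = docIDs[i] - prev - 1;
            prev = docIDs[i];
        }
        return gaps;
    }

    /** Reverses encode(); note the extra "+ 1" paid per value at decode time. */
    static int[] decode(int[] gaps) {
        int[] docIDs = new int[gaps.length];
        int prev = -1;
        for (int i = 0; i < gaps.length; i++) {
            prev = prev + gaps[i] + 1;
            docIDs[i] = prev;
        }
        return docIDs;
    }

    public static void main(String[] args) {
        // Term present in every document: the encoded block is all 0s.
        int[] dense = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(Arrays.toString(encode(dense)));

        // Round trip on a sparser list.
        int[] sparse = {3, 7, 8, 20};
        System.out.println(Arrays.equals(sparse, decode(encode(sparse))));
    }
}
```

The extra addition in decode() is the cost in question; whether the smaller 
blocks pay for it would have to be measured.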
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, 
> Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, 
> LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, 
> LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-for&pfor.patch, 
> LUCENE-3892-handle_open_files.patch, LUCENE-3892_for.patch, 
> LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, 
> LUCENE-3892_for_unfold_method.patch, LUCENE-3892_pfor.patch, 
> LUCENE-3892_pfor.patch, LUCENE-3892_pfor.patch, 
> LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, 
> LUCENE-3892_settings.patch, LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

