[ 
https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990509#comment-12990509
 ] 

Renaud Delbru edited comment on LUCENE-2886 at 2/4/11 10:42 AM:
----------------------------------------------------------------

Hi Michael, Robert,
great to hear that the code is useful; I am looking forward to seeing some 
benchmarks.
I think the VarIntBlock approach is a good idea. Concerning the two unused 
"frame" codes, it will not cost much to add them. This might be useful for 
the frequency inverted lists. However, I am not sure they will be used that 
much. In our experiments, we had a version of AFOR allowing frames of 8, 
16 and 32 integers with allOnes and allZeros. The gain was minimal, on the 
order of 0.x% index size reduction, because these cases occurred very 
rarely. Still, this is better than nothing. However, in the case of 
Simple64, we are not talking about small frames (up to 32 integers), but 
frames of 120 to 240 integers. Therefore, I expect the probability of 
encountering 120 or 240 consecutive ones to drop. Maybe we can use the two 
codes for more clever configurations such as
- interleaved sequences of 1-bit and 2-bit integers
- interleaved sequences of 2-bit and 3-bit integers
or something like this.
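
To make the interleaved idea concrete, here is a minimal sketch of packing 
values that alternate between two bit widths into the data part of one 64-bit 
word. It assumes a layout with a 4-bit selector and 60 data bits; the class 
and method names are hypothetical and are not taken from the attached patches.

```java
// Hypothetical sketch: alternate two bit widths inside one 64-bit word.
// Assumption (not from the patch): 4 selector bits + 60 data bits per word.
final class InterleavedPackSketch {
    static final int DATA_BITS = 60;

    /** Packs values with widths widthA, widthB, widthA, ... into the data bits. */
    static long pack(int[] values, int widthA, int widthB) {
        long word = 0L;
        int shift = 0;
        for (int i = 0; i < values.length; i++) {
            int width = (i % 2 == 0) ? widthA : widthB;
            if (shift + width > DATA_BITS) {
                throw new IllegalArgumentException("too many values for one word");
            }
            if ((values[i] >>> width) != 0) {
                throw new IllegalArgumentException("value does not fit in " + width + " bits");
            }
            word |= ((long) values[i]) << shift;
            shift += width;
        }
        return word;
    }

    /** Reverses pack() for the first count values. */
    static int[] unpack(long word, int count, int widthA, int widthB) {
        int[] out = new int[count];
        int shift = 0;
        for (int i = 0; i < count; i++) {
            int width = (i % 2 == 0) ? widthA : widthB;
            out[i] = (int) ((word >>> shift) & ((1L << width) - 1));
            shift += width;
        }
        return out;
    }

    /** Number of values that fit in the 60 data bits for a given width pair. */
    static int capacity(int widthA, int widthB) {
        int pairs = DATA_BITS / (widthA + widthB);
        int rest = DATA_BITS % (widthA + widthB);
        return 2 * pairs + (rest >= widthA ? 1 : 0);
    }
}
```

Under these assumptions a 1-bit/2-bit configuration would hold 40 integers per 
word and a 2-bit/3-bit configuration 24.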
The best approach would be to run some tests to see which new configurations 
make sense, e.g. how many times an allOnes config or other configs are 
selected, and then choose which ones to add. But this can be a tedious task 
with only a limited benefit.
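
A rough version of such a test could simply count how often a run of 
consecutive ones is long enough to fill a whole frame. The sketch below is 
only an approximation of what a real encoder would select (it scans greedily 
and ignores the other configurations); the class and method names are 
hypothetical.

```java
// Hypothetical sketch: estimate how often an allOnes frame could be used.
final class AllOnesFrameStats {
    /**
     * Counts non-overlapping runs of exactly frameSize consecutive values
     * equal to 1, scanning left to right and restarting after each full run.
     */
    static int countAllOnesFrames(int[] values, int frameSize) {
        int count = 0;
        int run = 0;
        for (int v : values) {
            run = (v == 1) ? run + 1 : 0;
            if (run == frameSize) {
                count++;
                run = 0; // start looking for the next frame
            }
        }
        return count;
    }
}
```

Running this over real frequency lists with frame sizes 8, 16, 32, 120 and 240 
would give a first idea of whether the long allOnes frames are ever selected.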

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, 
> LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on 
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be 
> done, as this implementation works on the old lucene-1458 branch. 
> I will attach a tarball containing a running version (with tests) of the AFOR 
> implementation, as well as the implementations of PFOR and of Simple64 
> (a Simple-family codec working on 64-bit words) that were used in the 
> experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
