[ 
https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990535#comment-12990535
 ] 

Renaud Delbru commented on LUCENE-2886:
---------------------------------------

{quote}
In the case of 240 1's, I was surprised to see this selector was used over 2% 
of the time for the gov collection's doc file?
{quote}
Our experiments were run on the Wikipedia dataset and the blogs dataset. I 
don't know what our selection rate was; I was only referring to the gain in 
overall compression rate.

{quote}
But still, for the all 1's case I'm not actually thinking about unstructured 
text so much...
in this case I am thinking about metadata fields and more structured data?
{quote}

Yes, this makes sense. In the context of SIREn (a kind of simple XML-node-based 
inverted index), which is meant for indexing semi-structured data, the 
difference was more noticeable (mainly on the frequency and position files, as 
well as on the other node-structure files).
This might also be useful on the document ID file for very common terms 
(perhaps for certain types of facets, where a small number of values cover a 
large portion of the document collection).
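
To make the document ID case concrete, here is a rough sketch (not code from the attached patches) of why a very common term produces long runs of 1's. It assumes doc IDs are stored as gaps between consecutive IDs, that the first gap is taken against a virtual previous ID of -1, and that the codec has a selector covering exactly 240 consecutive 1's. The class and method names are made up for illustration.

```java
// Hypothetical illustration only: names and the gap convention are assumptions,
// not the API of the AFOR or Simple64 patches.
final class AllOnesRunSketch {
    // Run length of the "240 1's" selector discussed in this thread.
    static final int RUN_LENGTH = 240;

    /** Turns strictly increasing doc IDs into gaps; the first gap is docIds[0] - (-1). */
    public static int[] gaps(int[] docIds) {
        int[] out = new int[docIds.length];
        int prev = -1; // assumed starting point, so that doc 0 yields a gap of 1
        for (int i = 0; i < docIds.length; i++) {
            out[i] = docIds[i] - prev;
            prev = docIds[i];
        }
        return out;
    }

    /** True if values[offset .. offset + RUN_LENGTH) exists and every entry is 1. */
    public static boolean isAllOnesRun(int[] values, int offset) {
        if (offset < 0 || values.length - offset < RUN_LENGTH) {
            return false;
        }
        for (int i = offset; i < offset + RUN_LENGTH; i++) {
            if (values[i] != 1) {
                return false;
            }
        }
        return true;
    }

    /**
     * Greedy left-to-right count of non-overlapping all-ones runs. A real encoder
     * would pack the other values with a different selector; here they are skipped.
     */
    public static int countAllOnesRuns(int[] values) {
        int runs = 0;
        int i = 0;
        while (i < values.length) {
            if (isAllOnesRun(values, i)) {
                runs++;
                i += RUN_LENGTH;
            } else {
                i++;
            }
        }
        return runs;
    }
}
```

With this convention, a term that occurs in every one of 480 consecutive documents gives 480 gaps of 1, so the scan finds two full runs. A term that occurs in every second document gives gaps of 2 and finds none.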

> Adaptive Frame Of Reference 
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch, 
> LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on 
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be 
> done, as this implementation is working on the old lucene-1458 branch. 
> I will attach a tarball containing a running version (with tests) of the AFOR 
> implementation, as well as the implementations of PFOR and of Simple64 
> (a Simple-family codec working on 64-bit words) that were used in the 
> experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
