[ https://issues.apache.org/jira/browse/LUCENE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990543#comment-12990543 ]
Robert Muir commented on LUCENE-2886:
-------------------------------------

Thanks for those numbers, Renaud... yes, the cases you see in e.g. Geonames were one example of what I was thinking of: in general you might say someone should be using "omitTFAP" to omit freqs and positions for these fields, but they might not be able to do this if they want to support e.g. phrase queries like "washington hill". So if we can pack long streams of 1s for freqs and positions, I think this is probably useful for a lot of people.

Additionally, for the .doc, I see it's smaller in the AFOR-3 case too. Is your "Ent" basically a measure of doc deltas? I'm confused about exactly what it is :)

I ask because I would think that if you take e.g. Geonames, the place names in the dataset are not in random order but are actually "batched" by country, for example, so you would have long streams of docdelta=1 for country=Germany's postings. I'm not saying we could rely upon this, but I do think that in general lots of people's docs aren't in completely random order, and it's probably common to see long streams of docdelta=1 in structured data for this reason?

> Adaptive Frame Of Reference
> ----------------------------
>
>                 Key: LUCENE-2886
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2886
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Codecs
>            Reporter: Renaud Delbru
>             Fix For: 4.0
>
>         Attachments: LUCENE-2886_simple64.patch,
> LUCENE-2886_simple64_varint.patch, lucene-afor.tar.gz
>
>
> We could test the implementation of the Adaptive Frame Of Reference [1] on
> the lucene-4.0 branch.
> I am providing the source code of its implementation. Some work needs to be
> done, as this implementation works against the old lucene-1458 branch.
> I will attach a tarball containing a running version (with tests) of the AFOR
> implementation, as well as the implementations of PFOR and of Simple64
> (a Simple-family codec working on 64-bit words) that were used in the
> experiments in [1].
> [1] http://www.deri.ie/fileadmin/documents/deri-tr-afor.pdf
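
To make the docdelta=1 point in the comment above concrete, here is a minimal Java sketch. It is not taken from the attached patches or from Lucene itself: the class and method names (DocDeltaRuns, docDeltas, longestRunOfOnes) are hypothetical, the doc IDs are made up, and it assumes the first delta is taken relative to 0. It only shows how a posting list whose documents were indexed next to each other turns into a long run of 1s after delta encoding, while a scattered posting list does not.

```java
import java.util.Arrays;

public class DocDeltaRuns {

    // Delta-encode a strictly increasing list of doc IDs.
    // Assumption: the first delta is measured from 0.
    public static int[] docDeltas(int[] docIds) {
        int[] deltas = new int[docIds.length];
        int previous = 0;
        for (int i = 0; i < docIds.length; i++) {
            deltas[i] = docIds[i] - previous;
            previous = docIds[i];
        }
        return deltas;
    }

    // Length of the longest consecutive run of deltas equal to 1.
    public static int longestRunOfOnes(int[] deltas) {
        int best = 0;
        int current = 0;
        for (int delta : deltas) {
            current = (delta == 1) ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up postings: docs for one term indexed back to back ("batched").
        int[] batched = {100, 101, 102, 103, 104, 105};
        // Made-up postings: docs for one term spread across the index.
        int[] scattered = {7, 120, 260, 391, 544, 702};

        int[] batchedDeltas = docDeltas(batched);
        int[] scatteredDeltas = docDeltas(scattered);

        System.out.println(Arrays.toString(batchedDeltas));
        System.out.println("longest run of 1s: " + longestRunOfOnes(batchedDeltas));   // 5
        System.out.println(Arrays.toString(scatteredDeltas));
        System.out.println("longest run of 1s: " + longestRunOfOnes(scatteredDeltas)); // 0
    }
}
```

A run of identical small values like the batched case is the kind of input a frame-based or run-aware integer codec can store in very few bits per value; how much AFOR actually gains on real data is what the numbers in this issue are measuring.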