Is there any (preliminary) code checked in somewhere that I can look at, that would help me understand the practical issues that would need to be addressed?
If I understand you correctly, it's a little different from what's happening in your blog posts:

http://blog.mikemccandless.com/2012/07/building-new-lucene-postings-format.html
http://blog.mikemccandless.com/2012/08/lucenes-new-blockpostingsformat-thanks.html

Those posts deal with making your own codec, but not about changing what's stored in the postings? I guess I misunderstood "postings format" before.

stephen

> Flexible indexing is the ability to make your own codec, which
> controls the reading and writing of all index parts (postings, stored
> fields, term vectors, deleted docs, etc.).
>
> So for example if you want to store some postings as a bit set instead
> of the block format that's the default coming up in 4.1, that's easy
> to do.
>
> But what is less easy (as I described below) is changing what is
> actually stored in the postings, eg adding a new per-position
> attribute.
>
> The original goal was to allow arbitrary attributes beyond the known
> docs/freqs/positions/offsets that Lucene supports today, so that you
> could easily make new application-dependent per-term, per-doc,
> per-position things, pull them from the analyzer, save them to the
> index, and access them from an IndexReader / query, but while some
> APIs do expose this, it's not very well explored yet (eg, you'd have
> to make a custom indexing chain to get the attributes "through"
> IndexWriter down to your codec). It would be great to make progress
> making this easier, so ideas are very welcome :)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Nov 27, 2012 at 3:37 PM, Wu, Stephen T., Ph.D.
> <wu.step...@mayo.edu> wrote:
>> Following up on a previous question...
>> What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to
>> easily make new postings formats/codecs -- but a response below says that
>> would be "tricky"?
>>
>> stephen
>>
>>
>> On 11/27/12 11:48 AM, "David Causse" <dcau...@spotter.com> wrote:
>>
>>> Hi,
>>>
>>> We use payloads but we can't use the whole lucene API.
>>> For example, we use it to do relation queries such as:
>>>
>>> @quote(@speaker(obama) @discourse(health))
>>>
>>> Search for all documents that contain a quote by Obama talking about
>>> health.
>>> We encode linguistic information (standoff annotations) inside payloads
>>> and use a custom search API to query the index.
>>> I didn't find a convenient way to attach my code to the lucene
>>> Query/Scorer/Weight API. Like SpanQuery, you have to rewrite the whole
>>> Query stack.
>>> In short, if you want to go with Payloads that do more than boosting a
>>> term, chances are that you'll need to rewrite a big part of the query
>>> stack.
>>>
>>>
>>> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>>>> I think we're looking at doing something related. I haven't explored the
>>>> Enums or know how to make a postings codec... But what is "flexible
>>>> indexing" in Lucene 4.0 if it's not the ability to make new postings
>>>> codecs?
>>>>
>>>> We're trying to incorporate attributes onto terms/spans in indexes. We'd
>>>> also like to try out some interesting ways to score things that go beyond
>>>> just tokens.
>>>>
>>>> We were considering using Attributes instead of Payloads, because it seems
>>>> like using Payloads ties you to a particular kind of scoring -- just a
>>>> weight on a token. Can Payloads be used for more general scoring
>>>> functions?
>>>> E.g., considering a span of text alongside multiple Payloads?
>>>>
>>>> Does it make sense to move outside of Payloads here?
>>>>
>>>> Thanks!
>>>>
>>>> stephen
>>>>
>>>>
>>>> On 11/19/12 8:14 AM, "Michael McCandless" <luc...@mikemccandless.com>
>>>> wrote:
>>>>
>>>>> A new postings format would be tricky because you have new attributes
>>>>> you want to index.
>>>>>
>>>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>>>> not well explored, and there are known problems (they can't be easily
>>>>> merged in the composite reader case).
>>>>>
>>>>> So that's why I suggested packing your information into a payload ...
>>>>>
>>>>> Mike McCandless
>>>>>
>>>>> http://blog.mikemccandless.com
>>>>>
>>>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu....@qq.com> wrote:
>>>>>> thx, mike.
>>>>>> About the 3rd question: is "encode them all into the payload" better
>>>>>> than "a new postings format with the codec"?
>>>>>> I mean replacing the original posting item (position, startOffset,
>>>>>> endOffset, payload) with my own inverted item such as
>>>>>> class TestPostingItem
>>>>>> {
>>>>>>     int termId;
>>>>>>     long startOffset;
>>>>>>     long endOffset;
>>>>>>     float score;
>>>>>>     int segId;
>>>>>>     long timeStamp;
>>>>>> }
>>>>>> ?
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
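
For reference, the "pack your information into a payload" suggestion from the thread could look roughly like the sketch below. It is only an illustration, not code from Lucene or from anyone in this thread: the class name PostingPayload and its layout are hypothetical, and it carries only score, segId and timeStamp from the TestPostingItem example, on the assumption that the term, position and offsets are already kept by the regular postings. The sketch uses plain java.nio so it runs without Lucene. In Lucene 4.0 the resulting bytes would presumably be wrapped in a BytesRef and set through PayloadAttribute in a TokenFilter, then read back from DocsAndPositionsEnum.getPayload(); check those calls against the javadocs for your version.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not part of Lucene): packs the extra per-position
 * fields from the TestPostingItem example into a fixed-size byte array
 * that could be stored as a payload.
 */
final class PostingPayload {
    // Layout: float score (4 bytes) + int segId (4) + long timeStamp (8), big-endian.
    static final int SIZE = Float.BYTES + Integer.BYTES + Long.BYTES;

    final float score;
    final int segId;
    final long timeStamp;

    PostingPayload(float score, int segId, long timeStamp) {
        this.score = score;
        this.segId = segId;
        this.timeStamp = timeStamp;
    }

    /** Serializes the three fields in the fixed order described above. */
    byte[] encode() {
        return ByteBuffer.allocate(SIZE)
                .putFloat(score)
                .putInt(segId)
                .putLong(timeStamp)
                .array();
    }

    /**
     * Reads the fields back. The (bytes, offset, length) triple mirrors how
     * a byte slice is usually handed around, so a slice of a larger buffer
     * can be decoded without copying.
     */
    static PostingPayload decode(byte[] bytes, int offset, int length) {
        if (length != SIZE) {
            throw new IllegalArgumentException(
                    "expected " + SIZE + " payload bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        return new PostingPayload(buf.getFloat(), buf.getInt(), buf.getLong());
    }

    public static void main(String[] args) {
        PostingPayload in = new PostingPayload(0.75f, 3, 1353888000000L);
        byte[] bytes = in.encode();
        PostingPayload out = PostingPayload.decode(bytes, 0, bytes.length);
        System.out.println(out.score + " " + out.segId + " " + out.timeStamp);
    }
}
```

A fixed-width layout keeps decoding trivial at the cost of 16 bytes per position; variable-length encoding would be smaller but needs more care. As David notes above, storing the data is the easy half: using it for anything beyond boosting still means custom query and scorer code.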