Following up on a previous question... What is "flexible indexing" in Lucene 4.0? We assumed it was the ability to easily make new postings formats/codecs -- but a response below says that would be "tricky"?
stephen

On 11/27/12 11:48 AM, "David Causse" <dcau...@spotter.com> wrote:

> Hi,
>
> We use payloads, but we can't use the whole Lucene API.
> For example, we use them for relation queries such as:
>
> @quote(@speaker(obama) @discourse(health))
>
> This searches for all documents that contain a quote by Obama talking
> about health.
> We encode linguistic information (standoff annotations) inside payloads
> and use a custom search API to query the index.
> I didn't find a convenient way to attach my code to the Lucene
> Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
> Query stack.
> In short, if you want to use payloads for more than boosting a term,
> chances are you'll need to rewrite a big part of the query stack.
>
>
> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>> I think we're looking at doing something related. I haven't explored the
>> Enums and don't know how to make a postings codec... But what is "flexible
>> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>>
>> We're trying to incorporate attributes onto terms/spans in indexes. We'd
>> also like to try out some interesting ways to score things that go beyond
>> just tokens.
>>
>> We were considering using Attributes instead of Payloads, because it seems
>> like using Payloads ties you to a particular kind of scoring -- just a
>> weight on a token. Can Payloads be used for more general scoring functions?
>> E.g., considering a span of text alongside multiple Payloads?
>>
>> Does it make sense to move outside of Payloads here?
>>
>> Thanks!
>>
>> stephen
>>
>>
>> On 11/19/12 8:14 AM, "Michael McCandless" <luc...@mikemccandless.com> wrote:
>>
>>> A new postings format would be tricky because you have new attributes
>>> you want to index.
>>>
>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>> not well explored, and there are known problems (they can't be easily
>>> merged in the composite reader case).
>>>
>>> So that's why I suggested packing your information into a payload ...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu....@qq.com> wrote:
>>>> Thanks, Mike.
>>>> About the 3rd question: is "encode them all into the payload" better
>>>> than "a new postings format with the codec"?
>>>> I mean replacing the original posting item (position, startOffset,
>>>> endOffset, payload) with my own inverted item, such as:
>>>>
>>>> class TestPostingItem
>>>> {
>>>>     int termId;
>>>>     long startOffset;
>>>>     long endOffset;
>>>>     float score;
>>>>     int segId;
>>>>     long timeStamp;
>>>> }
>>>>
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
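
For what it's worth, the "pack your information into a payload" suggestion quoted above could be sketched roughly as below. This is only an illustration under assumptions, not anything from Lucene itself: the class name PostingItemPayload, the fixed 16-byte layout, and the choice to pack only score, segId and timeStamp (leaving term identity, positions and offsets to the regular postings) are all hypothetical. It uses nothing but java.nio, so it does not depend on any particular Lucene version.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of Lucene): packs the extra per-position
// fields from TestPostingItem into a byte[] that could be stored as a payload.
final class PostingItemPayload {
    // Assumed fixed layout: float score (4) + int segId (4) + long timeStamp (8).
    static final int SIZE = 4 + 4 + 8;

    final float score;
    final int segId;
    final long timeStamp;

    PostingItemPayload(float score, int segId, long timeStamp) {
        this.score = score;
        this.segId = segId;
        this.timeStamp = timeStamp;
    }

    // Serialize the three fields in a fixed order (big-endian, ByteBuffer's default).
    byte[] encode() {
        return ByteBuffer.allocate(SIZE)
                .putFloat(score)
                .putInt(segId)
                .putLong(timeStamp)
                .array();
    }

    // Decode from a slice of a larger array, since payload bytes are often
    // exposed as (bytes, offset, length) rather than as a standalone array.
    static PostingItemPayload decode(byte[] bytes, int offset, int length) {
        if (length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        return new PostingItemPayload(buf.getFloat(), buf.getInt(), buf.getLong());
    }

    public static void main(String[] args) {
        byte[] payload = new PostingItemPayload(0.75f, 3, 1353974400000L).encode();
        PostingItemPayload back = decode(payload, 0, payload.length);
        System.out.println(back.score + " " + back.segId + " " + back.timeStamp);
    }
}
```

In Lucene 4.0 the encoded bytes would presumably be wrapped in a BytesRef and set on the token's PayloadAttribute from a custom TokenFilter, then read back at search time from DocsAndPositionsEnum.getPayload(). As David notes above, doing anything with them beyond boosting a term still means writing your own query-side code.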