Following up on a previous question...
What is "flexible indexing" in Lucene 4.0?  We assumed it was the ability to
easily make new postings formats/codecs -- but a response below says that
would be "tricky"?

stephen


On 11/27/12 11:48 AM, "David Causse" <dcau...@spotter.com> wrote:

> Hi,
> 
> We use payloads, but we can't use the whole Lucene API.
> For example, we use them to run relation queries such as:
> 
> @quote(@speaker(obama) @discourse(health))
> 
> This searches for all documents that contain a quote by Obama talking about
> health.
> We encode linguistic information (standoff annotations) inside payloads
> and use a custom search API to query the index.
> I didn't find a convenient way to attach my code to Lucene's
> Query/Scorer/Weight API. As with SpanQuery, you have to rewrite the whole
> Query stack.
> In short, if you want to use payloads for more than boosting a
> term, chances are you'll need to rewrite a big part of the query
> stack.
> 
> 
> On 27/11/2012 16:59, Wu, Stephen T., Ph.D. wrote:
>> I think we're looking at doing something related.  I haven't explored the
>> Enums and don't know how to make a postings codec... But what is "flexible
>> indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?
>> 
>> We're trying to incorporate attributes onto terms/spans in indexes.  We'd
>> also like to try out some interesting ways to score things that go beyond
>> just tokens.
>> 
>> We were considering using Attributes instead of Payloads, because it seems
>> like using Payloads ties you to a particular kind of scoring -- just a
>> weight on a token.  Can Payloads be used for more general scoring functions?
>> E.g., considering a span of text alongside multiple Payloads?
>> 
>> Does it make sense to move outside of Payloads here?
>> 
>> Thanks!
>> 
>> stephen
>> 
>> 
>> 
>> 
>> On 11/19/12 8:14 AM, "Michael McCandless" <luc...@mikemccandless.com> wrote:
>> 
>>> A new postings format would be tricky because you have new attributes
>>> you want to index.
>>> 
>>> The DocsAndPositionsEnum does have an attributes source, but this is
>>> not well explored, and there are known problems (they can't be easily
>>> merged in the composite reader case).
>>> 
>>> So that's why I suggested packing your information into a payload ...
>>> 
>>> Mike McCandless
>>> 
>>> http://blog.mikemccandless.com
>>> 
>>> On Sun, Nov 18, 2012 at 8:33 PM, wgggfiy <wuqiu....@qq.com> wrote:
>>>> thx, mike.
>>>> About the 3rd question: is "encode them all into the payload" better than
>>>> "a new postings format with the codec"?
>>>> I mean replacing the original posting item (position, startOffset, endOffset,
>>>> payload) with my own inverted item such as
>>>> class TestPostingItem
>>>> {
>>>>          int termId;
>>>>          long startOffset;
>>>>          long endOffset;
>>>>          float score;
>>>>          int segId;
>>>>          long timeStamp;
>>>> }
>>>> ?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context:
>>>> http://lucene.472066.n3.nabble.com/what-is-the-offsets-and-payload-in-DocsAndPositionsEnum-for-tp4020933p4020968.html
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>> 
> 
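
To make the "pack your information into a payload" suggestion quoted above concrete, here is a minimal sketch in plain Java. It only covers the byte-level encoding of the fields from the quoted TestPostingItem; the class name PostingItemCodec, the nested Item type, and the fixed big-endian layout are assumptions made for illustration and are not part of Lucene. Attaching the resulting bytes to a token as a payload at index time, and reading them back at search time, is left out.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical helper (not a Lucene class): packs the fields of the
 * TestPostingItem quoted in this thread into one fixed-width byte[]
 * that could be stored as a per-position payload.
 */
public final class PostingItemCodec {

    /** 2 ints + 3 longs + 1 float = 36 bytes. */
    public static final int ENCODED_LENGTH =
            2 * Integer.BYTES + 3 * Long.BYTES + Float.BYTES;

    private PostingItemCodec() {}

    /** Simple value holder for a decoded item. */
    public static final class Item {
        public final int termId;
        public final long startOffset;
        public final long endOffset;
        public final float score;
        public final int segId;
        public final long timeStamp;

        Item(int termId, long startOffset, long endOffset,
             float score, int segId, long timeStamp) {
            this.termId = termId;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.score = score;
            this.segId = segId;
            this.timeStamp = timeStamp;
        }
    }

    /** Encodes the fields in a fixed order; ByteBuffer is big-endian by default. */
    public static byte[] encode(int termId, long startOffset, long endOffset,
                                float score, int segId, long timeStamp) {
        ByteBuffer buf = ByteBuffer.allocate(ENCODED_LENGTH);
        buf.putInt(termId)
           .putLong(startOffset)
           .putLong(endOffset)
           .putFloat(score)
           .putInt(segId)
           .putLong(timeStamp);
        return buf.array();
    }

    /**
     * Decodes an item from a slice of a larger array. The (bytes, offset,
     * length) triple is used because payload bytes are typically handed
     * back as a slice rather than as a dedicated array.
     */
    public static Item decode(byte[] bytes, int offset, int length) {
        if (length < ENCODED_LENGTH) {
            throw new IllegalArgumentException(
                    "expected at least " + ENCODED_LENGTH + " bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes, offset, length);
        // Reads must follow the same field order as encode().
        return new Item(buf.getInt(), buf.getLong(), buf.getLong(),
                        buf.getFloat(), buf.getInt(), buf.getLong());
    }
}
```

A fixed-width layout keeps decoding trivial, at the cost of 36 bytes per position; a variable-length encoding would be smaller but needs more care. Any scoring that goes beyond a per-term boost would still need custom query code on top of this, as David describes.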


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
