Re: What is "flexible indexing" in Lucene 4.0 if it's not the ability to make new postings codecs?

Carsten Schnober Thu, 13 Dec 2012 05:33:26 -0800

On 13.12.2012 12:27, Michael McCandless wrote:

>> For example:
>>  - part of speech of a token.
>>  - syntactic parse subtree (over a span).
>>  - semantically normalized phrase (to canonical text or ontological code).
>>  - semantic group (of a span).
>>  - coreference link.
> 
> So for example part-of-speech is a per-Token-position attribute.
> 
> Today the easiest way to handle this is to encode these attributes
> into a Payload, which is straightforward (make a custom TokenFilter
> that creates the payload).
> 
> At search time you would then use e.g. PayloadTermQuery to decode the
> Payload and do something with it to alter how the query is being
> scored.
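
The payload idea quoted above can be sketched roughly as follows. This is
only an illustration of the byte encoding, written in plain Java without
any Lucene dependency: the class name PosPayloadSketch, the tag list, and
the one-byte layout are assumptions made for this example, not part of
Lucene. In a real custom TokenFilter, the resulting bytes would presumably
be wrapped in a BytesRef and set on the token's PayloadAttribute, and
decoded again at search time.

```java
import java.util.Arrays;
import java.util.List;

public class PosPayloadSketch {
    // Hypothetical tag set; a tag's index in this list is its one-byte code.
    static final List<String> TAGS =
        Arrays.asList("NOUN", "VERB", "ADJ", "ADV", "OTHER");

    // Encode a part-of-speech tag as a single payload byte.
    // Unknown tags fall back to "OTHER".
    static byte[] encode(String tag) {
        int code = TAGS.indexOf(tag);
        if (code < 0) {
            code = TAGS.indexOf("OTHER");
        }
        return new byte[] { (byte) code };
    }

    // Decode a payload byte back into its tag.
    // Missing or malformed payloads decode to "OTHER".
    static String decode(byte[] payload) {
        if (payload == null || payload.length != 1) {
            return "OTHER";
        }
        int code = payload[0] & 0xFF;
        return code < TAGS.size() ? TAGS.get(code) : "OTHER";
    }

    public static void main(String[] args) {
        // Hypothetical pre-tagged tokens: {term, part-of-speech tag}.
        String[][] tagged = { { "dogs", "NOUN" }, { "bark", "VERB" } };
        for (String[] t : tagged) {
            byte[] payload = encode(t[1]);
            System.out.println(t[0] + " -> " + decode(payload));
        }
    }
}
```

A single byte per position keeps the postings small. A larger tag set or
additional per-token attributes would need a wider layout.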


This is a relatively easy example, but how would one deal with e.g.
annotations that include multiple tokens (as in spans), such as chunks,
or relations between tokens (and token spans), as in the coreference
links example given by Steven above?
Best,
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
