Hi Glen,

I don't believe you can attach a single payload to multiple tokens. What I did for a similar requirement was to combine the tokens of the span into a single "_"-delimited token and attach the payload to that. For example:
The Big Bad Wolf huffed and puffed and blew the house of the Three Little Pigs down.

Now assume "Big Bad Wolf" and "Three Little Pigs" are spans to which I would like to attach payloads. I run the tokens through a custom tokenizer that produces:

The Big_Bad_Wolf$payload1 huffed and puffed and blew the house of the Three_Little_Pigs$payload2 down.

In my case this makes sense, i.e. I can treat the span as a single unit. Not sure about your use case.

HTH
Sujit

On Dec 13, 2012, at 2:08 PM, Glen Newton wrote:

> Cool! Sounds great! :-)
>
> Any pointers to a (Lucene) example that attaches a payload to a
> start..end span that is more than one token?
>
> thanks,
> -Glen
>
> On Thu, Dec 13, 2012 at 5:03 PM, Lance Norskog <goks...@gmail.com> wrote:
>> I should not have added that note. The Opennlp patch gives a concrete
>> example of adding an annotation to text.
>>
>> On 12/13/2012 01:54 PM, Glen Newton wrote:
>>>
>>> It is not clear this is exactly what is needed/being discussed.
>>>
>>> From the issue:
>>> "We are also planning a Tokenizer/TokenFilter that can put parts of
>>> speech as either payloads (PartOfSpeechAttribute?) on a token or at
>>> the same position."
>>>
>>> This adds it to a token, not a span. 'same position' does not suggest
>>> it also records the end position.
>>>
>>> -Glen
>>>
>>> On Thu, Dec 13, 2012 at 4:45 PM, Lance Norskog <goks...@gmail.com> wrote:
>>>>
>>>> Parts-of-speech is available now, in the indexer.
>>>>
>>>> LUCENE-2899 adds OpenNLP to the Lucene&Solr codebase. It does
>>>> parts-of-speech, chunking and Named Entity Recognition. OpenNLP is an
>>>> Apache project for natural-language processing.
>>>>
>>>> Some parts are in Solr that could be in Lucene.
>>>>
>>>> https://issues.apache.org/jira/browse/lucene-2899
>>>>
>>>> On 12/12/2012 12:02 PM, Wu, Stephen T., Ph.D. wrote:
>>>>>>>
>>>>>>> Is there any (preliminary) code checked in somewhere that I can look
>>>>>>> at, that would help me understand the practical issues that would
>>>>>>> need to be addressed?
>>>>>>
>>>>>> Maybe we can make this more concrete: what new attribute are you
>>>>>> needing to record in the postings and access at search time?
>>>>>
>>>>> For example:
>>>>> - part of speech of a token.
>>>>> - syntactic parse subtree (over a span).
>>>>> - semantically normalized phrase (to canonical text or ontological
>>>>>   code).
>>>>> - semantic group (of a span).
>>>>> - coreference link.
>>>>>
>>>>> stephen
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
> --
> -
> http://zzzoot.blogspot.com/
> -
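
For reference, the span-merging step described at the top of this message could be sketched roughly as below. This is plain Java with no Lucene dependency, not the actual custom tokenizer: the class name SpanPayloadMerger, the Span helper type, and the assumption that spans arrive sorted and non-overlapping are all made up for illustration. It only shows how the tokens of a span get joined with "_" and suffixed with "$" plus the payload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class SpanPayloadMerger {

    /** A token range [start, end) and the payload to attach to it (hypothetical type). */
    static final class Span {
        final int start;      // index of the first token in the span
        final int end;        // index one past the last token in the span
        final String payload; // payload text to append after '$'

        Span(int start, int end, String payload) {
            this.start = start;
            this.end = end;
            this.payload = payload;
        }
    }

    /**
     * Joins the tokens of each span with '_' and appends '$' + payload.
     * Assumes spans are sorted by start index and do not overlap.
     */
    static List<String> merge(List<String> tokens, List<Span> spans) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        for (Span span : spans) {
            // Copy the tokens that precede this span unchanged.
            while (pos < span.start) {
                out.add(tokens.get(pos++));
            }
            // Collapse the span into a single delimited token with its payload.
            String joined = String.join("_", tokens.subList(span.start, span.end));
            out.add(joined + "$" + span.payload);
            pos = span.end;
        }
        // Copy whatever follows the last span.
        while (pos < tokens.size()) {
            out.add(tokens.get(pos++));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The Big Bad Wolf huffed and puffed and blew the house of "
                + "the Three Little Pigs down.";
        List<String> tokens = Arrays.asList(text.split(" "));
        List<Span> spans = Arrays.asList(
                new Span(1, 4, "payload1"),    // "Big Bad Wolf"
                new Span(13, 16, "payload2")); // "Three Little Pigs"
        System.out.println(String.join(" ", merge(tokens, spans)));
    }
}
```

In a real analysis chain, a later filter would still have to split each merged token on "$" and store the right-hand side as the token's payload. As far as I recall, Lucene's DelimitedPayloadTokenFilter does this kind of delimiter splitting, but please check the javadocs for your version before relying on it.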