Hi Chris-
> To my knowledge, the character position of the tokens is not preserved by
> Lucene - only the ordinal position of tokens within a document / field is
> preserved.  Thus you need to store this character offset information
> separately, say, as Payload data.

Thanks for the information. So adding the OffsetAttribute at index time doesn't 
embed the offset information in the index - it just makes it available to the 
TokenFilter? I'll try adding the offset from the attribute to the payload.
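
To make that concrete, here is a rough sketch of how I'd pack the two offsets
into the payload bytes. The class OffsetPayloadCodec and its method names are
my own invention, not part of Lucene; the idea is that the TokenFilter would
read startOffset() and endOffset() from the OffsetAttribute and store the
resulting 8 bytes as the token's payload. I'm assuming a fixed big-endian
layout of two ints is good enough here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not a Lucene class): packs a token's character
// offsets into an 8-byte array that could be stored as payload data.
final class OffsetPayloadCodec {
    private OffsetPayloadCodec() {}

    // Layout: bytes 0-3 = start offset, bytes 4-7 = end offset (big-endian).
    static byte[] encode(int startOffset, int endOffset) {
        return ByteBuffer.allocate(8)
                .putInt(startOffset)
                .putInt(endOffset)
                .array();
    }

    // Reads the start offset back out of a payload produced by encode().
    static int startOffset(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(0);
    }

    // Reads the end offset back out of a payload produced by encode().
    static int endOffset(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(4);
    }
}
```

On the search side, whatever hands back the payload bytes could then pass them
to startOffset() / endOffset() to recover the character positions.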

As for getting access to the payloads, is the best way to reconstruct the 
token stream (as the Highlighter does)? Or is there an easier way to just 
read the payloads directly?

Thanks,
-Chris

