Hi,
I am storing custom values in the Tokens provided by a Tokenizer but when retrieving them from the index the values don't match. I've looked in the LIA book but it's not current since it mentioned term vectors aren't stored. I'm using Lucene Nightly 146 but the same thing has happened with older versions. Looking at the internals, DocumentWriter seems to keep track of the end offset that was placed into the index and modifies the token values (with +1) but I'm not sure whether I should be concerned with it. No existing analyzers are used when adding the document so all the offsets are generated manually.
Any suggestions of how the token offsets should be stored?

Is this valid?
Token, start, end
aaa, 0, 3
bbb, 4, 7
ccc, 8, 11

Thanks,
Shahan

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to