Token offset values for custom Tokenizer

Shahan Khatchadourian Fri, 13 Jul 2007 09:43:31 -0700

Hi,

I am storing custom offset values in the Tokens produced by a Tokenizer, but when I retrieve them from the index the values don't match. I've looked in the LIA book, but it's not current, since it says term vectors aren't stored. I'm using Lucene Nightly 146, but the same thing has happened with older versions. Looking at the internals, DocumentWriter seems to keep track of the last end offset that was placed into the index and modifies the token values (with +1), but I'm not sure whether I should be concerned about it.

No existing analyzers are used when adding the document, so all the offsets are generated manually.

Any suggestions on how the token offsets should be stored?


Is this valid?
Token, start, end
aaa, 0, 3
bbb, 4, 7
ccc, 8, 11
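
To make the question concrete, here is a minimal sketch of how I would expect offsets like these to be derived, assuming the start offset is inclusive and the end offset is one past the token's last character (so end - start equals the token length). It deliberately uses no Lucene classes: OffsetSketch, Span and whitespaceSpans are hypothetical names made up for illustration, and it says nothing about what DocumentWriter does to the values afterwards.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {

    /** Hypothetical holder for a token and its offsets; this is NOT Lucene's Token class. */
    public static final class Span {
        public final String text;
        public final int start; // inclusive (assumption)
        public final int end;   // exclusive, i.e. start + text.length() (assumption)

        Span(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits the input on whitespace and records character offsets for each token. */
    public static List<Span> whitespaceSpans(String input) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        int n = input.length();
        while (i < n) {
            // Skip any whitespace before the next token.
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Advance to the first whitespace character after the token.
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            spans.add(new Span(input.substring(start, i), start, i));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints one "token, start, end" line per token.
        for (Span s : whitespaceSpans("aaa bbb ccc")) {
            System.out.println(s.text + ", " + s.start + ", " + s.end);
        }
    }
}
```

For the input "aaa bbb ccc" this prints the same three rows as the table above.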

Thanks,
Shahan
