Out of curiosity, when you try your other test string "aaa _bbb ccc" what do the token byte offsets show?

Matt

Mark Harwood wrote:

On 1 Jul 2009, at 17:39, k.sayama wrote:

I could verify Token byte offsets

The sytsem outputs
aaa:0:3
bbb:0:3
ccc:4:7


That explains the highlighter behaviour. Clearly BBB is not at position 0-3 in the String you supplied

String CONTENTS = "AAA :BBB CCC";

Looks like the Tokenizer needs fixing. Is this yours or a standard Lucene class? If the latter, raising a JIRA bug with a Junit test would be the best way to get things moving.


Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



--
Matthew Hall
Software Engineer
Mouse Genome Informatics
[email protected]
(207) 288-6012



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to