I added the diagnostic code you suggested, and the result is as follows:
Text: AaaBCcDdEFGgHhIiJKkLMmN
Token   PosInc  StartOffset  EndOffset
[Aaa]   1       0            3
[B]     1       3            4
[Cc]    1       4            6
[Dd]    1       6            8
[E]     1       8            9
[F]     1       9            10
[Gg]    1       10           12
[Hh]    1       12           14
[Ii]    1       14           16
[J]     1       16           17
[Kk]    1       17           19
[L]     1       19           20
[Mm]    1       20           22
[N]     1       22           23
Output:
<B>AaaBCcDdEFGgHhIiJKkLMmN</B>
It seems to me that JapaneseAnalyzer produces correct tokens: the
offsets are contiguous and no two tokens overlap.
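The offsets listed above can be checked against the overlap rule with a small standalone sketch. It does not use Lucene: the Tok class and the groupOverlapping and contiguous helpers are hypothetical names made up for this illustration, and the rule "a token joins the current group when its start offset is less than the group's end offset" is an assumption about what "overlapping" means, not the Highlighter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetGroupingSketch {

    /** Hypothetical stand-in for a Lucene Token: term text plus character offsets. */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Groups consecutive tokens by offset. Assumed rule: a token whose start
     * offset is less than the current group's end offset overlaps that group
     * and joins it; otherwise it starts a new group.
     */
    static List<List<Tok>> groupOverlapping(List<Tok> tokens) {
        List<List<Tok>> groups = new ArrayList<>();
        List<Tok> current = null;
        int groupEnd = -1;
        for (Tok t : tokens) {
            if (current == null || t.start >= groupEnd) {
                current = new ArrayList<>();
                groups.add(current);
                groupEnd = t.end;
            } else {
                groupEnd = Math.max(groupEnd, t.end);
            }
            current.add(t);
        }
        return groups;
    }

    /** Lays the given terms end to end, which reproduces the offsets in the table. */
    static List<Tok> contiguous(String... terms) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        for (String term : terms) {
            out.add(new Tok(term, pos, pos + term.length()));
            pos += term.length();
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = contiguous("Aaa", "B", "Cc", "Dd", "E", "F", "Gg",
                                      "Hh", "Ii", "J", "Kk", "L", "Mm", "N");
        System.out.println("tokens: " + tokens.size()
                + ", groups: " + groupOverlapping(tokens).size());
    }
}
```

Under that assumed rule the 14 tokens fall into 14 separate groups, because each token starts exactly where the previous one ends. A bigram-style stream such as 0-2, 1-3 followed by 3-4 would instead collapse the first two tokens into one group.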
Any thoughts?
Koji
> -----Original Message-----
> From: markharw00d [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 06, 2005 3:37 PM
> To: [email protected]
> Subject: Re: Highlighter apply to Japanese
>
>
> I don't know the behaviour of the Japanese Analyzer you are using.
> Can you add to your example diagnosis the Token.getPositionIncrement,
> Token.startOffset and Token.endOffset for each of the tokens?
>
> The highlighter groups tokens with overlapping start and end offsets
> into a single TokenGroup for the purposes of highlighting. This allows
> TokenStreams which produce multiple synonyms for the same source token
> to work. This behaviour was also required to get the CJKAnalyzer to
> work. It could be that the Analyzer you are using is producing a stream
> of tokens which *all* overlap?
>
> Cheers
> Mark
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]