Highlighter apply to Japanese

Koji Sekiguchi Mon, 05 Sep 2005 19:27:00 -0700

Hi again,

I'm using the Highlighter to highlight terms in Japanese text,
but I cannot get the output I expect.


If I use StandardAnalyzer or SnowballAnalyzer with English text,
getBestFragment() returns the expected output:

Sample: (SnowballAnalyzer)
Text: A meeting will be held in the City Hall
TokenStream:
[a][meet][will][be][held][in][the][citi][hall]
Query Text: meet
Output: A <B>meeting</B> will be held in the City Hall

But if I use JapaneseAnalyzer, which is the most popular Analyzer
in Japan, to get a TokenStream from Japanese text and then highlight
that text with the Highlighter, the whole text is highlighted:

Sample: (JapaneseAnalyzer)
Text: AMeetingWillBeHeldInTheCityHall
TokenStream:
[A][Meeting][Will][Be][Held][In][The][City][Hall]
Query Text: Meeting
Output: <B>AMeetingWillBeHeldInTheCityHall</B>

Please note that I used the Latin alphabet for the Text in the second
sample so that most readers of this mailing list can read it; in reality,
the Text consisted of Japanese characters. As the sample shows,
JapaneseAnalyzer, which uses a Japanese dictionary in the background to
extract tokens from the text stream, does recognize the tokens and
produce the TokenStream. Even so, highlighter.getBestFragment()
highlighted the whole text.
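
To make the expected and the actual output concrete, here is a minimal
sketch. It assumes that a highlighter locates each matching token in the
original text by the token's start and end character offsets. The class
OffsetHighlightSketch and its methods are hypothetical names for
illustration only; this is not Lucene's Highlighter code. The second case
shows one way the whole text could end up wrapped, namely if every token
reported offsets spanning the full text. I have not confirmed that this
is what happens with JapaneseAnalyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the Lucene Highlighter implementation.
class OffsetHighlightSketch {

    // A token plus the character range it claims to cover in the original text.
    static final class Tok {
        final String term;
        final int start; // inclusive
        final int end;   // exclusive

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Wraps each token whose term equals queryTerm in <B>...</B>,
    // using only the token offsets to find it in the text.
    static String highlight(String text, List<Tok> tokens, String queryTerm) {
        StringBuilder out = new StringBuilder();
        int pos = 0; // next character of text not yet copied to the output
        for (Tok t : tokens) {
            if (!t.term.equals(queryTerm) || t.start < pos) {
                continue; // not a match, or overlaps text already emitted
            }
            out.append(text, pos, t.start);
            out.append("<B>").append(text, t.start, t.end).append("</B>");
            pos = t.end;
        }
        out.append(text.substring(pos));
        return out.toString();
    }

    // Tokens laid end to end, each with offsets covering exactly its own characters.
    static List<Tok> contiguousTokens(String[] terms) {
        List<Tok> tokens = new ArrayList<>();
        int offset = 0;
        for (String term : terms) {
            tokens.add(new Tok(term, offset, offset + term.length()));
            offset += term.length();
        }
        return tokens;
    }

    // Assumed failure mode: every token reports offsets covering the whole text.
    static List<Tok> fullSpanTokens(String[] terms, int textLength) {
        List<Tok> tokens = new ArrayList<>();
        for (String term : terms) {
            tokens.add(new Tok(term, 0, textLength));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "AMeetingWillBeHeldInTheCityHall";
        String[] terms = {"A", "Meeting", "Will", "Be", "Held", "In", "The", "City", "Hall"};

        // Expected: A<B>Meeting</B>WillBeHeldInTheCityHall
        System.out.println(highlight(text, contiguousTokens(terms), "Meeting"));

        // What I actually see: <B>AMeetingWillBeHeldInTheCityHall</B>
        System.out.println(highlight(text, fullSpanTokens(terms, text.length()), "Meeting"));
    }
}
```

The first call prints the output I would like to get; the second prints
the same shape of output that I get from getBestFragment() today.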

Do I need to implement a Fragmenter to highlight tokens correctly
in Japanese text?

Thanks in advance,

Koji



