Hi, we have been observing a problem when tokenizing with Lucene's StandardAnalyzer: the tokens we get differ from machine to machine. I suspect it has something to do with the Locale (or default character-encoding) settings on the individual machines.
For example, the word 'CÃ©sar' comes out as a single token on machine 1, while on machine 2 it is split into [cã, sar]. Could someone please tell me what might be going on? Thanks, PM
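A possible explanation (just a guess on my part): the text is written in UTF-8 but read back using the platform's default charset, which differs between the two machines. A minimal JDK-only sketch of that mismatch, independent of Lucene itself; the word 'César' and the ISO-8859-1 default are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;

public class EncodingRepro {
    public static void main(String[] args) {
        // Hypothetical input word containing a non-ASCII letter.
        String original = "César";

        // Bytes written as UTF-8 ('é' becomes the two bytes 0xC3 0xA9)...
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // ...but decoded with ISO-8859-1 (a common platform default),
        // so each byte is read as its own character: 0xC3 -> 'Ã', 0xA9 -> '©'.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(garbled); // prints "CÃ©sar"
    }
}
```

If something like this is happening, the analyzer on the mis-decoding machine never sees 'César' at all; it sees 'CÃ©sar', and since '©' is not a letter, a tokenizer would naturally break the word at that point, producing tokens like [cã, sar]. Passing an explicit charset (e.g. `new InputStreamReader(in, StandardCharsets.UTF_8)`) instead of relying on the platform default would rule this out.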