Hi,

We have been observing the following problem when tokenizing with
Lucene's StandardAnalyzer: the tokens we get differ across machines.
I suspect it has something to do with the locale settings on the
individual machines.

For example, the word 'César' is kept as a single token on machine 1,

while on machine 2 it is split into [cã, sar].
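For reference, here is a minimal sketch of what we think may be happening
(a guess, not our production code): it decodes the same UTF-8 bytes for
'César' once as UTF-8 and once with the platform default charset, then runs
both strings through StandardAnalyzer. The class name and the field name
"f" are made up for the example, and the CharTermAttribute API shown is the
style used by recent Lucene versions.

import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CharsetTokenDemo {
    public static void main(String[] args) throws Exception {
        // The raw bytes are UTF-8; only the charset used to decode them varies.
        byte[] utf8Bytes = "César".getBytes(StandardCharsets.UTF_8);

        printTokens("UTF-8  ", new String(utf8Bytes, StandardCharsets.UTF_8));
        printTokens("default", new String(utf8Bytes, Charset.defaultCharset()));
    }

    static void printTokens(String label, String text) throws Exception {
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             TokenStream ts = analyzer.tokenStream("f", new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            System.out.print(label + ": ");
            while (ts.incrementToken()) {
                System.out.print("[" + term + "] ");
            }
            ts.end();
            System.out.println();
        }
    }
}

On a machine whose default charset is UTF-8, both lines should print
[césar]; on a machine defaulting to ISO-8859-1, 'César' decodes as
'CÃ©sar', and since '©' is not a word character the second line should
print [cã] [sar], which matches what we see.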

Could someone please tell me what might be going on?

Thanks,
PM
