Hi, we have been observing a problem when tokenizing with Lucene's StandardAnalyzer: the tokens we get differ from machine to machine. I suspect it has something to do with the Locale (or default character-encoding) settings on the individual machines.
For example, the word 'CÃ©sar' comes out as a single token on machine 1, while on machine 2 it is split into [cã, sar]. Could someone please tell me what might be going on? Thanks, PM
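A possible explanation (just a guess on my part): the text is written in UTF-8 but read back using the platform's default charset, which differs between the two machines. A minimal JDK-only sketch of that mismatch, independent of Lucene itself; the word 'César' and the ISO-8859-1 default are assumptions for illustration:

```java
import java.nio.charset.StandardCharsets;

public class EncodingRepro {
    public static void main(String[] args) {
        // Hypothetical input word containing a non-ASCII letter.
        String original = "César";

        // Bytes written as UTF-8 ('é' becomes the two bytes 0xC3 0xA9)...
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // ...but decoded with ISO-8859-1 (a common platform default),
        // so each byte is read as its own character: 0xC3 -> 'Ã', 0xA9 -> '©'.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println(garbled); // prints "CÃ©sar"
    }
}
```

If something like this is happening, the analyzer on the mis-decoding machine never sees 'César' at all; it sees 'CÃ©sar', and since '©' is not a letter, a tokenizer would naturally break the word at that point, producing tokens like [cã, sar]. Passing an explicit charset (e.g. `new InputStreamReader(in, StandardCharsets.UTF_8)`) instead of relying on the platform default would rule this out.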