Hi Prashant, What is the Unicode code point associated with the 3,4,5 character?
Steve On 04/22/2008 at 4:45 PM, Prashant Malik wrote: > Yes the version of lucene and java are exactly the same on > the different > machines. > Infact we unjared lucene and jared it with our jar and are > running from the > same nfs mounts on both the machines > > Also we have tried with lucene2.2.0 and 2.3.1. with the same result . > > also about the actual string u have it right till 2 . > > 3,4,5 are a single character > > Thx > PM > > On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe > <[EMAIL PROTECTED]> wrote: > > > Hi Prashant, > > > > On 04/22/2008 at 2:23 PM, Prashant Malik wrote: > > > We have been observing the following problem while > > > tokenizing using lucene's StandardAnalyzer. Tokens that we get is > > > different on different machines. I am suspecting it has something to > > > do with the Locale settings on individual machines? > > > > > > For example > > > the word 'CÃ(c)sar' is split as 'CÃ(c)sar' on machine 1 > > > > > > while it is split into [cã, sar] on machine 2 . > > > > > > Could someone please tell me what might be going on? > > > > Which version of Lucene are you using? Is it the same on both machines? > > > > I ask because Lucene recently switched StandardTokenizer lexer > > generation from JavaCC to JFlex, for performance reasons (increased > > throughput). > > > > Also, my email viewer displays the word in question as the following > > sequence of characters: > > > > 1. Capital "C" > > 2. Capital "A" with a tilda ("~") above it > > 3. Left parenthesis > > 4. Lowercase "c" > > 5. Right parenthesis > > 6. Lowercase "s" > > 7. Lowercase "a" > > 8. Lowercase "r" > > > > Is this the correct character sequence? (Sometimes UTF-8 can look > > similar to this when it's interpreted as Latin-1.) > > > > Steve > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] For > > additional commands, e-mail: [EMAIL PROTECTED] > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]