Hi Prashant,

What is the Unicode code point of the single character that displays as items 3, 4, and 5?
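
In case it helps, here is a minimal sketch for finding out. The class and method names are hypothetical, it assumes Java 7 or later (for StandardCharsets), and the word "César" is only my guess at what the original string might be. It prints the code points of a string, and it also shows what a string's UTF-8 bytes look like when they are decoded as Latin-1, which is the kind of mismatch I suspect here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Lucene.
public class CodePointDump {

    // Returns the code points of s as "U+XXXX" values separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by 1 or 2 chars, depending on whether cp is supplementary.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Encodes s as UTF-8, then (wrongly, on purpose) decodes the bytes as Latin-1.
    static String utf8AsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String guess = "C\u00E9sar"; // assumed original: C, e-acute, s, a, r
        System.out.println(describe(guess));
        System.out.println(describe(utf8AsLatin1(guess)));
    }
}
```

If you run describe() on the token text on each machine, just before it goes to the analyzer, we can compare what each one actually sees. Under the assumption above, the misdecoded form contains U+00C3 followed by U+00A9 (the copyright sign, which some mail viewers render as "(c)") in place of U+00E9.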

Steve

On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
> Yes, the versions of Lucene and Java are exactly the same on
> the different
> machines.
> In fact, we unjarred Lucene and re-jarred it with our jar, and we are
> running from the
> same NFS mounts on both machines.
> 
> We have also tried Lucene 2.2.0 and 2.3.1, with the same result.
> 
> As for the actual string, you have it right up to item 2.
> 
>         Items 3, 4, and 5 are a single character.
> 
> Thx
> PM
> 
> On Tue, Apr 22, 2008 at 12:01 PM, Steven A Rowe
> <[EMAIL PROTECTED]> wrote:
> 
> > Hi Prashant,
> > 
> > On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
> > >     We have been observing the following problem while
> > > tokenizing with Lucene's StandardAnalyzer. The tokens we get are
> > > different on different machines. I suspect it has something to
> > > do with the locale settings on the individual machines.
> > > 
> > > For example
> > > the word 'CÃ(c)sar'   is split as  'CÃ(c)sar'   on machine 1
> > > 
> > > while it is split into [cã, sar]  on machine 2 .
> > > 
> > > Could someone please tell me what might be going on?
> > 
> > Which version of Lucene are you using?  Is it the same on both machines?
> > 
> > I ask because Lucene recently switched StandardTokenizer lexer
> > generation from JavaCC to JFlex, for performance reasons (increased
> > throughput).
> > 
> > Also, my email viewer displays the word in question as the following
> > sequence of characters:
> > 
> >  1. Capital "C"
> >  2. Capital "A" with a tilde ("~") above it
> >  3. Left parenthesis
> >  4. Lowercase "c"
> >  5. Right parenthesis
> >  6. Lowercase "s"
> >  7. Lowercase "a"
> >  8. Lowercase "r"
> > 
> > Is this the correct character sequence? (Sometimes UTF-8 can look
> > similar to this when it's interpreted as Latin-1.)
> > 
> > Steve
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED] For
> > additional commands, e-mail: [EMAIL PROTECTED]
> > 
> > 
>

 

