Stefan pretty much hit the nail on the head. My concern was the Java characters: 
I can't even search Google or Bing for them. So I can take the source code's 
word that 'ü' is the u with dots over it (because it says "replace umlauts" in 
the source notes). But is that really true? Could it be a u with a 
caret over it instead?
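
One way to check this rather than guess, as a minimal sketch: it assumes the garbled pair came from UTF-8 bytes being read as ISO-8859-1 (Windows-1252 gives the same result for these two bytes). The class and method names are made up for illustration. It turns the two characters back into raw bytes and decodes those as UTF-8.

```java
import java.nio.charset.StandardCharsets;

class MojibakeCheck {
    // Re-encode the garbled text as ISO-8859-1 to recover the raw bytes,
    // then decode those bytes as UTF-8 (assumed to be the real encoding).
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 and U+00BC are the two characters shown in the Java source.
        String garbled = "\u00C3\u00BC";
        String fixed = repair(garbled);
        // Print the code point of the single recovered character.
        System.out.printf("U+%04X%n", (int) fixed.charAt(0)); // prints U+00FC
    }
}
```

U+00FC is u with diaeresis (the umlaut). A u with a circumflex is U+00FB, whose UTF-8 bytes are C3 BB, so it would have shown up with a different second character.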
 
I'm tempted to take the source at its word and just replace them with the 
umlaut versions (via Character Map, thanks Aaron), and then add a comment 
noting what was originally in the Java source.
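
Roughly what I have in mind, as a sketch only: the class and method names are hypothetical, and only the if-branch mirrors the snippet from the Java source. It uses the Unicode escape for the umlaut so the file stays plain ASCII, plus a comment recording what the source originally showed.

```java
class UmlautReplace {
    // Hypothetical helper wrapping the branch quoted from the Java source.
    static String replaceUmlauts(String s) {
        StringBuilder buffer = new StringBuilder(s);
        for (int c = 0; c < buffer.length(); c++) {
            // The original Java source displayed this literal as two garbled
            // characters; they are taken to be the UTF-8 bytes of U+00FC
            // (u with umlaut) read in the wrong encoding.
            if (buffer.charAt(c) == '\u00FC') {
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }
}
```

The same escape, '\u00FC', should carry over unchanged to a C# character literal.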
 
What do you guys think?
 
~P
 
----------------------------------------
> From: bode...@apache.org
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: Umlauts as Char
> Date: Tue, 8 Feb 2011 06:01:27 +0100
>
> On 2011-02-08, Nicholas Paldino [.NET/C# MVP] wrote:
>
> > You can simply use the Unicode escape sequence in code and in
> > string/character literals, as specified by section 2.4.2 of the C# spec
> > (http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx):
>
> I think in Prescott's case part of the problem is that he doesn't know
> which character the sequence seems to be. In this case it likely is an
> ü.
>
> > else if ( buffer.charAt( c ) == 'ü' ) {
> > buffer.setCharAt( c, 'u' );
> > }
>
> > Would become:
>
> > else if ( buffer.charAt( c ) == '\u00C3¼' ) {
> > buffer.setCharAt( c, 'u' );
> > }
>
> No. The two bytes are part of a two byte UTF-8 sequence making up a
> single character.
>
> Stefan                                          
