Alex... thanks, Alex. Sorry, not sure why Aaron was in my head. ~P
----------------------------------------
> From: geobmx...@hotmail.com
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Umlauts as Char
> Date: Mon, 7 Feb 2011 21:06:55 -0800
>
> Stefan pretty much hit the nail on the head. My concern was the Java
> characters: I can't even search Google or Bing for them. I can take the
> source code's word that 'ü' is the u with the dots over it (because the
> source notes say it replaces umlauts), but is that really true? Is it
> perhaps a u with a caret over it instead?
>
> I'm tempted to take the source at its word and just replace them with the
> umlaut versions (via Character Map; thanks, Aaron), and then add a comment
> noting what the character originally was in the Java source.
>
> What do you guys think?
>
> ~P
>
> ----------------------------------------
> > From: bode...@apache.org
> > To: lucene-net-dev@lucene.apache.org
> > Subject: Re: Umlauts as Char
> > Date: Tue, 8 Feb 2011 06:01:27 +0100
> >
> > On 2011-02-08, Nicholas Paldino [.NET/C# MVP] wrote:
> >
> > > You can simply use the Unicode escape sequence in code and in
> > > string/character literals, as specified by section 2.4.2 of the C# spec
> > > (http://msdn.microsoft.com/en-us/library/aa664670(v=vs.71).aspx):
> >
> > I think in Prescott's case part of the problem is that he doesn't know
> > which character the sequence is supposed to be. In this case it is most
> > likely an ü.
> >
> > > else if ( buffer.charAt( c ) == 'ü' ) {
> > >     buffer.setCharAt( c, 'u' );
> > > }
> >
> > > Would become:
> >
> > > else if ( buffer.charAt( c ) == '\u00C3¼' ) {
> > >     buffer.setCharAt( c, 'u' );
> > > }
> >
> > No. The two bytes are part of a two-byte UTF-8 sequence making up a
> > single character.
> >
> > Stefan
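A minimal Java sketch of Stefan's point, assuming the two odd characters in the Java source are the bytes 0xC3 0xBC (which is what the suggested '\u00C3¼' implies: U+00C3 followed by U+00BC). Decoded as UTF-8, those two bytes are a single character, U+00FC (ü, u with diaeresis), not a u with a caret. Decoded as ISO-8859-1 they turn into two unrelated characters, which is why escaping them one by one gives the wrong answer. The class name `UmlautDemo` and the helper `replaceUmlauts` are hypothetical, and `StringBuilder` stands in for the quoted `buffer`, whose declared type isn't shown. The code deliberately contains only ASCII so the file compiles the same way under any source encoding.

```java
import java.nio.charset.StandardCharsets;

public class UmlautDemo {

    // Hypothetical helper mirroring the quoted snippet: replace u-umlaut with
    // plain 'u'. The Unicode escape keeps the comparison correct no matter how
    // the source file is saved or which encoding the compiler assumes.
    static String replaceUmlauts(String input) {
        StringBuilder buffer = new StringBuilder(input);
        for (int c = 0; c < buffer.length(); c++) {
            if (buffer.charAt(c) == '\u00FC') {   // U+00FC LATIN SMALL LETTER U WITH DIAERESIS
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The two bytes assumed to be sitting in the Java source file.
        byte[] raw = { (byte) 0xC3, (byte) 0xBC };

        // Read as UTF-8, the bytes form ONE character: U+00FC.
        String asUtf8 = new String(raw, StandardCharsets.UTF_8);
        // Read as ISO-8859-1, they become TWO characters: U+00C3 (A with tilde)
        // and U+00BC (vulgar fraction one quarter), i.e. the garbled form.
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());                  // 1
        System.out.println(asLatin1.length());                // 2
        System.out.println(asUtf8.charAt(0) == '\u00FC');     // true
        System.out.println(replaceUmlauts("M\u00FCller"));    // Muller
    }
}
```

Writing the character as '\u00FC' rather than pasting a literal ü also sidesteps the original problem for the port: C# accepts the same `\u00FC` escape in character literals, so the comment Prescott proposes can simply name the character (U+00FC) instead of reproducing the garbled bytes.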