RE: Umlauts as Char

Digy Tue, 08 Feb 2011 01:13:34 -0800

Hi Prescott,

1- When I open the Java file, I see the code as it should be. You can try
opening it with Notepad and then pasting it into VS, for example.
2- There is an open issue reported by Pasha Bizhan that covers some
languages (https://issues.apache.org/jira/browse/LUCENENET-372),
but I don't know whether it is up to date.
3- ASCIIFoldingFilter.cs is another example of dealing with non-ASCII
chars.
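
To illustrate the general idea behind point 3, here is a minimal sketch of folding umlauts to plain ASCII letters. It is not the ASCIIFoldingFilter code: it swaps in Unicode decomposition via java.text.Normalizer instead of a lookup table, and the class and method names (UmlautFolding, fold) are made up for this example. Characters that have no decomposition, such as the sharp s, pass through unchanged.

```java
import java.text.Normalizer;

public class UmlautFolding {
    // Hypothetical helper: split each accented letter into base letter plus
    // combining mark (NFD), then drop the combining marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII, so it compiles
        // the same way regardless of the editor's encoding.
        System.out.println(fold("\u00fcber"));     // u-umlaut + "ber"
        System.out.println(fold("M\u00e4dchen"));  // "M" + a-umlaut + "dchen"
    }
}
```

Writing the non-ASCII characters as escapes is a deliberate choice here: the source file then contains only ASCII bytes, so it cannot be garbled by a wrong encoding guess.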

DIGY

-----Original Message-----
From: Prescott Nasser [mailto:[email protected]] 
Sent: Tuesday, February 08, 2011 3:55 AM
To: [email protected]
Subject: Umlauts as Char

Hey all, 

So while digging into the code a bit (and pushed by Digy's Arabic conversion
yesterday), I started looking at the various other languages we were missing
from Java.

I started porting the GermanAnalyzer, but ran into an issue with the
umlauts...

http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_9_4/contrib/analyzers/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java?revision=1040993&view=co

In the void substitute function you'll see them:

        else if ( buffer.charAt( c ) == 'Ã¼' ) {
          buffer.setCharAt( c, 'u' );
        }

This does not constitute a character in .NET (that I can figure out), and thus
it doesn't compile. The .java file says it is encoded in UTF-8. I was thinking
maybe I could do the same thing in VS2010, but I'm not finding a way, and
searching on this has been difficult.
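
One likely explanation, assuming the file really is UTF-8 and was displayed as ISO-8859-1, is that the two UTF-8 bytes of a single u-umlaut were shown as two separate characters, which cannot fit in one char literal. The sketch below reproduces that effect and shows the quoted branch rewritten with a Unicode escape, which does not depend on the file encoding. The class and method names (UmlautEncodingDemo, misread, replaceUmlautU) are hypothetical and not part of GermanStemmer.

```java
import java.nio.charset.StandardCharsets;

public class UmlautEncodingDemo {
    // Decode the UTF-8 bytes of the text as ISO-8859-1, which is what a
    // viewer or editor does when it guesses the wrong encoding.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper mirroring the quoted branch, with the umlaut
    // written as an escape instead of a literal non-ASCII character.
    static String replaceUmlautU(String term) {
        StringBuilder buffer = new StringBuilder(term);
        for (int c = 0; c < buffer.length(); c++) {
            if (buffer.charAt(c) == '\u00fc') {
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String garbled = misread("\u00fc");
        System.out.println(garbled.length());              // 2
        System.out.println(garbled.equals("\u00c3\u00bc")); // true
        System.out.println(replaceUmlautU("f\u00fcr"));     // fur
    }
}
```

C# accepts the same escape form in a char literal, so a port could write the comparison against '\u00fc' rather than pasting the character itself.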

Any ideas?

~Prescott
