Hi Prescott,

1- When I open the Java file, I see the code as it should be. You can try opening it with Notepad and then pasting it into VS, for example.
2- There is an open issue reported by Pasha Bizhan that covers some languages (https://issues.apache.org/jira/browse/LUCENENET-372), but I don't know whether it is up to date or not.
3- ASCIIFoldingFilter.cs is another example of dealing with non-ASCII chars.
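
One more option that avoids the file-encoding question entirely is to write the umlauts as Unicode escapes, so the source file stays plain ASCII. Below is a minimal sketch of that idea in Java, loosely modelled on the substitute step quoted in the original message. The class name UmlautSubstitute and the helper replaceUmlauts are hypothetical, and the a-umlaut and o-umlaut branches are added only for illustration; this is not the actual GermanStemmer code.

```java
public class UmlautSubstitute {

    // Hypothetical helper (not part of Lucene): replaces lowercase German
    // umlauts with their base vowels. The characters are written as Unicode
    // escapes, so compiling this file does not depend on its encoding.
    static String replaceUmlauts(String input) {
        StringBuilder buffer = new StringBuilder(input);
        for (int c = 0; c < buffer.length(); c++) {
            char ch = buffer.charAt(c);
            if (ch == '\u00E4') {          // a-umlaut (illustrative addition)
                buffer.setCharAt(c, 'a');
            } else if (ch == '\u00F6') {   // o-umlaut (illustrative addition)
                buffer.setCharAt(c, 'o');
            } else if (ch == '\u00FC') {   // u-umlaut, as in the quoted snippet
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The input is also built from an escape, so the file stays ASCII.
        System.out.println(replaceUmlauts("m\u00FCller")); // prints "muller"
    }
}
```

As far as I know, C# accepts the same escape form in char literals (for example '\u00FC'), so the ported comparison should be able to use it unchanged, but treat that as something to verify in VS2010 rather than a tested fix.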

DIGY

-----Original Message-----
From: Prescott Nasser [mailto:geobmx...@hotmail.com]
Sent: Tuesday, February 08, 2011 3:55 AM
To: lucene-net-dev@lucene.apache.org
Subject: Umlauts as Char

Hey all,

So while digging into the code a bit (and pushed by digy's Arabic conversion yesterday), I started looking at the various other languages we were missing from Java. I started porting the GermanAnalyzer, but ran into an issue with the umlauts...

http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_9_4/contrib/analyzers/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java?revision=1040993&view=co

In the void substitute function you'll see them:

    else if ( buffer.charAt( c ) == 'ü' ) { buffer.setCharAt( c, 'u' ); }

This does not constitute a character in .NET (that I can figure out), and thus it doesn't compile. The .java file says it is encoded in UTF-8. I was thinking maybe I could do the same thing in VS2010, but I'm not finding a way, and searching on this has been difficult.

Any ideas?

~Prescott