Well, with regard to number 2: it was fine to dig into the code a bit, but I guess we already have a number of them converted, although I guess they were never added to source control. Thanks for the heads up on 1 and 3. ~P
----------------------------------------
> From: digyd...@gmail.com
> To: lucene-net-dev@lucene.apache.org
> Subject: RE: Umlauts as Char
> Date: Tue, 8 Feb 2011 11:12:33 +0200
>
> Hi Prescott,
>
> 1- When I open the java file, I see the code as it should be. You can try to
> open it with notepad and then paste to VS for ex.
> 2- There is an open issue reported by Pasha Bizhan that covers some
> languages (https://issues.apache.org/jira/browse/LUCENENET-372)
> But I don't know if it is up to date or not.
> 3- ASCIIFoldingFilter.cs is another example for dealing with non-ascii
> chars.
>
> DIGY
>
> -----Original Message-----
> From: Prescott Nasser [mailto:geobmx...@hotmail.com]
> Sent: Tuesday, February 08, 2011 3:55 AM
> To: lucene-net-dev@lucene.apache.org
> Subject: Umlauts as Char
>
>
>
> Hey all,
>
> So while digging into the code a bit (and pushed by digy's Arabic conversion
> yesterday), I started looking at the various other languages we were missing
> from java.
>
> I started porting the GermanAnalyzer, but ran into an issue with the
> Umlauts...
>
> http://svn.apache.org/viewvc/lucene/java/tags/lucene_2_9_4/contrib/analyzers/common/src/java/org/apache/lucene/analysis/de/GermanStemmer.java?revision=1040993&view=co
>
> in the void substitute function you'll see them:
>
> else if ( buffer.charAt( c ) == 'ü' ) {
>     buffer.setCharAt( c, 'u' );
> }
>
> This does not constitute a character in .net (that I can figure out) and thus
> it doesn't compile. The .java file says encoded in UTF-8. I was thinking
> maybe I could do the same thing in VS2010, but I'm not finding a way, and
> searching on this has been difficult.
>
> Any ideas?
>
> ~Prescott
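
For what it's worth, one way to sidestep the file-encoding question entirely is to write the umlauts as Unicode escapes, so the source stays pure ASCII no matter how the editor saves it. The sketch below is in Java, like the snippet quoted above. The class name `UmlautDemo` and the method `foldUmlauts` are made up for illustration, and it only covers three characters, so it is not the real GermanStemmer logic. C# char literals accept the same `'\u00FC'` escape form, so the same idea should carry over to a port.

```java
// Hypothetical demo class; not part of Lucene or Lucene.Net.
final class UmlautDemo {

    // Replaces three German umlauts with their base vowels, in the spirit of
    // the substitute() snippet quoted above. The umlauts are written as
    // Unicode escapes so the source file contains only ASCII characters.
    static String foldUmlauts(String input) {
        StringBuilder buffer = new StringBuilder(input);
        for (int c = 0; c < buffer.length(); c++) {
            char ch = buffer.charAt(c);
            if (ch == '\u00E4') {          // a with diaeresis
                buffer.setCharAt(c, 'a');
            } else if (ch == '\u00F6') {   // o with diaeresis
                buffer.setCharAt(c, 'o');
            } else if (ch == '\u00FC') {   // u with diaeresis
                buffer.setCharAt(c, 'u');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // The input is also written with escapes, so this file is ASCII-only.
        System.out.println(foldUmlauts("f\u00FCr sch\u00F6ne M\u00E4dchen"));
    }
}
```

Characters that are not one of the three umlauts pass through unchanged, so plain ASCII input comes back as-is.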