I'm not sure why I didn't think of it, but there are tons of online converters... copy and paste ftw. Dealing with other character sets is new to me, and I think that complicated this more than it needed to be.

Thanks guys,
~P
----------------------------------------
> From: bode...@apache.org
> To: lucene-net-dev@lucene.apache.org
> Subject: Re: Umlauts as Char
> Date: Tue, 8 Feb 2011 06:09:58 +0100
>
> On 2011-02-08, Prescott Nasser wrote:
>
> > in the void substitute function you'll see them:
>
> > else if ( buffer.charAt( c ) == 'ü' ) {
> >   buffer.setCharAt( c, 'u' );
> > }
>
> > This does not constitute a character in .net (that I can figure out)
> > and thus it doesn't compile. The .java file says encoded in UTF-8. I
> > was thinking maybe I could do the same thing in VS2010, but I'm not
> > finding a way, and searching on this has been difficult.
>
> IIRC VS will recognize UTF-8 encoded files if they start with a byte
> order mark (BOM) but Java usually doesn't write one. I think I once
> found the setting for reading/writing UTF-8 in VS, will need to search
> for it when at work.
>
> If you have a JDK installed you can use its native2ascii tool that can
> be used to replace non-ASCII characters with Unicode escape sequences
> that you can then use in C# as well (see Nicolas' post).
>
> If you have Ant installed (sorry, can't resist ;-) you can convert the
> whole tree in one (untested) go with something like
>
> encoding="utf8">
>
> Stefan
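
For anyone following along, the escape-sequence route Stefan describes can be sketched in a few lines of Java. This is only a rough illustration of the idea, not the JDK tool itself: the class name UnicodeEscaper and the method escapeNonAscii are made up for this example, and it assumes the source text has already been decoded correctly (for instance read as UTF-8). It rewrites every character above 0x7F as a backslash-u escape with four hex digits, so 'ü' comes out as '\u00fc'.

```java
public class UnicodeEscaper {

    // Hypothetical helper (not part of the JDK): replaces every char
    // above 0x7F with a backslash-u escape followed by four hex digits.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch > 0x7F) {
                // A doubled backslash in the literal yields one backslash.
                out.append(String.format("\\u%04x", (int) ch));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The umlaut is written as an escape so this file stays pure ASCII.
        String line = "else if ( buffer.charAt( c ) == '\u00FC' ) {";
        System.out.println(escapeNonAscii(line));
    }
}
```

The escaped form should be usable in the C# port as well, since C# char literals accept the same four-digit escape, and the file then no longer depends on which encoding the editor guesses.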