Although Java doesn't write a BOM, VS is clever enough to open the file correctly.
The problem is probably that the Apache server sends the Java code with "Content-Type: text/plain; charset=ISO-8859-1", so the receiver (possibly a browser) incorrectly decodes the UTF-8 bytes as ISO-8859-1. Downloading the code with an svn client is a solution.

DIGY

-----Original Message-----
From: Stefan Bodewig [mailto:bode...@apache.org]
Sent: Tuesday, February 08, 2011 7:10 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: Umlauts as Char

On 2011-02-08, Prescott Nasser wrote:

> in the void substitute function you'll see them:

> else if ( buffer.charAt( c ) == 'ü' ) {
>     buffer.setCharAt( c, 'u' );
> }

> This does not constitute a character in .net (that I can figure out)
> and thus it doesn't compile. The .java file says encoded in UTF-8. I
> was thinking maybe I could do the same thing in VS2010, but I'm not
> finding a way, and searching on this has been difficult.

IIRC VS will recognize UTF-8 encoded files if they start with a byte
order mark (BOM) but Java usually doesn't write one. I think I once
found the setting for reading/writing UTF-8 in VS, will need to search
for it when at work.

If you have a JDK installed you can use its native2ascii tool to
replace non-ASCII characters with Unicode escape sequences that you can
then use in C# as well (see Nicolas' post).

If you have Ant installed (sorry, can't resist ;-) you can convert the
whole tree in one (untested) go with something like

  <copy todir="will-hold-translated-files" encoding="utf8">
    <fileset dir="holds-original-files"/>
    <filterchain>
      <escapeunicode/>
    </filterchain>
  </copy>

Stefan
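
For anyone with neither native2ascii nor Ant at hand, the kind of escaping described above can be approximated in a few lines of Java. This is only a rough sketch: the class name UnicodeEscaper and its escape method are made up for illustration, it works on strings already decoded as UTF-8, and it makes no attempt to match the real tools in every detail.

```java
public final class UnicodeEscaper {

    private UnicodeEscaper() {
        // static helper only, not meant to be instantiated
    }

    /**
     * Returns a copy of the input in which every char above 0x7F is
     * replaced by a backslash, the letter u and four lowercase hex digits.
     * ASCII characters are copied through unchanged.
     */
    public static String escape(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c > 0x7F) {
                // The doubled backslash yields one literal backslash in the output.
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The u-umlaut is written as an escape so this file itself stays pure ASCII.
        String line = "else if ( buffer.charAt( c ) == '\u00fc' ) {";
        System.out.println(escape(line));
    }
}
```

Run on the line from the stemmer, this prints the comparison with '\u00fc' in place of the literal umlaut, a form that C# accepts as a char literal too, so the result no longer depends on how the editor guesses the file encoding.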