Although Java doesn't write a BOM, VS is clever enough to open the file correctly.

 

The problem is probably that the Apache server sends the Java code with 
"Content-Type: text/plain; charset=ISO-8859-1" and the receiver (possibly a 
browser) then incorrectly decodes the UTF-8 bytes as ISO-8859-1.
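
To illustrate what I think is happening, here is a minimal Java sketch. The 
class and method names (MojibakeDemo, misdecode) are made up for this example 
and are not part of Lucene; it only assumes the client decodes UTF-8 bytes 
using the ISO-8859-1 charset from the header.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: decode the UTF-8 bytes of `text` as if they were
    // ISO-8859-1, which is what a client does when it trusts a wrong charset header.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // u-umlaut (U+00FC), written as an escape so this source file stays pure ASCII
        String original = "\u00fc";
        String garbled = misdecode(original);

        // Print code points instead of glyphs so the console encoding cannot interfere.
        StringBuilder codePoints = new StringBuilder();
        for (char ch : garbled.toCharArray()) {
            codePoints.append(String.format("U+%04X ", (int) ch));
        }
        System.out.println(original.length() + " char became " + garbled.length()
                + " chars: " + codePoints.toString().trim());
        // prints: 1 char became 2 chars: U+00C3 U+00BC
    }
}
```

The single character turns into two (the UTF-8 bytes C3 BC read as two Latin-1 
characters), which would explain why the char literal no longer compiles.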

Using an svn client to download the code is one solution.

 

DIGY

 

 

 

-----Original Message-----
From: Stefan Bodewig [mailto:bode...@apache.org] 
Sent: Tuesday, February 08, 2011 7:10 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: Umlauts as Char

 

On 2011-02-08, Prescott Nasser wrote:

 

> in the void substitute function you'll see them:

 

>         else if ( buffer.charAt( c ) == 'ü' ) {
>           buffer.setCharAt( c, 'u' );
>         }

 

> This does not constitute a character in .NET (that I can figure out)
> and thus it doesn't compile. The .java file says encoded in UTF-8. I
> was thinking maybe I could do the same thing in VS2010, but I'm not
> finding a way, and searching on this has been difficult.

 

IIRC VS will recognize UTF-8 encoded files if they start with a byte
order mark (BOM) but Java usually doesn't write one.  I think I once
found the setting for reading/writing UTF-8 in VS, will need to search
for it when at work.

 

If you have a JDK installed you can use its native2ascii tool to
replace non-ASCII characters with Unicode escape sequences
that you can then use in C# as well (see Nicolas' post).
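
As a rough idea of what such a conversion does, here is a minimal Java sketch.
It is not native2ascii's actual implementation, and the names UnicodeEscaper
and escapeNonAscii are made up for the example; it simply assumes that every
UTF-16 code unit above 127 is replaced with a \uXXXX escape.

```java
public class UnicodeEscaper {
    // Hypothetical helper: replace every non-ASCII UTF-16 code unit with a
    // backslash-u escape of four hex digits; ASCII characters pass through unchanged.
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char ch = source.charAt(i);
            if (ch < 128) {
                out.append(ch);
            } else {
                out.append(String.format("\\u%04x", (int) ch));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The u-umlaut is written as an escape so this source file stays pure ASCII.
        String line = "else if ( buffer.charAt( c ) == '\u00fc' ) {";
        System.out.println(escapeNonAscii(line));
    }
}
```

The escaped char literal is plain ASCII, so it no longer depends on how the
file is decoded.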

 

If you have Ant installed (sorry, can't resist ;-) you can convert the
whole tree in one (untested) go with something like

 

<copy todir="will-hold-translated-files"
      encoding="utf8">
  <fileset dir="holds-original-files"/>
  <filterchain>
    <escapeunicode/>
  </filterchain>
</copy>

 

Stefan
