Re: Non-ASCII chars in Java comments

Santiago Gala 3 Sep 2004 06:30:37 -0000

J.Pietschmann wrote:

Adam R. B. Jack wrote:

Some projects with issues (some JDK 1.5, some not) are listed here:
    http://brutus.apache.org/gump/jdk15/project_todos.html

Neat!

I did a quick check of the BCEL issues, and they are exclusively
problems with non-ASCII characters. While the BCEL problems are
easily fixed (bullet characters, probably cut and pasted from an HTML
page, and a few German umlauts), we had a similar problem with
a FOP source file some time ago, which was not as easily resolved,
because it was an email address containing the characters causing
the trouble. Ultimately, the originator allowed us to pull the
address and have his name respelled in a romanized form.

Related questions:
1. Javac allows Java source file encodings with a greater range
 of characters, in particular UTF-8. Unfortunately, there is
 no standardized auto-detection mechanism (as for XML).
 Does anybody want to discuss how projects/the whole ASF should
 deal with non-ASCII encodings for Java files?

IMO, the only practical way to deal with it is to adopt the convention that all files in a project are UTF-8 (which can encode any character). The Java books I have read recommend writing non-ASCII characters in source code as \uXXXX escapes, so that every Java source file is pure ASCII. I think this convention should also work in javadoc comments, but I have never tested it.
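As a minimal sketch of the \uXXXX convention: the class and method names below are made up for illustration and do not come from BCEL or FOP. The string holds a u-umlaut (U+00FC), but the source file contains only ASCII bytes, so it should compile identically whatever the platform default encoding is.

```java
class UnicodeEscapeDemo {
    // The u-umlaut (U+00FC) is written as a Unicode escape, so this file is
    // pure ASCII and does not depend on the encoding javac assumes.
    static String umlautName() {
        return "M\u00fcller";
    }

    public static void main(String[] args) {
        String name = umlautName();
        // The escape is translated to one character before compilation.
        System.out.println(name.length());        // 6
        System.out.println((int) name.charAt(1)); // 252, i.e. U+00FC
    }
}
```

Note that the compiler translates these escapes everywhere in the file, including inside comments, so a malformed escape in a comment is still a compile error.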

I've just found a similar bug with the OpenOffice.org Java files, which refused to compile on my es_ES.utf8 machine unless I prefixed the build with LC_ALL=C or a similar non-UTF-8 locale.

2. How should situations be handled where characters which can't
 be encoded are important, like in email addresses or IRLs
 (internationalized URLs)?

IMO, by adopting the convention that each project tarball uses a single encoding (UTF-8 ideally, since it minimizes breakage) and, on Linux, setting LC_ALL=en_US.utf8 before building. The issue I found was that some files in OpenOffice are encoded in ISO-8859-1 but carry no meta-information saying so. For Windows, I have no idea whether the encoding can be changed for a session.
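If a project did adopt a UTF-8-only convention, stray ISO-8859-1 files like the ones described above could be caught before the build. The following is only a sketch under that assumption: the class name Utf8Check is hypothetical, and it uses java.nio.charset.StandardCharsets, which needs a much newer JDK than the ones discussed in this thread. It decodes each file strictly and reports any that are not well-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {
    // True if the bytes decode as well-formed UTF-8. Pure ASCII also passes,
    // since ASCII is a subset of UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Checks every file named on the command line.
    public static void main(String[] args) throws Exception {
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "ok" : "NOT valid UTF-8"));
        }
    }
}
```

A lone ISO-8859-1 umlaut byte such as 0xFC is rejected, while the two-byte UTF-8 form 0xC3 0xBC is accepted. This only detects files that break the convention; it cannot tell which encoding an offending file actually uses.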

How do Perl developers deal with these issues?

Regards
J.Pietschmann

This issue is language-independent: it is a problem that will exist until a common encoding is used or meta-information is available for all files.

