J.Pietschmann wrote:
Adam R. B. Jack wrote:
Some projects with issues (some JDK 1.5, some not) are listed here: http://brutus.apache.org/gump/jdk15/project_todos.html
Neat!
I did a quick check of the BCEL issues, and they are exclusively problems with non-ASCII characters. The BCEL problems are easily fixed (bullet characters, probably cut and pasted from an HTML page, and a few German umlauts). We had a similar problem with a FOP source file some time ago, which was not as easily resolved, because the characters causing the trouble were part of an email address. Ultimately, the originator agreed to have the address removed and his name respelled in a romanized form.
Related questions:
1. Javac accepts Java source file encodings with a greater range of characters, in particular UTF-8. Unfortunately, there is no standardized auto-detection mechanism (as there is for XML). Does anybody want to discuss how projects, or the ASF as a whole, should deal with non-ASCII encodings for Java files?
IMO, the only way to deal with it is to adopt the convention that all files in a project are UTF-8 (which can hold any character). The Java books I have read recommend using the \uXXXX convention for high (non-ASCII) characters in source code, so that no character in a Java source file is non-ASCII. I think this convention should also work in javadocs, but I have never tested it.
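As a rough illustration of the \uXXXX convention, here is a minimal sketch that rewrites every non-ASCII character of a string as a Unicode escape, so that the result is pure ASCII. The class name UnicodeEscaper and the method escape are made up for this example; they are not part of BCEL, FOP, or the JDK.

```java
// Hypothetical helper, for illustration only.
final class UnicodeEscaper {

    /**
     * Returns a copy of the input in which every UTF-16 code unit
     * outside 7-bit ASCII is replaced by a backslash-u escape with
     * four hex digits. ASCII characters are copied unchanged.
     */
    static String escape(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                // Two backslashes here produce one literal backslash in the output.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample text itself is written with escapes so this file stays ASCII.
        System.out.println(escape("Gr\u00fc\u00dfe \u2022 FOP"));
    }
}
```

The sketch works on UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is the form Java source expects. It does not treat backslashes already present in the input specially.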
I've just found a similar bug with the OpenOffice.org Java files, which refused to compile on my es_ES.utf8 machine unless I prefixed the build with LC_ALL=C or a similar non-UTF-8 locale.
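A problem like this can be spotted before the build by checking whether each source file decodes cleanly as UTF-8. The following is only a sketch under that assumption; the class name Utf8Check and the method isValidUtf8 are invented for the example. It uses a strict decoder, so a byte sequence such as a lone ISO-8859-1 umlaut (0xFC) is reported instead of being silently replaced.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class Utf8Check {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Strict decoding failed, so the file uses some other encoding.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as a path to a source file.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": "
                    + (isValidUtf8(bytes) ? "valid UTF-8" : "NOT valid UTF-8"));
        }
    }
}
```

Pure ASCII files always pass this check, since ASCII is a subset of UTF-8. A file that fails it would need either conversion or an explicit encoding passed to the compiler (javac has an -encoding option for this).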
IMO, the way forward is to adopt the convention that each project tarball uses a given encoding (ideally UTF-8, since it minimizes breakage) and, on Linux, to set LC_ALL=en_US.utf8 before building. The issue I found was exactly this: some files in OpenOffice are encoded in ISO-8859-1, with no meta-information saying so. For Windows, I have no idea whether the encoding can be changed for a session.
2. How should situations be handled where characters which can't be encoded are important, like in email addresses or IRLs (internationalized URLs)?
This issue is language independent; it is a problem that will exist until a common encoding is used or meta-information is available for all files.
How do Perl developers deal with these issues?
Regards J.Pietschmann