According to the Unicode design principles in the Unicode 3.0 specification: <quot>Unicode characters have a width of 16 bits.</quot>
The Unicode 4.0 standard, however, contains no design principle about character width. And according to the JavaDocs of the Character class (J2SE 1.4): <quot>Character information is based on the Unicode Standard, version 3.0.</quot>

And one more, from the Java Language Specification: <quot>Versions of the Java programming language prior to 1.1 used Unicode version 1.1.5 (see The Unicode Standard: Worldwide Character Encoding (1.4) and updates). Later versions prior to JDK version 1.1.7 used Unicode version 2.0. Since JDK version 1.1.7, Unicode 2.1 has been in use. The Java platform will track the Unicode specification as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.</quot>

So it seems that the only thing (optimistic mode is on) that would need to change in future versions of Java to support Unicode 4.0 is the Character class.

Regards,
Konstantin Piroumian

----- Original Message -----
From: "Stefano Mazzocchi" <[EMAIL PROTECTED]>
To: "Apache Cocoon" <[EMAIL PROTECTED]>
Sent: Thursday, November 13, 2003 21:06
Subject: [d'oh!] java APIs are not powerful enough to handle the XML spec!!

The day somebody asks you why java needs to be replaced, one answer will be 'it only supports 16-bit chars'.

Laughable as it might seem, it's true.

Yes, people, a Unicode char is not 16 bits (as I always thought!) but 32!! And even the XML specification says so:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Do the math and you find that #x10000 cannot fit in 16 bits!

Now, if you thought you could take the characters() SAX event, create a String out of it and do something useful with it (like print it, for example), forget it. The result will very likely not be the one you expect.

Another reason not to use Strings at all.

--
Stefano.
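To make the arithmetic in the XML `Char` production concrete: U+FFFF is the largest value that fits in 16 bits, #x10000 already needs 17, and the top of the range, #x10FFFF, needs 21. The Java sketch below (not from either author) shows how such a character ends up inside a `String`. It relies on code-point methods such as `Character.toChars`, `String.codePointAt`, `String.codePointCount` and `Character.charCount`, which were added in J2SE 5.0, after this thread was written. The names `SupplementaryCharDemo`, `bitsNeeded` and `codePoints` are hypothetical and used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: how a code point above U+FFFF is stored in a Java String.
public class SupplementaryCharDemo {

    // Number of binary digits needed to write the code point.
    static int bitsNeeded(int codePoint) {
        return Integer.toBinaryString(codePoint).length();
    }

    // Decodes a String into code points, joining UTF-16 surrogate pairs.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 2 for a surrogate pair, 1 otherwise
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("bits for U+FFFF:   " + bitsNeeded(0xFFFF));   // 16
        System.out.println("bits for U+10000:  " + bitsNeeded(0x10000));  // 17
        System.out.println("bits for U+10FFFF: " + bitsNeeded(0x10FFFF)); // 21

        // U+10000 becomes two chars: a high surrogate followed by a low surrogate.
        String s = new String(Character.toChars(0x10000));
        System.out.println("length():         " + s.length());                      // 2
        System.out.println("codePointCount(): " + s.codePointCount(0, s.length())); // 1
        System.out.printf("chars: U+%04X U+%04X%n", (int) s.charAt(0), (int) s.charAt(1)); // U+D800 U+DC00

        // Iterating by code point sees one character, not two halves.
        for (int cp : codePoints(s)) {
            System.out.printf("code point U+%X%n", cp); // U+10000
        }
    }
}
```

Iterating by `char` sees two halves of one character (a surrogate pair); iterating by code point sees the single character the XML specification describes.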
