[cp-patches] RFC: patch for Unicode scalar value to UTF-16 conversion

Chris Burdess Sat, 07 Jan 2006 13:24:38 -0800

Character.toChars currently generates the high char of the UTF-16
surrogate pair incorrectly: it shifts the code point right by 10 bits
without first subtracting 0x10000. This patch fixes the problem by using
the algorithm given in the Unicode spec to generate the surrogate pair.
The output is now correct, but the division and modulo it uses may not be
optimally efficient.
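
As a rough illustration of the arithmetic, here is a standalone sketch (not
the patch itself). The class and method names are hypothetical and exist
only for this example. It puts the spec-style computation next to the old
high-char expression so the difference is visible for one code point.

```java
public class SurrogateSketch {
    /**
     * Encodes a supplementary code point (0x10000..0x10FFFF) as a
     * {high, low} surrogate pair, following the arithmetic in the patch.
     */
    public static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char) ((offset / 0x400) + 0xD800); // top 10 bits
        char low = (char) ((offset % 0x400) + 0xDC00);  // bottom 10 bits
        return new char[] { high, low };
    }

    /** The old high-char expression, which omits the 0x10000 subtraction. */
    public static char oldHigh(int codePoint) {
        return (char) ((codePoint >> 10) + 0xD800);
    }

    public static void main(String[] args) {
        int cp = 0x1D11E; // MUSICAL SYMBOL G CLEF
        char[] pair = toSurrogates(cp);
        System.out.printf("new pair: %04X %04X%n", (int) pair[0], (int) pair[1]);
        System.out.printf("old high: %04X%n", (int) oldHigh(cp));
    }
}
```

For U+1D11E the sketch yields D834 DD1E, while the old expression gives D874
for the high char, which is off by 0x40. Since the offset is never negative
here, (offset >> 10) and (offset & 0x3ff) produce the same values as the
division and modulo, should the cost of those operations be a concern.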


I haven't committed this; I'd just like to solicit some feedback on it first.
Please comment.
-- 
Chris Burdess
  "They that can give up essential liberty to obtain a little safety
  deserve neither liberty nor safety." - Benjamin Franklin

Index: java/lang/Character.java
===================================================================
RCS file: /cvsroot/classpath/classpath/java/lang/Character.java,v
retrieving revision 1.40
diff -u -r1.40 Character.java
--- java/lang/Character.java    17 Sep 2005 21:58:41 -0000      1.40
+++ java/lang/Character.java    7 Jan 2006 21:01:36 -0000
@@ -2410,9 +2410,8 @@
       {
         // Write second char first to cause IndexOutOfBoundsException
         // immediately.
-        dst[dstIndex + 1] = (char) ((codePoint & 0x3ff)
-                                    + (int) MIN_LOW_SURROGATE );
-        dst[dstIndex] = (char) ((codePoint >> 10) + (int) MIN_HIGH_SURROGATE);
+        dst[dstIndex + 1] = (char) (((codePoint - 0x10000) % 0x400) + 0xdc00);
+        dst[dstIndex] = (char) (((codePoint - 0x10000) / 0x400) + 0xd800);
         result = 2;
     }
     else


_______________________________________________
Classpath-patches mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/classpath-patches
