On Tue, 28 Aug 2012 10:39:03 -0500, Kirk Wolf wrote:
>
>UTF-16 is used in Java (and other languages) as the internal representation
>of characters and strings (each character represented by two bytes).
>
No. Not according to:

    http://en.wikipedia.org/wiki/UTF-16

    UTF-16 (16-bit Unicode Transformation Format) is a character
    encoding for Unicode capable of encoding 1,112,064[1] numbers
    (called code points) in the Unicode code space from 0 to 0x10FFFF.
    It produces a variable-length result of either one or two 16-bit
    code units per code point.

And:

    http://www.ietf.org/rfc/rfc2781.txt

    The rules for how characters are encoded in UTF-16 are:

    -  Characters with values less than 0x10000 are represented as a
       single 16-bit integer with a value equal to that of the character
       number.

    -  Characters with values between 0x10000 and 0x10FFFF are
       represented by a 16-bit integer with a value between 0xD800 and
       0xDBFF (within the so-called high-half zone or high surrogate
       area) followed by a 16-bit integer with a value between 0xDC00 and
       0xDFFF (within the so-called low-half zone or low surrogate area).

    -  Characters with values greater than 0x10FFFF cannot be encoded in
       UTF-16.

-- gil
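
P.S. For anyone who wants to see the RFC 2781 rules quoted above in action, here is a minimal sketch in Java. The class name Utf16Sketch and the method encode are my own, made up for illustration; only Character.toChars, String.length and String.codePointCount are standard library calls. The sketch follows just the three quoted rules and does not treat the reserved surrogate range 0xD800-0xDFFF specially.

```java
// A sketch of the three RFC 2781 rules quoted above.
// Utf16Sketch and encode are illustrative names, not a library API.
class Utf16Sketch {

    // Encode one Unicode code point as one or two 16-bit code units.
    static char[] encode(int codePoint) {
        // Rule 3: values above 0x10FFFF (or negative) cannot be encoded.
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not encodable in UTF-16");
        }
        // Rule 1: values below 0x10000 are a single 16-bit unit.
        if (codePoint < 0x10000) {
            return new char[] { (char) codePoint };
        }
        // Rule 2: subtract 0x10000, leaving a 20-bit value, then split it
        // into a high surrogate (top 10 bits) and a low surrogate
        // (bottom 10 bits).
        int v = codePoint - 0x10000;
        char high = (char) (0xD800 | (v >> 10));
        char low = (char) (0xDC00 | (v & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        int[] samples = { 0x41, 0x20AC, 0x1D11E };
        for (int cp : samples) {
            StringBuilder units = new StringBuilder();
            for (char u : encode(cp)) {
                units.append(String.format("%04X ", (int) u));
            }
            // The same code point as a Java String, built by the library.
            String s = new String(Character.toChars(cp));
            System.out.println(String.format("U+%04X", cp)
                    + " -> " + units.toString().trim()
                    + "  length()=" + s.length()
                    + "  codePointCount()="
                    + s.codePointCount(0, s.length()));
        }
    }
}
```

For U+1D11E the sketch yields the two code units D834 DD1E, and a Java String holding that one character reports length() 2 but codePointCount() 1. So "each character represented by two bytes" holds only below 0x10000.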