On Wed, Jul 17, 2002 at 04:17:15PM +0100, Nicholas Clark wrote:
> My understanding was that Unicode has now escaped the base plane (or whatever
> it's called) and now has started using code points >65536. How does Java
> cope with this?

This is getting a little off-topic, I think. But here's a brief overview of the Unicode codespace size issue - if you have any more questions, you can ask me off-list.

There were originally two separate universal character set efforts, by the ISO and the Unicode Consortium. They decided early on to combine their efforts and be mutually compatible. However, ISO-10646 was designed as a 31-bit code space (stored in 32 bits), divided into 32,768 "planes" of 65,536 code points each, while Unicode was only 16 bits. So Unicode as originally designed is identical to plane 0 of ISO-10646, called the Basic Multilingual Plane (BMP).

Characters outside this plane are covered by ISO-10646-2. Unicode reaches them through a section of the code space called "surrogates": two reserved ranges in the BMP (0xD800-0xDBFF and 0xDC00-0xDFFF) that the UTF-16 encoding uses in pairs to address planes 1-16 of ISO-10646. ISO has no plans to define characters outside of planes 0-16 anytime in the foreseeable future (or, indeed, outside of planes 0-14, since 15 and 16 are reserved for private use).

-- 
Mark REED                    | CNN Internet Technology
1 CNN Center Rm SW0831G      | [EMAIL PROTECTED]
Atlanta, GA 30348 USA        | +1 404 827 4754
--
The end of the world will occur at three p.m., this Friday, with symposium
to follow.
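
As a postscript, the surrogate arithmetic mentioned above can be sketched in a few lines of Python. This is only an illustration of the standard UTF-16 mapping, not anything taken from Java or Perl internals; the function names (to_surrogates, from_surrogates, plane) are made up for this example.

```python
# Sketch of the UTF-16 surrogate mapping for code points beyond the BMP.
# Function names are hypothetical; the constants come from the UTF-16 definition.

HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates
FIRST_SUPPLEMENTARY = 0x10000           # first code point of plane 1
LAST_CODE_POINT = 0x10FFFF              # last code point of plane 16


def plane(cp):
    """Return the plane number (0 = BMP) that a code point falls in."""
    return cp >> 16


def to_surrogates(cp):
    """Split a code point from planes 1-16 into a (high, low) surrogate pair."""
    if not FIRST_SUPPLEMENTARY <= cp <= LAST_CODE_POINT:
        raise ValueError("code point is not in planes 1-16")
    offset = cp - FIRST_SUPPLEMENTARY          # 20-bit value
    high = HIGH_START + (offset >> 10)         # top 10 bits
    low = LOW_START + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return FIRST_SUPPLEMENTARY + ((high - HIGH_START) << 10) + (low - LOW_START)


if __name__ == "__main__":
    cp = 0x1D11E  # a code point in plane 1
    high, low = to_surrogates(cp)
    print("plane %d: U+%04X -> %04X %04X" % (plane(cp), cp, high, low))
```

Each surrogate carries 10 bits, so a pair covers 2^20 code points, which is exactly the sixteen planes above the BMP. That is why UTF-16 stops at plane 16.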