Yes, Jungshik's words are reasonable. So the success of the Unicode-capable "modern software" possibly depends more on the fact that it started early (in the UCS-2 period) and is time-tested.
But this is only one possibility. For Asians, UTF-16 is really more "economic" than UTF-8. The use of UTF-8 in Perl, GNOME, and other Open Source software seems connected with two facts: they are mainly developed and used by Western developers and users, and they have their roots in Unix. Both facts favour UTF-8. I am by no means objecting to UTF-8; I am just trying to dig out the cause. Most Chinese programmers are still struggling for survival, and few have time to contribute to Open Source projects. Unix is much less familiar to us than Windows.

Do not misunderstand me; I love Linux and like UTF-8. My locale setting in Red Hat is not zh_CN.GB18030 but UTF-8. But Asians (in developing countries especially) do possibly view things differently from Western programmers. And Markus Scherer's arguments for the virtues of UTF-16 are enough for *me*. And (I don't know whether this point will cause a flame war or not) commercial developers _may_ consider the requirements of "end users" more than open source developers do.

By the way, for pure Chinese text the size ratio of UTF-8 to UTF-16 is nearly always 3:2, because each Han character in the BMP takes three bytes in UTF-8 and two in UTF-16. I am Chinese and I can assure you that only editors of ancient literature will care about the CJK extensions above U+FFFF. You won't think a 50% growth in file size and/or memory occupation is always acceptable. Small size _is_ a virtue. I only regret that the mathematical symbols were put outside the BMP. Sigh!

Of course, these are only personal opinions, which might be wrong. And, as a rule, there is not a single "correct" way to do things. We always need compromises.
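The 3:2 size figure mentioned above can be checked with a small sketch. This is only an illustration in Python; the helper name `encoded_sizes` and the sample strings are assumptions made up for the example, not taken from any real corpus.

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no byte order mark."""
    utf8 = len(text.encode("utf-8"))
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    utf16 = len(text.encode("utf-16-le"))
    return utf8, utf16


# Hypothetical samples: three BMP Han characters, plain ASCII,
# and one CJK Extension B character above U+FFFF.
samples = [
    ("BMP Chinese", "\u7edf\u4e00\u7801"),
    ("ASCII", "Unicode"),
    ("Above U+FFFF", "\U00020000"),
]

for label, sample in samples:
    u8, u16 = encoded_sizes(sample)
    print(f"{label}: UTF-8 {u8} bytes, UTF-16 {u16} bytes")
```

Under these assumptions, BMP Han characters cost three bytes each in UTF-8 against two in UTF-16, ASCII costs one against two, and a character above U+FFFF costs four bytes in both. Real documents that mix Chinese with ASCII markup or digits will therefore show a smaller gap than pure Chinese text does.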
Best regards,

Wu Yongwei

--- Original Message from Jungshik Shin ---

> > Is it true that "Almost all modern software that supports Unicode,
> > especially software that supports it well, does so using 16-bit Unicode
> > internally: Windows and all Microsoft applications (Office etc.), Java,
> > MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> > Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape,
> > OpenOffice/StarOffice, ... "?

> > Do they support characters above U+FFFF as fully as others?

Yes. At least, I know for sure that Mozilla, MS IE, and MS Office XP do. That does not make me a fan of UTF-16. You shouldn't assume that others don't do what you're not happy to deal with. The reason they use UTF-16 is NOT that it's inherently better than other UTFs (UTF-8, UTF-32) BUT that they (not all) began with UCS-2 and have a lot of baggage (written in UCS-2) to carry. The prime example of this is the Win32 W APIs. The same is true of Java, ECMAScript (the transition is not yet complete in the case of ECMAScript), and Mozilla (see http://bugzilla.mozilla.org/show_bug.cgi?id=183156, for instance).

As for applications written with UTF-8 as the internal string representation (asked about in another posting), there are lots of them. Basically, most GNOME/GTK applications do, because glib and pango are based on UTF-8. Moreover, there is a programming language whose internal character representation is, as is well known, UTF-8: Perl. Besides, judging from the fact that Sun's iconv(3) implementation uses UTF-8 as a hub (instead of UTF-32, as in glibc's iconv(3)), many programs in Solaris must be heavy users of UTF-8.

Jungshik

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/