Yes, Jungshik's points are reasonable.  So the success of Unicode-capable
"modern software" possibly depends more on the fact that it started early
(in the UCS-2 period) and is time-tested.

But this is only one possibility.  For Asians, UTF-16 is really more
"economical" than UTF-8.  The use of UTF-8 in Perl, GNOME, and other Open
Source software seems connected with two facts: such software is mainly
developed and used by Western developers/users, and it has its roots in
Unix.  Both facts favour UTF-8.  (I am by no means objecting to UTF-8; I am
just trying to dig out the cause.)  Most Chinese programmers are still
struggling for survival, and few have time to contribute to Open Source
projects.  Unix is much less familiar to us than Windows.  Do not
misunderstand me; I love Linux and like UTF-8.  My locale setting in Red Hat
is not zh_CN.GB18030 but UTF-8.  But Asians (in developing countries
especially) possibly do view things differently from Western programmers.
And Markus Scherer's arguments for the virtues of UTF-16 are enough for
*me*.  And (I don't know whether this point will cause a flame war or not)
commercial developers _may_ consider the requirements of "end users" more
than open source developers do.

By the way, for pure Chinese text the size ratio between UTF-8 and UTF-16 is
nearly always 3:2.  I am Chinese and I can assure you that only editors of
ancient literature will care about the CJK extensions above U+FFFF.  You
won't think a 50% growth in file size and/or memory usage is always
acceptable.  Small size _is_ a virtue.  My only regret is that the
mathematical symbols were put outside the BMP.  Sigh!

Of course, these are only personal opinions, which might be wrong.  And, as
a rule, there is not a single "correct" way to do things.  We always need
compromises.

Best regards,

Wu Yongwei

--- Original Message from Jungshik Shin ---
> > Is it true that "Almost all modern software that supports Unicode,
> > especially software that supports it well, does so using 16-bit Unicode
> > internally: Windows and all Microsoft applications (Office etc.), Java,
> > MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> > Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape,
> > OpenOffice/StarOffice, ... "?
>
> Do they support characters above U+FFFF as fully as others?

Yes.  At least, I know for sure that Mozilla, MS IE, and MS Office XP
do.  That does not make me a fan of UTF-16.  You shouldn't assume
that others don't do what you're not happy to deal with.

The reason they use UTF-16 is NOT that it's inherently better
than the other UTFs (UTF-8, UTF-32) BUT that they (not all) began
with UCS-2 and have a lot of baggage (written for UCS-2) to carry
along.  The prime example of this is the Win32 W APIs.  The same is
true of Java, ECMAScript (the transition is not yet complete in the
case of ECMAScript), and Mozilla.  (See
http://bugzilla.mozilla.org/show_bug.cgi?id=183156, for instance.)
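
The UCS-2 baggage can be shown with a small Python sketch (the helper
name utf16_code_units is an illustrative assumption, not taken from any of
the software above).  A character above U+FFFF becomes two 16-bit code
units in UTF-16, a surrogate pair, so code written for UCS-2 that treats
one code unit as one character miscounts it.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# U+4E2D is in the BMP: one code unit.
print([hex(u) for u in utf16_code_units("\u4e2d")])      # ['0x4e2d']

# U+20000 (CJK Extension B) is above U+FFFF: a surrogate pair.
print([hex(u) for u in utf16_code_units("\U00020000")])  # ['0xd840', '0xdc00']
```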


As for applications written with UTF-8 as the internal string
representation (asked for in another posting), there are lots of
them.  Basically, most GNOME/GTK applications do so because glib and
pango are based on UTF-8.  Moreover, there's a well-known programming
language whose internal character representation is UTF-8: Perl.
Besides, judging from the fact that Sun's iconv(3) implementation
uses UTF-8 as a hub (instead of UTF-32, as is the case with glibc's
iconv(3)), many programs in Solaris must be heavy users of UTF-8.


  Jungshik

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
