Re: Encoding conversions

Ienup Sung Tue, 11 Sep 2001 12:28:12 -0700
Hi,

I would just like to correct a misleading point in Mr. Drepper's email.

POSIX/Single Unix Specification defines wchar_t as an opaque data type.
The only thing that people can rely on regarding wchar_t values is that
the characters of the Portable Character Set (PCS) exist in any codeset
and therefore also have wide character values. The PCS is defined in the
X/Open CAE Specification: System Interface Definitions, a.k.a. XBD, and
includes most of the printable ASCII characters and some control characters.
(Using Unicode itself as the wide character is a good idea but, at the same
time, it does not really cover all possible characters, for instance
those of CNS 11643 with its 16 planes, even though one could argue that most
of the defined characters are already in Unicode 3.1.)
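
To make the "opaque" point concrete, here is a minimal sketch of code that
never looks at the numeric value of a wchar_t. It converts with mbrtowc() and
classifies with iswalpha(), both standard interfaces; count_alpha() itself is
a hypothetical helper name used only for illustration, and the sketch assumes
the input is a null-terminated string in the current locale's codeset.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* count_alpha() is a hypothetical helper, not part of any standard API.
   It counts alphabetic characters in a multibyte string encoded in the
   current locale's codeset and returns (size_t)-1 on an invalid or
   incomplete sequence. */
size_t count_alpha(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    memset(&state, 0, sizeof state);   /* initial conversion state */

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;         /* invalid or truncated sequence */
        if (n == 0)
            break;                     /* null character reached */

        /* Classify through the wide character API only; the numeric
           value of wc is never examined. */
        if (iswalpha((wint_t)wc))
            count++;

        s += n;
        remaining -= n;
    }
    return count;
}
```

Because the code only goes through the locale-aware interfaces, it should
behave the same whether the system stores wchar_t as UTF-32, as an EUC-derived
value, or as anything else. A caller would normally run
setlocale(LC_ALL, "") first so that the user's codeset is in effect.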

Neither X11 nor any other system interface in the standards, in Solaris, or in
pretty much any other commercial Unix system (except some interfaces very
specific to EUC codesets, like euclen()) expects wchar_t to be an EUC wide
character representation, because all the interfaces and utilities are
so-called codeset independent (CSI). That is the reason why we can support
Unicode and pretty much any other codeset on our systems. Also, as an example,
wchar_t in Unicode/UTF-8 locales is UTF-32 on most systems.
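
Since "most systems" is not "all systems", a program that wants to treat
wchar_t values as Unicode code points can check for that guarantee instead of
assuming it. The sketch below relies on the C99 macro __STDC_ISO_10646__,
which an implementation defines when its wchar_t values are ISO 10646 code
points. Note that this is a compile-time promise about the whole
implementation, not about one particular locale. wchar_to_ucs() is a
hypothetical helper name used only for illustration.

```c
#include <wchar.h>

/* wchar_to_ucs() is a hypothetical helper, not a standard function.
   When the implementation defines __STDC_ISO_10646__, it stores the
   ISO 10646 code point of wc in *cp and returns 1. Otherwise it
   returns 0 and leaves *cp untouched, because the value of wc has no
   portable meaning. */
int wchar_to_ucs(wchar_t wc, unsigned long *cp)
{
#ifdef __STDC_ISO_10646__
    *cp = (unsigned long)wc;
    return 1;
#else
    (void)wc;
    (void)cp;
    return 0;
#endif
}
```

When the function returns 0, the portable route is to convert the multibyte
form of the text explicitly, for example with iconv(), rather than to
interpret the wchar_t value.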

With regards,

Ienup


] Date: Mon, 10 Sep 2001 20:38:28 -0700
] From: Ulrich Drepper <[EMAIL PROTECTED]>
] Subject: Re: Encoding conversions
] To: [EMAIL PROTECTED]
] MIME-version: 1.0
] X-fingerprint: BE 3B 21 04 BC 77 AC F0  61 92 E4 CB AC DD B9 5A
] X-fingerprint: e6:49:07:36:9a:0d:b7:ba:b5:e9:06:f3:e7:e7:08:4a
] 
] Jungshik Shin <[EMAIL PROTECTED]> writes:
] 
] >  At least under Solaris, it's locale-dependent. I don't know
] > the details, but it seems like somehow Sun engineers found that it's
] > better that way (in terms of efficiency or some other metrics....).
] 
] This has historic reasons.  They are stuck with the locale dependent
] wchar_t because the Xlib i18n model they are using for the ja/ko/zh
] depends on it.  X expects wchar_t values in locales using the EUC
] charsets to use the bytes of the multibyte EUC values.
] 
] I'm pretty sure they want to get away from this asap.
] 
] -- 
] ---------------.                          ,-.   1325 Chesapeake Terrace
] Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
] Red Hat          `--' drepper at redhat.com   `------------------------
] -
] Linux-UTF8:   i18n of Linux on all levels
] Archive:      http://mail.nl.linux.org/linux-utf8/