Re: mk_wcwidth

Jungshik Shin Fri, 21 Jun 2002 01:41:35 -0700


On Thu, 20 Jun 2002 [EMAIL PROTECTED] wrote:

> >You do realize that people in CJK locales expect some characters to be
> >double width that people in European/American locales expect to be single
> >width.
>
> Doublewidth roman letters are in the unicode range FF00-FFFE, so
> when converting from a legacy encoding that assumes the ascii
> ranges are all doublewidth, you map to (ascii+FEE0). With

  Well, legacy _encodings_ like EUC-JP/KR, Shift_JIS, Big5 and GB2312
(more properly EUC-CN) include _two_ distinct sets of Latin letters: one
set in US-ASCII (or its national counterpart) and the other set in
JIS X 0208 (EUC-JP and Shift_JIS), KS X 1001 (EUC-KR), Big5 (Big5) and
GB 2312-80 (EUC-CN). It's _only the latter_ set that has to be mapped to
the full-width US-ASCII characters in Unicode. Most CJK input methods,
whether on Unix/X11, MS Windows or MacOS, offer a distinct way to input
full-width US-ASCII characters.
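
  To make the "two sets" point concrete, here is a small Python sketch.
It assumes Python's built-in euc_jp codec; the byte sequence A3 C1 (the
JIS X 0208 letter A) is my own choice of example, and to_fullwidth is a
hypothetical helper that applies the +FEE0 offset from the quoted text.

```python
# Offset between printable ASCII (U+0021..U+007E) and the fullwidth
# forms (U+FF01..U+FF5E), as mentioned in the quoted message.
FULLWIDTH_OFFSET = 0xFEE0


def to_fullwidth(ch: str) -> str:
    """Map one printable ASCII character to its fullwidth form.

    Hypothetical helper for illustration; other characters pass through.
    """
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:
        return chr(cp + FULLWIDTH_OFFSET)
    return ch


# The same letter exists twice in EUC-JP: once as a single ASCII byte,
# once as a two-byte JIS X 0208 character.
half = b"A".decode("euc_jp")         # ASCII set
full = b"\xa3\xc1".decode("euc_jp")  # JIS X 0208 set

print(hex(ord(half)), hex(ord(full)))
print(to_fullwidth(half) == full)
```

  Only the two-byte form comes out as a fullwidth code point; the
single-byte letter stays plain ASCII, so a converter does not need to
add FEE0 to the whole ASCII range.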

> unicode you can even mix double and singlewidth "ascii" in a
> single document; many of the roman letters became "kanji"
> when in doublewidth form (for example doublewidth capital
> letter H can mean pornography) and have a different meaning
> than their single-width brethren.
>
> So a unicode char-cell width function should function identically
> for all locales.

  Not true. Although I'm not among those who like to see Greek and
Cyrillic letters rendered in full width (it's really ugly!), there ARE
_some_ (I wouldn't say many) CJK people who want to keep them that way.
Moreover, it's not only Greek and Cyrillic letters but also line-drawing
characters that have locale-dependent width. You may want to read UTR
#11/UAX #11 East Asian Width at <http://www.unicode.org/reports/tr11/>.
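
  As a rough illustration of why the width cannot be the same for all
locales, here is a minimal Python sketch built on the standard
unicodedata.east_asian_width() function. This is not mk_wcwidth itself:
cell_width and its cjk_context flag are hypothetical names, and the
sketch ignores combining and control characters entirely.

```python
import unicodedata


def cell_width(ch: str, cjk_context: bool = False) -> int:
    """Return a column width (1 or 2) for a single character.

    Simplified sketch: zero-width and control characters are not handled.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        # Wide and Fullwidth characters take two cells everywhere.
        return 2
    if eaw == "A" and cjk_context:
        # Ambiguous characters (Greek, Cyrillic, line drawing, ...)
        # are treated as wide only in a CJK legacy context.
        return 2
    return 1


for ch in ("A", "\uff21", "\u03b1", "\u042f", "\u2500"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.east_asian_width(ch),
        cell_width(ch),
        cell_width(ch, cjk_context=True),
    )
```

  The characters classified as "A" (ambiguous) are exactly the ones whose
result changes with the flag, which is the locale dependence described
above.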


> (I dont know of any unicode support for fullwidth greek or cyrillic,
>  but should such a thing be needed, there is room north of the BMP)

  There will never be such a thing in Unicode. The only reason the
full-width Latin letters are encoded separately in Unicode is that they
were present in legacy CJK character sets with code points distinct from
their US-ASCII (half-width) counterparts. See above.

  Jungshik Shin

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/