> > Also, there would hardly have been any hurt feelings if U+0041
> > (Latin), U+0391 (Greek), and U+0410 (Cyrillic) had been unified. 
> > It would just not have saved enough code points to bother.
> 
> Actually, it was impossible because of the source separation rule 
> applied to Chinese, Korean, and Japanese encodings, including Big 
> Five, GB2312, KSC, and JIS. (How ironic.) These standards include 
> various combinations of Latin, Greek, and Cyrillic alphabets in 
> separate code blocks alongside Hanzi, Zhuyin, Hangul, and kana. So 
> LATIN CAPITAL LETTER A, CYRILLIC CAPITAL LETTER A, and GREEK CAPITAL 
> LETTER ALPHA cannot be unified without breaking round-trip conversion 
> for these standards. 

The source separation rule also applies to the ISO 8859 series of
standards, so on those grounds too they had to be encoded separately.
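
To make the round-trip point concrete, here is a rough sketch in
Python using its ISO 8859-7 (Latin/Greek) codec.  The "unified"
step is only my simulation of what a unified encoding would do (it
folds Greek Alpha into Latin A); nothing like it exists in Unicode,
and the variable names are made up for illustration.

```python
# ISO 8859-7 encodes Latin A as 0x41 and Greek capital Alpha as 0xC1.
original = b"A\xc1"

# With separate code points (U+0041 and U+0391) the round trip is exact.
text = original.decode("iso8859_7")
round_trip = text.encode("iso8859_7")

# Assumption for illustration: simulate unification by folding
# GREEK CAPITAL LETTER ALPHA into LATIN CAPITAL LETTER A.
unified = text.replace("\u0391", "\u0041")
lossy = unified.encode("iso8859_7")

print(round_trip == original)  # separate code points survive the trip
print(lossy == original)       # the unified text cannot be mapped back
```

After the folding step, both characters encode to 0x41, so the
original byte sequence cannot be recovered.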

But even without that rule, they would have had to be kept separate:
similar-looking uppercase forms have different corresponding lowercase
forms.  So as not to make case mapping horribly difficult (it's hard
enough as it is!), Latin, Greek, and Cyrillic could not be unified.
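
A quick way to see the case-mapping point, sketched in Python (the
variable names are mine; str.lower() and unicodedata.name() are
standard library):

```python
import unicodedata

# The three look-alike capitals: Latin A, Greek Alpha, Cyrillic A.
capitals = ["\u0041", "\u0391", "\u0410"]

# Pair each capital's code point with that of its lowercase mapping.
mappings = [(ord(ch), ord(ch.lower())) for ch in capitals]

for upper, lower in mappings:
    print(f"U+{upper:04X} {unicodedata.name(chr(upper))} -> "
          f"U+{lower:04X} {unicodedata.name(chr(lower))}")
```

The three capitals map to three different lowercase letters (a,
alpha, and Cyrillic a), so a single unified capital would have no
single correct lowercase without knowing the script from context.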

                /kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
