Re: Fwd^2: Re: rendering unicode han

Graydon Hoare 13 Sep 1999 15:53:06 -0000

Yes, I understand that the rendering needs to be localized, but I
believe that is not at all out of keeping with the Unicode design
philosophy. For instance, if I had a list of words taken from
Spanish, French and English, I could not sort them into a
language-specific order from the characters alone, because each
language induces its own collation order. It seems to me that your
friend is still talking about glyphic variants, and that glyphic
variants are not grounds for breaking a character into two.
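
A minimal sketch of the collation point, in Python. It sorts the same
strings two ways: by plain code point, and by a hand-written stand-in
for traditional Spanish ordering, in which "ch" and "ll" sort as
letters of their own. The key function and the word list are made up
for illustration; a real system would take the ordering from its
locale data rather than from anything stored in the characters.

```python
import string

# Illustrative alphabet: ASCII letters, with "ch" placed after "c" and
# "ll" after "l", as in traditional Spanish dictionary ordering.
LETTERS = []
for letter in string.ascii_lowercase:
    LETTERS.append(letter)
    if letter == "c":
        LETTERS.append("ch")
    elif letter == "l":
        LETTERS.append("ll")
RANK = {letter: i for i, letter in enumerate(LETTERS)}


def traditional_spanish_key(word):
    """Hypothetical sort key: treat "ch" and "ll" as single letters."""
    key = []
    w = word.lower()
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            key.append(RANK[pair])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet.
            key.append(RANK.get(w[i], len(RANK)))
            i += 1
    return key


words = ["cuna", "chico", "dama", "cama"]
by_code_point = sorted(words)
by_spanish = sorted(words, key=traditional_spanish_key)
```

The characters are identical in both runs; only the ordering rule
supplied from outside changes.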


Unicode has been insistent from the beginning on _not_ encoding
information about specific languages. Languages change far more
frequently than characters. What if next year the Chinese variant of
U+516B becomes popular for use in Vietnam? Should the character set be
rewritten to accommodate that? No, clearly the language has changed
and not the character. You need to indicate in your locale settings
(including choice of font) which variants you want to use.

To put it another way: imagine your theoretical article with
Chinese, Japanese, Korean and Vietnamese being displayed in it. Let's
say you're a Japanese person reading it. You use a "Japanese" locale,
which includes loading a font made by a Japanese foundry, which
supports only the Japanese glyph variants. There are two possibilities:

(1) You can, through some amazing miracle of language training, read
all four languages in their native orthography. You grew up with the
Japanese variant of U+516B, so when you see it occur in the middle of
the Vietnamese block of text you see it as just a "Japanese-friendly
font rendition" of an obviously Vietnamese character, make a
reasonable assumption that the author meant the Vietnamese variant,
which just doesn't happen to exist in your Japan-made font, and carry on.

(2) You can't read the other three languages anyway. Who cares what
glyphs they use? You can read U+516B in the Japanese portions of the
text, which are the only parts you understand anyway.

I really don't think this is a CJKV-specific problem. The same thing
happens if I find myself in Germany writing email in English using a
German-localized email client: if I type two consecutive "s"
characters, I'm not going to be terribly surprised when they form a
ligature, one which my friends who have never read German will think,
at a glance, is a capital "B". It's a hazard of localization, imo.
Unicode is not intended to create an environment in which everyone
can magically understand each other, merely one in which pairs of
people speaking the same language can understand each other without
having to use specialized versions of the software, and in which
automatic tools like grep and sed have some hope of being able to
work right without knowing which language they're scanning.
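
A minimal sketch of the grep point, in Python. The language tags and
sample strings below are made up for illustration. Because the
Japanese and Chinese text share the unified code point U+516B, one
pattern finds both, and the search never looks at the tag.

```python
import re

# (language tag, text) pairs; the tag is metadata the search ignores.
docs = [
    ("ja", "\u516b\u6708"),  # U+516B U+6708 in a Japanese passage
    ("zh", "\u516b\u6708"),  # the same code points in a Chinese passage
    ("en", "August"),
]

pattern = re.compile("\u516b")

# Collect the tags of every passage containing U+516B.
hits = [lang for lang, text in docs if pattern.search(text)]
```

Which glyph variant is drawn for each hit is left to the font and
locale at display time.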

-graydon

