On Thu, Jan 10, 2002 at 07:38:20PM +0900, Tomohiro KUBOTA wrote:
> > One temporary solution I could suggest is having specs (in this case,
> > Ogg tags) choose a specific vendor's translation tables for these, and
> > saying "until Unicode standardizes these tables, use these, not your
> > system's."  That would at least (try to) guarantee that until that
> > happens, if a user enters text on one system in SJIS, and moves it to
> > another via UTF-8, he'll get the same SJIS output.
> 
> I think it is a good idea.  I'd like you to ask the Unicode Consortium
> to adopt your idea.  However, the problem is that the Unicode Consortium
> doesn't have enough political power to define one standardized table,
> and it doesn't have the will to release one authorized mapping table.

I'm talking about an individual file format spec that uses UTF-8 and
wants to be portable.  If someone enters text into the file on a Windows
computer and then reads it on a Sun computer, we want them to see the
same thing.  Since Unicode won't do this yet, I'm suggesting that the
file spec pick one conversion table and use it, hardcoding the
conversion.  If and when the conversion is standardized, this will be
deprecated, and the hardcoded conversions can be removed on systems that
don't need them.
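A minimal sketch of "pick one table and hardcode it", assuming the spec
pins Microsoft's CP932 table (that choice is only an assumption here).
Python ships its own CJK codec tables rather than calling the host's
iconv(), so the result is the same on Windows and on a Sun box; the
function names are hypothetical.

```python
# Assumption: the file spec mandates Microsoft's CP932 table for SJIS.
# Python's built-in CJK codecs carry their own tables, so this does not
# depend on whatever iconv() the host system provides.
PINNED_CODEC = "cp932"


def sjis_to_utf8(data: bytes) -> bytes:
    """Convert SJIS input to UTF-8 using only the pinned table."""
    return data.decode(PINNED_CODEC).encode("utf-8")


def utf8_to_sjis(data: bytes) -> bytes:
    """Convert stored UTF-8 back to SJIS using the same pinned table."""
    return data.decode("utf-8").encode(PINNED_CODEC)
```

With CP932 pinned, SJIS 0x8160 always becomes U+FF5E FULLWIDTH TILDE and
converts back to the same bytes, whereas a JIS-based table (such as
Python's own "shift_jis" codec) yields U+301C WAVE DASH for the same
input.  That disagreement is exactly what pinning one table avoids.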

This means choosing conversion tables (preferably ones in the public
domain; I don't know if that matters for something like this) and
implementing the converter.  (It would be nice not to have to include a
full Han conversion table, but that would probably complicate and slow
down the converter, since it would have to call the system iconv() for
individual characters; the cost might be negligible, though.)
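Here's a rough sketch of the "no full Han table" variant: keep only a
small override table for the rows where vendors disagree, and hand
everything else to a base codec.  The override entries follow CP932 and
are illustrative, not a complete list (0x815F, 0x8191, 0x8192 and 0x81CA
also differ, for example); `OVERRIDES` and `decode_pinned` are
hypothetical names, and Python's "shift_jis" codec stands in for the
system converter.

```python
# Hypothetical override table: SJIS bytes -> code point the spec mandates.
# Values follow CP932; only a few divergent rows are shown.
OVERRIDES = {
    b"\x81\x60": "\uff5e",  # JIS tables: WAVE DASH; CP932: FULLWIDTH TILDE
    b"\x81\x61": "\u2225",  # JIS: DOUBLE VERTICAL LINE; CP932: PARALLEL TO
    b"\x81\x7c": "\uff0d",  # JIS: MINUS SIGN; CP932: FULLWIDTH HYPHEN-MINUS
}


def is_lead_byte(b: int) -> bool:
    """True for SJIS bytes that start a two-byte character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_pinned(data: bytes, base: str = "shift_jis") -> str:
    """Decode SJIS, applying OVERRIDES and batching everything else.

    Runs of non-overridden characters go to the base codec in one call,
    so the converter is not invoked once per character.
    """
    out = []
    run_start = 0
    i = 0
    while i < len(data):
        # Step by whole characters so runs never split a two-byte pair.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        seq = data[i:i + width]
        if seq in OVERRIDES:
            out.append(data[run_start:i].decode(base))  # flush pending run
            out.append(OVERRIDES[seq])
            run_start = i + width
        i += width
    out.append(data[run_start:].decode(base))  # flush the final run
    return "".join(out)
```

The per-character scan only inspects lead bytes; the actual table
lookups for ordinary kana and kanji happen in batched runs.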

If this is done, I don't know which tables should be used, especially
for other languages.

Yen vs. \ is the best example of "unification" I've seen yet (even
though it's not *quite* the same thing).  My terminal uses MS Gothic (a
Japanese font), and all backslashes show up as yen symbols for me.  It's
extremely ugly, and I'd much prefer to only see yen symbols where
they're intended (even if it's in a filename or something equally
"dumb".)  It's reasonable to expect something simple (not just a
typesetter or word processor) to be able to distinguish them.

Unfortunately, language tags are no solution here.  They're fine for
things like Ogg tags, where it's a single unit of information.  They're
not useful at all for terminals, where you need to be stateless.  (It's
not reasonable to expect everything to automatically send a "JA"
language tag when sending Japanese text, and to know to reset it when
processes are suspended and resumed, and so on.)  If it were possible to
attach a variant selector to it, it'd be completely usable: if your
locale is "en" and you want to output a yen symbol, output \ with that
"variant" selected.  (Japanese users might want the opposite, and it
wouldn't hurt to tag all \ characters.)  It still wouldn't be possible
to put a yen symbol in a DOS filename, though.
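To make the statelessness point concrete, here's a rough sketch of how
a terminal might honor such a selector.  No standardized variation
sequence exists for U+005C; using U+FE00/U+FE01 as "yen form" and
"backslash form" selectors is purely hypothetical, as is the `render`
function and its `default_yen` locale flag.

```python
# Hypothetical selectors: Unicode defines no variation sequences for
# U+005C, so VS1/VS2 are borrowed here only for illustration.
YEN_VARIANT = "\ufe00"
BACKSLASH_VARIANT = "\ufe01"


def render(text: str, default_yen: bool) -> str:
    """Return the glyphs a terminal would draw for backslashes.

    Each decision looks only at the code point after the backslash, so
    no language-tag mode has to be remembered between lines or across
    process suspends and resumes.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\\" and nxt in (YEN_VARIANT, BACKSLASH_VARIANT):
            # Explicitly selected form wins; the selector itself is consumed.
            out.append("\u00a5" if nxt == YEN_VARIANT else "\\")
            i += 2
            continue
        if ch == "\\":
            # Untagged backslash: fall back to the locale's default glyph.
            out.append("\u00a5" if default_yen else "\\")
        else:
            out.append(ch)
        i += 1
    return "".join(out)
```

An "en" terminal would call this with default_yen=False and a Japanese
one with True; tagged characters come out the same either way.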

(Unfortunately, Microsoft's Japanese fonts don't *have* a single-width
backslash *at all*, which means terminal emulators, which typically
don't want to deal with multiple fonts, are hard pressed to do anything
like this.  Grrr.)

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
