On Thu, Jan 10, 2002 at 07:38:20PM +0900, Tomohiro KUBOTA wrote:
> > One temporary solution I could suggest is having specs (in this case,
> > Ogg tags) choose a specific vendor's translation tables for these, and
> > saying "until Unicode standardizes these tables, use these, not your
> > system's." That would at least (try to) guarantee that until that
> > happens, if a user enters text on one system in SJIS, and moves it to
> > another via UTF-8, he'll get the same SJIS output.
>
> I think it is a good idea. I'd like you to request Unicode Consortium
> to follow your idea. However, the problem is, Unicode Consortium doesn't
> have enough political power to define one standardized table and it
> doesn't have will to release one authorized mapping table.

I'm talking about an individual file format spec that uses UTF-8 and
wants to be portable. If someone enters text into the file on a Windows
computer and then reads it on a Sun computer, we want them to see the
same thing. Since Unicode won't yet do this, I'm suggesting that the
file spec pick one conversion table and use it, hardcoding the
conversion. If and when the conversion is standardized, this will be
deprecated and the hardcoded conversions removed for systems that don't
need them.

This means choosing conversion tables (preferably ones in the public
domain; I don't know if that matters for something like this) and
implementing the converter. (It would be nice not to have to include a
full Han conversion table, but avoiding that would probably complicate
and slow down the converter, since it would have to call the system
iconv() for individual characters; the cost might be negligible,
though.) If this is done, I don't know which tables should be used,
especially for other languages.

Yen vs \ is the best example of "unification" I've seen yet (even
though it's not *quite* the same thing). My terminal uses MS Gothic (a
Japanese font), and all backslashes show up as yen symbols for me. It's
extremely ugly, and I'd much prefer to only see yen symbols where
they're intended (even if it's in a filename or something equally
"dumb"). It's reasonable to expect something simple (not just a
typesetter or word processor) to be able to distinguish them.

Unfortunately, language tags are no solution here. They're fine for
things like Ogg tags, where each tag is a single unit of information.
They're not useful at all for terminals, where you need to be
stateless. (It's not reasonable to expect everything to automatically
send a "JA" language tag when sending Japanese text, and to know to
reset it when processes are suspended and resumed, and so on.)

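As a rough illustration of the hardcoded-conversion idea from earlier
in this message, here is a minimal Python sketch. It assumes the two
Python codecs `cp932` and `shift_jis` as stand-ins for "two vendors'
tables"; the names `PINNED_TABLE`, `sjis_to_utf8` and `utf8_to_sjis`
are hypothetical, and picking `cp932` is only for the example, not a
recommendation of which table a spec should choose.

```python
# Hypothetical sketch: a file-format library pins ONE conversion table
# instead of trusting whatever table the local system happens to ship.
# PINNED_TABLE is an assumption for illustration; the post picks no table.
PINNED_TABLE = "cp932"


def sjis_to_utf8(raw: bytes) -> bytes:
    """Convert legacy Shift_JIS bytes to UTF-8 using the pinned table."""
    return raw.decode(PINNED_TABLE).encode("utf-8")


def utf8_to_sjis(data: bytes) -> bytes:
    """Convert UTF-8 read from the file back to Shift_JIS, same table."""
    return data.decode("utf-8").encode(PINNED_TABLE)


# 0x81 0x60 is one double-byte character on which these tables disagree.
wave = b"\x81\x60"

# Same input bytes, two tables, two different Unicode code points.
table_a = wave.decode("cp932")
table_b = wave.decode("shift_jis")
tables_disagree = table_a != table_b

# With the same pinned table on both ends, the original bytes survive
# the trip through UTF-8 unchanged.
round_trip_ok = utf8_to_sjis(sjis_to_utf8(wave)) == wave
```

The point of the sketch is only that the writer and the reader go
through the same table regardless of platform; a real converter would
also have to decide what to do with characters the pinned table cannot
represent.
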
If it were possible to attach a variant selector to \, this would be
completely usable--if your locale is "en" and you want to output a yen
symbol, output \ selecting that "variant". (Japanese people might want
the opposite, and it wouldn't hurt to tag all \ characters.) It still
wouldn't be possible to put a yen symbol in a DOS filename, though.

(Unfortunately, Microsoft Japanese fonts don't *have* a single-width
backslash *at all*, which means terminal emulators--which typically
don't want to deal with multiple fonts--are hard pressed to do anything
like this at all. Grrr.)

--
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/