On Thu, 21 Mar 2002, Masataka Ohta wrote:

>Unicode is not usable in international context.
Come, now. My understanding has been that that's precisely the context where Unicode is supposed to be used. Why have Unicode in the first place if not for multilingual text?

>There is no unicode implementation work in international context.

How so? I would count current browser and office suite work as exactly that, at the very least.

>Unicode is usable in some local context.

If I understand correctly, you're opposed to Han unification and the external language-tagging protocols it necessitates if we insist on absolute typographical correctness of text with unified ideographs. HTML and XML/XHTML provide precisely such tagging, as do the Unicode language tag characters in protocols with no external language indication facilities (sketched below). Also, I seem to remember that East Asian Unicode text remains legible even when printed in a font not designed for the particular "local context", as you put it. Where's the problem?

>There is some unicode implementation work in local contexts.

Well, considering that UTF-8 is the encoding of choice for some past, much present and all future IETF and W3C work, and that Microsoft's products seem to be heading for UTF-16, I'd call that a colossal understatement.

>However, the context information must be supplied out of band.

Not "must", but "must, provided that the text is meant for human consumption *and* exact typography of variant characters is a requirement". Unicode was never meant to solve the latter part -- it does not encode font information, for instance.

>And, the out of band information is equivalent to "charset" information,
>regardless of whether you call it "charset" or not.

Absolutely not. Unicode's characters are perfectly well defined even when they are not printed in the preferred glyph style. What we're indicating here are differences between languages and preferred renderings of a given piece of Unicode text. Think about this in the context of rendering to speech; perhaps that will make the distinction clearer. After all, one cannot even *attempt* such a rendering of text written in pure Latin-1 without external language indication. Why should graphic rendering be any different?

>Fix is to supply context information out of band to specify which
>Unicode-based local character set to use.

No. The fix is to indicate the language the document is in, or perhaps to encode font information explicitly -- provided such meticulous attention to the appearance of the text is warranted in a given application, anyway.

>See, for example, RFC1815.

Yes, you seem to have objected to Unicode before. The trouble is, not a whole lot of people agree with RFC 1815. For a reason, I daresay.

>As for IDN, it can't just say "use charset of utf-7" or "use charset of
>utf-8".

Of course it can. UTF-8 and UTF-7 are bona fide charsets -- the characters they encode are unique and well defined. You're confusing renderings of characters with the characters themselves, a common mistake with Unicode. (But of course you already know that.)

>Anyway, with the fix, there is no reason to prefer Unicode-based local
>character sets, which is not widely used today, than existing local
>character sets already used world wide.

Of course there is: my local character set cannot represent Arabic, Japanese, English and Finnish, with correct punctuation and the other typographic pedantries, in the same document. Hell, it cannot do that even in separate documents, unless I use Unicode. Unicode, on the other hand, handles all of them perfectly, with no loss of information.
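To put the charset point beyond argument, a quick sketch in Python -- my own illustration, not anything from this thread, and any Unicode-capable language would do. The same mixed Arabic/Japanese/English/Finnish text round-trips losslessly through both UTF-8 and UTF-7, precisely because both are well-defined encodings of the same well-defined characters:

    # Mixed-script text that no single "local" charset can hold.
    text = ("Arabic: \u0633\u0644\u0627\u0645, "
            "Japanese: \u65e5\u672c\u8a9e, "
            "English: hello, Finnish: t\u00e4m\u00e4")

    for charset in ("utf-8", "utf-7"):
        octets = text.encode(charset)           # unique, well-defined octets
        assert octets.decode(charset) == text   # lossless round trip
        print(charset, "->", len(octets), "octets")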
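And the in-band tagging I mentioned above: the Plane 14 tag characters (U+E0001 LANGUAGE TAG followed by tag letters from the U+E0020..U+E007E range) let a bare Unicode stream carry its own language label where no external protocol is available. Another sketch of mine; the helper name language_tag is invented for illustration, not any standard API:

    def language_tag(lang):
        # Each tag letter is the corresponding ASCII letter plus 0xE0000.
        return "\U000E0001" + "".join(chr(ord(c) + 0xE0000) for c in lang)

    # Tag some kanji as Japanese, in band; the base characters are untouched.
    tagged = language_tag("ja") + "\u6f22\u5b57"
    print(repr(tagged))

Strip the tag and exactly the same characters remain -- which is precisely the difference between language information and "charset" information.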
Your line of thinking is what has led to e.g. i-mode employing JIS or Latin-1, making the current incarnation of that technology useless to Central European, Chinese, African, Indian, and probably a whole lot of other user communities. Just think about what something like that would do in an IDNA context, and you'll understand why Unicode is a Good Idea. Besides, if you look at a Chinese user typing in the name of a Japanese site, I'd say unification makes the procedure considerably more forgiving. Don't you? I'd say local variants with full support are Bad; a unified coding with local profiles is Good.

Sampo Syreeni, aka decoy - mailto:[EMAIL PROTECTED], tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
