On Thu, 4 Apr 2002, Anton Tagunov wrote:

Hi Anton,

Thanks a lot.

> - changes status of KOI8-U on Jungshik's comment
> (sorry, I have never tested that myself :-(

I haven't tested it either :-), but both Mozilla/Netscape6 and MS IE
list it in the view|encoding menu, which I interpret as having support
for it.

> UTF-16
> - KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
>
> -are IANA-registered (C<UTF-16> even as a preferred MIME name)
> +=for comment
> +waiting for comments from Jungshik Shin to soften this - Anton
> +
> +is a IANA-registered preferred MIME name
> but probably should be avoided as encoding for web pages due to
> -the lack of browser supports.
> +the lack of browser support.

The reason your test didn't work with MS IE was probably that you
didn't prepend a BOM (byte order mark) to your UTF-16 HTML document.
Note that the conventional way of informing web browsers of the MIME
charset, a <meta> tag, doesn't work for UTF-16/UTF-32. Either you have
to configure your web server to emit a Content-Type header with
'charset=UTF-16(LE|BE)' or you have to put a BOM at the beginning.
When a BOM is present, MS IE 5/6, Mozilla/Netscape6 and Netscape4 have
no problem rendering UTF-16(LE|BE) encoded pages. I put up a couple of
test pages at

  http://jshin.net/i18n/utf16le_kr2.html
  http://jshin.net/i18n/utf16be_kr2.html

For more details on UTF-16 and HTML, you can refer to the HTML4 spec at
http://www.w3.org/TR/html4/charset (see section 5.2.1).

As I wrote before, I have no intention of encouraging the use of UTF-16
over UTF-8, although some people whose primary script has a more
'economical' (in terms of file size) representation in UTF-16 than in
UTF-8 may want to use it.

> +=head2 Microsoft-related naming mess
> +
> +Microsoft products misuse the following names:
> +
> +=over 2
> +
> +=item KS_C_5601-1987
> +
> +Microsoft extension to C<EUC-KR>.
> +
> +Proper name: C<CP949>.
> +
> +See
> +http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
> +for details.

Wow, I didn't know that Martin wrote this.
Thanks a lot for digging this up. He 'rediscovered' what a lot of
people in Korea had complained about. One thing I don't agree with him
on is what designation to use for CP949. I think it had better be
'windows-949' because that's more in line with other MS code pages
such as windows-125x (for European scripts). By the same token, the MS
version of Shift_JIS can be labeled as 'windows-932'. At the moment,
Mozilla uses 'x-windows-949' for CP949/UHC because it's not yet
registered with IANA. I should probably contact Martin and discuss
this issue.

> +Encode aliases C<KS_C_5601-1987> to C<cp949> to reflect
> +this common misusage.

If my patch is accepted, cp949 will have a couple more aliases, 'uhc'
and '(x-)windows-949'. CP949 is commonly known as '통합 완성형'
(Unified Hangul Code) in Korea.

> +I<Raw> C<KS_C_5601-1987> encoding is available as C<kcs5601-raw>.

ksc5601-raw had better be renamed ksx1001-raw, and ksc5601-raw can be
made an alias to ksx1001-raw. Please note that what's now called
ksc5601-raw has two new characters, which were only added in Dec. 1998,
over a year after the name change (KS C 5601 -> KS X 1001).

> +=item GB2312
> +
> +Encode aliases C<GB2312> to C<euc-cn> in full agreement with
> +IANA registration. C<cp936> is supported separately.
> +I<Raw> C<GB_2312-80> encoding is available as C<kcs5601-raw>.

Oops... You meant gb2312-raw, didn't you? :-)

> Jungshik, I would have certainly advocated linking not only to
> http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
> but also to your comments on the KS_C_5601-1987 in the list archive,
> but all your mails were on several subjects each.
>
> Jungshik> ... refer to Ken Lunde's CJKV Information Processing
> Jungshik> about that 'epic war' between two camps. (see p.197 of
> Jungshik> the book and http://jshin.net/faq/qa8.html)
> Jungshik> We even set up a web page to prevent M$ from spreading that
> Jungshik> ill-defined name.
>
> maybe we may link to this page? What is the address?

The campaign web page has since disappeared. It was almost 5 years
ago :-). However, subject 8 of my Hangul FAQ deals with the issue
(http://jshin.net/faq/qa8.html), so you may link to that instead. Be
aware, though, that it's been untouched for a few years (if not longer)
and needs a complete overhaul.
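
To illustrate the BOM point about UTF-16 pages, here is a minimal
sketch in Python (not part of the Encode patch under discussion). The
helper names encode_utf16_with_bom and sniff_utf16_bom are made up for
this example; only the standard codecs module is used. It shows how a
page can be written with a leading BOM and how a reader might detect
the byte order from it.

```python
import codecs


def encode_utf16_with_bom(text, byteorder="little"):
    """Encode text as UTF-16 and prepend the matching byte order mark.

    Hypothetical helper for illustration only.
    """
    if byteorder == "little":
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    if byteorder == "big":
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    raise ValueError("byteorder must be 'little' or 'big'")


def sniff_utf16_bom(data):
    """Return 'utf-16-le' or 'utf-16-be' if data starts with a UTF-16 BOM.

    Returns None when no BOM is present. This sketch does not try to
    tell UTF-16LE apart from UTF-32LE, whose BOM begins with the same
    two bytes.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


page = "<html><body>한글</body></html>"
le_bytes = encode_utf16_with_bom(page, "little")
be_bytes = encode_utf16_with_bom(page, "big")
no_bom = page.encode("utf-16-le")  # same text, but no BOM in front

print(sniff_utf16_bom(le_bytes))
print(sniff_utf16_bom(be_bytes))
print(sniff_utf16_bom(no_bom))
```

Without the BOM (the last case), nothing in the byte stream announces
the encoding, so the charset would have to come from the Content-Type
header instead.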