> And I have found that most of Chinese (Continental; seems like
> Taiwanese are much more technically correct) and Korean mails and web
> pages confuse "charset" and "encodings". That is, charset="gb2312"

IMHO, you're also misusing the term 'charset' here. A MIME charset can be
used synonymously with 'encoding' (or character set encoding scheme: see
CJKV Information Processing, IETF RFC 2130 and RFC 2278). What has to be
distinguished is 'coded character set' on the one hand (JIS X 0208,
JIS X 0212, KS X 1001, KS X 1003, GB 2312, CNS 11xxx, ISO 10646, ISO 646,
US-ASCII, ISO-8859-x) and 'encoding/character set encoding scheme/MIME
charset' on the other hand (EUC-JP, EUC-KR, EUC-TW, EUC-CN, ISO-2022-JP,
ISO-2022-KR, ISO-2022-CN, ISO-8859-x, UTF-8, UTF-32, UTF-7, UTF-16, Big5,
UHC).

Granted, in certain contexts 'charset' may have been used to mean 'coded
character set', but it's better avoided when you want to contrast it with
'encoding', because 'charset' (in the MIME context) means 'encoding', not
'coded character set'.

> really means euc-cn and charset="ks_c_5601-1987" really means euc-kr.
> Sadly this misconception is embedded in popular browsers.

Well, the use of 'ks_c_5601-1987' is the result of an 'evil' act by
Microsoft. We objected furiously, but around 1997 M$ went ahead and used
that name in their products instead of the then well-established EUC-KR.
Please refer to Ken Lunde's CJKV Information Processing for that 'epic war'
between the two camps (see p. 197 of the book and
http://jshin.net/faq/qa8.html). We even set up a web page to keep M$ from
spreading that ill-defined name. Anyway, their designation couldn't
withstand the test of time, because KS C 5601-1987 was renamed
KS X 1001:1998. Still, M$ IE, M$ OE and M$ FrontPage keep producing HTML
docs labeled that way. It also has to be noted, however, that the encoding
M$ designates as 'ks_c_5601-1987' is NOT EUC-KR itself BUT their
proprietary extension of EUC-KR, namely CP949/UHC/(X-)Windows-949.

> Sadly this misconception is embedded in popular browsers.

MS IE certainly counts as a popular browser, but Mozilla/Netscape never
used 'ks_c_5601-1987' to mean EUC-KR. They have always used 'EUC-KR'.
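
To make the distinction concrete, here is a minimal Python sketch (not
taken from any of the products discussed). It assumes Python's built-in
codecs 'euc_kr', 'iso2022_kr' and 'cp949' as stand-ins for the MIME
charsets EUC-KR, ISO-2022-KR and CP949/UHC; the helper name row_cell is
made up for illustration. It shows one coded character set (KS X 1001)
carried by two different encodings, and a syllable that CP949/UHC places
outside the byte range EUC-KR uses for KS X 1001.

```python
def row_cell(two_bytes):
    """Hypothetical helper: map a 2-byte EUC code (bytes 0xA1..0xFE)
    to its row/cell position in the 94x94 coded character set."""
    hi, lo = two_bytes
    return hi - 0xA0, lo - 0xA0


han = "\ud55c"  # U+D55C, a Hangul syllable that is part of KS X 1001

euc = han.encode("euc_kr")      # KS X 1001 position with the high bit set
iso = han.encode("iso2022_kr")  # same position, 7-bit, wrapped in escapes
uhc = han.encode("cp949")       # CP949/UHC keeps the EUC-KR bytes here

ks_position = row_cell(euc)               # position in the coded character set
seven_bit = bytes(b & 0x7F for b in euc)  # the same position as 7-bit bytes
same_position_in_iso2022 = seven_bit in iso

ttom = "\ub620"  # U+B620, a Hangul syllable that is NOT in KS X 1001
ttom_uhc = ttom.encode("cp949")
# The UHC extension uses byte values outside 0xA1..0xFE for such syllables.
outside_euc_range = not all(0xA1 <= b <= 0xFE for b in ttom_uhc)

print(ks_position, euc, iso, uhc)
print(same_position_in_iso2022, ttom_uhc, outside_euc_range)
```

The two encodings differ only in how they wrap the same KS X 1001
position, whereas the second syllable gets a code that has no KS X 1001
position at all. That is why a CP949/UHC document labeled as if it were
EUC-KR is mislabeled.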
Mozilla uses 'X-Windows-949' to mean CP949/UHC, and 'ks_c_5601-1987' is an
alias for 'X-Windows-949' (but Mozilla will never put 'ks_c_5601-1987' in
outgoing messages/docs; it only accepts HTML/emails labeled that way and
treats them as X-Windows-949).

In the case of 'GB2312' used in place of 'EUC-CN', the situation was beyond
repair (Ken Lunde's book came too late, and an error-prone book by a
Japanese engineer working at MS, published a few years earlier, had spread
the misconception too widely), so the name just stuck.

As for Taiwan, the reason there's no confusion between coded character set
and encoding is not that they're technically correct but that in their
case EUC-TW has never been widely used, while the popular encoding Big5 has
a much more complex relationship with CNS 11xxx than EUC-KR has with
KS X 1001 or EUC-CN has with GB 2312. (Big5 vs CNS 11xxx is similar to
Shift_JIS vs JIS X 0208.)

Jungshik Shin