On Fri, 29 Mar 2002, Anton Tagunov wrote:

Hi Anton,

> Writing a bit of an article, putting in there all I have learnt
> about CJK encodings on the Internet and at [EMAIL PROTECTED]
> Has already taken me a week :-)

I strongly recommend you get "CJKV Information Processing" by Ken
Lunde. It has a lot of gory details. As Dan wrote in his dropped
document on encodings, the ISO-2022 standard (ECMA 35 at
http://www.ecma.ch) itself is a great(?!) read :-)

> Is GB 18030 the best spelling for this encoding?
> Isn't it GB18030 or GB_18030 or GB_18030-2000?

I guess the official designation is GB 18030-2000.

> Is CNS 11643-1992 the best spelling?
> Isn't it CNS11643, CNS11643-1992, CNS_11643-1992?

I believe they're CNS 11643-1992 or CNS 11643-1986.

> My suspicions arise from IANA registration names
> without spaces like

You have to be careful with IANA registration. In a sense, it's like
a sink that accepts everything thrown into it :-)

> JIS_C6226-1983
> KS_C_5601-198
> GB2312
> KSC5636

KSC5636 is for ISO 646-KR (the Korean version of ISO 646 or US-ASCII).
Its official name is KS C 5636-1993 (now KS X 1003:1993).

The official name for KS_C_5601-1987 is KS C 5601-1987, which was
revised in 1989 and 1992 and reissued in 1997 as KS X 1001:1997,
which was in turn revised in 1998 (with two characters added, one of
which was the euro sign).

The official designation of JIS_C6226-1983 should be JIS C 6226-1983
(a revision of JIS C 6226-1978), which was renamed JIS X 0208:1983
and then was revised and 'renamed' JIS X 0208:1997.

You may have noticed that JIS changed the designation of their
character set standards (from JIS C -> JIS X) in the early 1980s,
which KS closely followed in 1997 (KS C -> KS X). Basically, JIS C
and KS C (perhaps for electrical/electronics related standards) 'ran
out of space' (well, they could have used more digits.....) and both
JIS and KS created a new section 'X' for IT-related standards.
Moreover, the year a standard is issued used to be preceded by '-',
but now is preceded by ':' as in ISO standards (e.g.
ISO 10646-1:2000, ISO 10646-2:2001).

> but did IANA invite these cryptic names above on its own?
> Or did it take the names from existing standards?

I guess they replaced the space with '_' or got rid of the space to
make each name a single word/identifier.

> If from existing, then a similar name for the encodings in question
> already probably exist.. Does it?

As I wrote, IANA didn't consider much when something was given to
them. They just add to the list almost whatever is given to them. If
they had not done that, we could have had EUC-CN in place of
'GB2312'. Maybe that's my biased impression/opinion, but some others
have expressed more or less similar views in other forums. You should
not give too much weight to the IANA registry.

Jungshik