Thanks. IANA maintains this charset list: http://www.iana.org/assignments/character-sets
But it does not have any entry for revised legacy charsets such as "ks_c_5601-1992" or "ksx1001:1992". Moreover, it does not have a "utf8" charset entry, because "utf8" is just one of the encodings of the Universal Character Set, not an independent charset-plus-encoding like "ks_c_5601-1987". Everyone knows that UCS (ISO 10646) and Unicode change and expand over time. Does UCS (ISO 10646) versioning strictly follow Unicode (UTC) versioning? If so, why don't we see "utf8-3.1" or "utf8-3.2" for Unicode 3.1 and 3.2 respectively?

Many applications perform CaseFold3.x(IDN), NFKC3.x(CaseFold3.x(IDN)) or legacy2Unicode(IDN) on input texts or parameters and tag their outputs with encoding='utf8' (a rough sketch of this pipeline follows the quoted message below). But this loose tagging, without the precise version of the Unicode standard that was applied, will cause interoperability problems between sending and receiving applications that use different versions of the Unicode standard: they will have different criteria and assumptions about what counts as normalized or casefolded.

The loose versioning practice for encodings of both Unicode and local charsets is so entrenched and prevalent that we cannot cure this situation in the foreseeable future. I cannot imagine all XML applications switching from "utf8" to "utf8-3.2". Unicode and legacy charsets were not designed for rigorous identifier contexts; they were designed primarily for textual applications and the printing/display industries. That explains the origin of the loose versioning practice in UCS and local charsets.

Some applications may adhere to such a precise versioning convention and may track changes to the legacy and UCS mapping tables as frequently as possible, but a significant majority of other applications would be unwilling, unable, or too slow to do so. This situation causes yet more interoperability problems among applications.

The currently proposed IDN standard is at best experimental and not adequate for any mission-critical use. Approximation and exception handling are inevitable in UCS/legacy processing, but they are not acceptable in a universal identifier system like the DNS. Rather, a directory/search approach would do that job better for internationalized *access* to domain names.

Soobok Lee

----- Original Message -----
From: "Keld Jørn Simonsen" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, May 31, 2002 2:29 AM
Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt (in ksc5601-1987)

> On Fri, May 31, 2002 at 12:56:05AM +0900, Soobok Lee wrote:
> > By "additions", I mean the required new tag for a new version of a legacy
> > encoding, like "ks_c_5601-1992", which should have been used but, as far
> > as I know, never has been. Is there any central registry that maintains
> > the correct tag values for the various versions of the numerous legacy
> > encodings? If not, how can we ensure stable and interoperable
> > legacy-to-Unicode conversion among the myriad applications?
>
> IANA has a registry of charsets, and many of them have mappings defined
> for UCS. There is also an ISO register that has mappings between
> legacy charsets and UCS, available at
> http://www.dkuug.dk/cultreg/registrations/charmap
>
> Kind regards
> Keld
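P.S. To make the tagging problem concrete, here is a minimal sketch (in Python) of the legacy2Unicode / CaseFold / NFKC pipeline I describe above. The codec name "euc_kr", the sample label and the helper name prepare_label are only illustrative assumptions of mine, not taken from any spec; the point is simply that the output carries a bare "utf8" tag but no record of which Unicode version supplied the casefold and NFKC data.

    import unicodedata

    def prepare_label(raw, legacy_charset="euc_kr"):
        # legacy2Unicode: which mapping table is actually used depends on the
        # local codec library, not on the charset tag carried with the data.
        text = raw.decode(legacy_charset)
        # CaseFold + NFKC: both are defined against a specific Unicode version,
        # but that version is recorded nowhere in the output.
        folded = text.casefold()
        normalized = unicodedata.normalize("NFKC", folded)
        # The result is shipped tagged merely as utf8; the Unicode data version
        # used here (unicodedata.unidata_version) is lost on the wire.
        return normalized.encode("utf-8")

    # Hypothetical label: fullwidth "KBS" plus Hangul, as EUC-KR bytes.
    raw = "ＫＢＳ한글".encode("euc_kr")
    print("Unicode data version on this host:", unicodedata.unidata_version)
    print(prepare_label(raw))   # e.g. b'kbs\xed\x95\x9c\xea\xb8\x80'

Two hosts running this against different Unicode data versions can disagree on whether a given label is already normalized, yet both will mark their output only as "utf8".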
