Thanks. IANA maintains this charset list: http://www.iana.org/assignments/character-sets
But it does not have any entry for revised legacy charsets such as "ks_c_5601-1992" or "ksx1001:1992". Moreover, it does not have a "utf8" charset entry, because "utf8" is just one of the encodings of the Universal Character Set, not an independent charset-plus-encoding like "ks_c_5601-1987". Everyone knows that UCS (ISO 10646) and Unicode change and expand over time. Does UCS (ISO 10646) versioning strictly follow Unicode (UTC) versioning? If so, why don't we see "utf8-3.1" or "utf8-3.2" for Unicode 3.1 and 3.2 respectively?

Many applications perform CaseFold3.x(IDN), NFKC3.x(CaseFold3.x(IDN)) or legacy2Unicode(IDN) on input texts or parameters and tag their outputs with encoding='utf8' (a rough sketch of this pipeline follows the quoted message below). But this loose tagging, without the precise version of the Unicode standard that was applied, will cause interoperability problems between sending and receiving applications that use different versions of the Unicode standard: they will have different criteria and assumptions about what counts as normalized or casefolded.

The loose versioning practice for encodings of both Unicode and local charsets is so entrenched and prevalent that we cannot cure this situation in the foreseeable future. I cannot imagine all XML applications switching from "utf8" to "utf8-3.2". Unicode and legacy charsets were not designed for rigorous identifier contexts; they were designed primarily for textual applications and the printing/display industries. That explains the origin of the loose versioning practice in UCS and local charsets.

Some applications may adhere to such a precise versioning convention and may track changes to the legacy and UCS mapping tables as frequently as possible, but a significant majority of other applications would be unwilling, unable, or too slow to do so. This situation causes yet more interoperability problems among applications.

The currently proposed IDN standard is at best experimental and not adequate for any mission-critical use. Approximation and exception handling are inevitable in UCS/legacy processing, but they are not acceptable in a universal identifier system like the DNS. Rather, a directory/search approach would do that job better for internationalized *access* to domain names.

Soobok Lee

----- Original Message -----
From: "Keld Jørn Simonsen" <[EMAIL PROTECTED]>
To: "Soobok Lee" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, May 31, 2002 2:29 AM
Subject: Re: [idn] Re: Legacy charset conversion in draft-ietf-idn-idna-08.txt (in ksc5601-1987)

> On Fri, May 31, 2002 at 12:56:05AM +0900, Soobok Lee wrote:
> > By "additions", I mean the required new tag for a new version of a legacy
> > encoding, like "ks_c_5601-1992", which should have been used but, as far
> > as I know, never has been. Is there any central registry that maintains
> > the correct tag values for the various versions of the numerous legacy
> > encodings? If not, how can we ensure stable and interoperable
> > legacy-to-Unicode conversion among the myriad applications?
>
> IANA has a registry of charsets, and many of them have mappings defined
> for UCS. There is also an ISO register that has mappings between
> legacy charsets and UCS, available at
> http://www.dkuug.dk/cultreg/registrations/charmap
>
> Kind regards
> Keld
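P.S. To make the tagging problem concrete, here is a minimal sketch (in Python) of the legacy2Unicode / CaseFold / NFKC pipeline I describe above. The codec name "euc_kr", the sample label and the helper name prepare_label are only illustrative assumptions of mine, not taken from any spec; the point is simply that the output carries a bare "utf8" tag but no record of which Unicode version supplied the casefold and NFKC data.

    import unicodedata

    def prepare_label(raw, legacy_charset="euc_kr"):
        # legacy2Unicode: which mapping table is actually used depends on the
        # local codec library, not on the charset tag carried with the data.
        text = raw.decode(legacy_charset)
        # CaseFold + NFKC: both are defined against a specific Unicode version,
        # but that version is recorded nowhere in the output.
        folded = text.casefold()
        normalized = unicodedata.normalize("NFKC", folded)
        # The result is shipped tagged merely as utf8; the Unicode data version
        # used here (unicodedata.unidata_version) is lost on the wire.
        return normalized.encode("utf-8")

    # Hypothetical label: fullwidth "KBS" plus Hangul, as EUC-KR bytes.
    raw = "ＫＢＳ한글".encode("euc_kr")
    print("Unicode data version on this host:", unicodedata.unidata_version)
    print(prepare_label(raw))   # e.g. b'kbs\xed\x95\x9c\xea\xb8\x80'

Two hosts running this against different Unicode data versions can disagree on whether a given label is already normalized, yet both will mark their output only as "utf8".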
