The Unicode consortium debated making the canonical decomposition from <gg> to <g><g> for a long time. The deciding feedback came from the Korean national body at the Seoul SC2/WG2 meeting, where it said this should not be done, since it would be akin to canonically decomposing "w" to "vv". It also objected to combinations like <gs> being canonically decomposed, principally so that modern syllables could always be decomposed into 3 pieces. The (weaker) compatibility decompositions were in Unicode until NFC was formed; they were removed because they would have prevented the formation of Hangul Syllables in NFKC.
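The behaviour described above can be checked against current character data. The Python sketch below uses only the standard `unicodedata` module (an assumption of this example; it reflects whatever Unicode version ships with the interpreter, not the Unicode 3.0 data under discussion). It shows that a precomposed syllable canonically decomposes into initial, vowel and optional final jamo, while <gg> (U+1101) has no decomposition, so neither NFC nor NFKC relates it to <g><g>.

```python
import unicodedata

# Conjoining jamo involved in the <gg> vs <g><g> question.
G = "\u1100"   # HANGUL CHOSEONG KIYEOK       <g>
GG = "\u1101"  # HANGUL CHOSEONG SSANGKIYEOK  <gg>
A = "\u1161"   # HANGUL JUNGSEONG A           <a>


def hex_seq(s: str) -> str:
    """Render a string as space-separated code points, e.g. 'U+1101 U+1161'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Precomposed syllables canonically decompose into initial + vowel (+ final).
kka = "\uAE4C"  # <gg><a>
gak = "\uAC01"  # <g><a> + final KIYEOK (U+11A8)
print(hex_seq(unicodedata.normalize("NFD", kka)))  # U+1101 U+1161
print(hex_seq(unicodedata.normalize("NFD", gak)))  # U+1100 U+1161 U+11A8

# <gg> itself has no decomposition in the character data, so neither
# NFC nor NFKC maps it to <g><g>.
print(unicodedata.decomposition(GG) == "")          # True
print(unicodedata.normalize("NFKC", GG) == G + G)   # False

# Conversely, <g><g><a> is not recomposed into the syllable U+AE4C:
# only the second <g> combines with the vowel.
print(hex_seq(unicodedata.normalize("NFKC", G + G + A)))  # U+1100 U+AC00
```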
Mark
—————

Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Soobok Lee" <[EMAIL PROTECTED]>
To: "Kent Karlsson" <[EMAIL PROTECTED]>; "'Erik Nordmark'" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, February 12, 2002 18:06
Subject: Re: [idn] Comments on IDNA/stringprep/nameprep

> Thanks, Kent.
>
> ----- Original Message -----
> From: "Kent Karlsson" <[EMAIL PROTECTED]>
> >
> > > > Even though e.g. [gg] and [g][g] (there are a few hundred other examples)
> > > > are not canonically or compatibility equivalent, they still represent
> > > > the same sequence of Hangul letters, and thus "mean" the same.
> > >
> > > Yes, same argument is used for SC/TC needing to be addressed in IDN.
> >
> > No, no, no!! This issue is comparable to the *canonical* equivalences
> > that already exist for Hangul syllable characters, and for other
> > characters that have a canonical decomposition (some "double latin
> > letters" have compatibility decompositions, but the relationship here
> > is much stronger; and it is much much stronger than case insensitivity).
> > Unfortunately, due to historic events, that equivalence is no longer
> > recorded in Unicode 3.0 and later property data.
> >
> > This is in no way comparable to the SC/TC issue which is a spelling
> > preference issue, where the "spellings" are actually different.
> > Here it is just about the underlying representation for the **same**
> > spelling (in terms of sequence of letters; there is not even any
> > case difference or font variant difference [for correctly constructed
> > fonts that cover Hangul]).
> >
>
> True. The canonical equivalence between [gg] and [g][g] is defined in
> Unicode 3.0. They should have been unified by NFC, but haven't been.
>
> It is too late to change that, and it should be solved in new normalization forms.
>
> But if applications use the new normalization before nameprep,
> as I warned in the last call comments, the following condition will be
> triggered silently:
>
>   stringprep(newnormalization(Hangul)) != stringprep(Hangul)
>
> For stringprep to be neutral to new normalizations adopted by applications,
> it would have to be perfect and inclusive of all kinds of mature normalizations,
> that is, the universal set of all kinds of normalizations built upon Unicode.
> Impossible?
>
> Application implementors should be cautious when applying normalization forms
> to the portions of data/text that contain IDNs. If some applications have already
> adopted normalization forms that are not compatible with stringprep as above,
> the backward compatibility requirements are not met in that case.
> IDNA's backward compatibility claim doesn't come without costs.
>
> Don't build our grand castle on a moving sand dune, where a tiny tent is the more
> adequate and wiser choice. :-)
>
> Soobok Lee
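Soobok's condition can be reproduced with a toy model. In the Python sketch below, `expand_doubled_initials` is a hypothetical "new normalization" (not any real Unicode form) that rewrites each doubled initial consonant as two single ones, and `nameprep_core` is a hypothetical stand-in for stringprep/nameprep that models only its NFKC step; the real profile also maps, case-folds and checks prohibited characters. Under those assumptions, the single syllable U+AE4C comes out of the pipeline differently depending on whether the extra normalization ran first.

```python
import unicodedata

# Hypothetical "new normalization": doubled initial consonants (choseong)
# are rewritten as two copies of the single consonant. Not a real Unicode form.
DOUBLED_INITIALS = {
    "\u1101": "\u1100\u1100",  # SSANGKIYEOK -> KIYEOK KIYEOK
    "\u1104": "\u1103\u1103",  # SSANGTIKEUT -> TIKEUT TIKEUT
    "\u1108": "\u1107\u1107",  # SSANGPIEUP  -> PIEUP PIEUP
    "\u110A": "\u1109\u1109",  # SSANGSIOS   -> SIOS SIOS
    "\u110D": "\u110C\u110C",  # SSANGCIEUC  -> CIEUC CIEUC
}


def expand_doubled_initials(s: str) -> str:
    """Decompose to jamo (NFD), then split every doubled initial consonant."""
    jamo = unicodedata.normalize("NFD", s)
    return "".join(DOUBLED_INITIALS.get(ch, ch) for ch in jamo)


def nameprep_core(s: str) -> str:
    """Hypothetical stand-in for stringprep/nameprep: only NFKC is modelled."""
    return unicodedata.normalize("NFKC", s)


label = "\uAE4C"  # one precomposed syllable: <gg><a>
direct = nameprep_core(label)
via_new_form = nameprep_core(expand_doubled_initials(label))

print(direct == label)                 # True: already in NFKC
print(via_new_form == "\u1100\uAC00")  # True: <g> followed by precomposed <g><a>
print(direct != via_new_form)          # True: the condition above holds
```

The mismatch comes from how NFKC recomposes Hangul: an initial consonant combines only with an immediately following vowel, so the first <g> is left stranded and the two paths never converge.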
