Hi Kent,

Welcome to the Hangul problems. You may have missed the "hangulchar" I-D in the WG pool. Your analysis is thorough, but it seems to be already addressed by that I-D. There have also been further offline discussions about other Hangul nameprep issues. Those will be included in the new hangulchar 2.0 I-D soon.

Regards,
Soobok Lee

----- Original Message -----
From: "Kent Karlsson" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, November 13, 2001 12:01 AM
Subject: Hangul and IDN (was Re: [idn] reordering strawpoll)

>
> Regardless of reordering, there is an actual problem for Hangul, which
> I don't think has been addressed. I have lately, and with the help of a
> Korean colleague, been looking fairly deeply into the problem of
> collating (ordering) Hangul strings properly. So even though I cannot
> understand Korean, and am only beginning to be able to read the letters
> (I look more at code point numbers than glyphs), I've looked quite a lot
> into this. See also page 53 of The Unicode Standard 3.0, which deals with
> Hangul syllables. Let me just pick an example. The number of instances
> is in the thousands, but the basic problem is the same.
>
> The precomposed Hangul syllable U+AE4C (GGA) is canonically equivalent
> to <U+1101, U+1161> (GG, A), through algorithmic decomposition. That is
> fine so far. <U+1101, U+1161> is in turn equivalent to
> <U+1100, U+1100, U+1161> (G, G, A), but this equivalence is neither a
> canonical equivalence, as it should have been, nor a compatibility
> equivalence. Still, the latter letter sequence represents EXACTLY the same
> syllable as the two earlier character sequences, and a proper rendering
> engine (of which there are already some, I'm told) would correctly render
> the three sequences in the same way. But for historical reasons, there is
> now neither a canonical nor a compatibility equivalence there. Just an
> equivalence, in the same script, in syllabic meaning and (when properly
> implemented) in display. (Yes, G and GG are pronounced differently, but
> this is about spelling.)
>
> This is something that 'nameprep' should handle, since it is unfortunately
> not handled by NFKC.
> The logical steps would be to 1) algorithmically decompose Hangul
> syllables, and 2) map cluster Jamos to the basic letter sequences each
> represents. Then either (design decision) invoke NFKC, or NFKC augmented
> to also compose "modern" cluster Jamos before the part of NFKC formation
> that does algorithmic composition of Hangul syllables (the historic cluster
> Jamos can (design decision) stay decomposed). Or, indeed, do the
> decomposition into basic (i.e. non-cluster) Hangul Jamo letters after
> conversion to NFKC form, leaving Hangul "subnames" as sequences of
> letter characters, just like for other alphabetic scripts (I don't know how
> this would affect the length of ACE-encoded IDN names). (Some thought
> needs to go into how (Halfwidth) Compatibility Hangul Letters are to be
> handled. The compatibility mappings are, ahem, not fully appropriate...
> The Hangul "filler" characters are also a problem, which needs to be
> considered.)
>
> Kind regards
> /kent k
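
For readers who want to try the GGA example, steps 1) and 2) from the quoted message can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not what nameprep or the hangulchar I-D specifies: the names CLUSTER_TO_BASIC and to_basic_jamo are made up here, the table covers only the five modern doubled leading consonants, and the result is simply left decomposed (the last of the design options mentioned above). Cluster vowels, trailing consonants, historic Jamos, compatibility letters and fillers are not handled.

```python
import unicodedata

# Hypothetical, deliberately incomplete table: modern doubled leading
# consonants (cluster Jamos) mapped to the basic letters they are spelled with.
CLUSTER_TO_BASIC = {
    "\u1101": "\u1100\u1100",  # GG -> G, G
    "\u1104": "\u1103\u1103",  # DD -> D, D
    "\u1108": "\u1107\u1107",  # BB -> B, B
    "\u110A": "\u1109\u1109",  # SS -> S, S
    "\u110D": "\u110C\u110C",  # JJ -> J, J
}


def to_basic_jamo(text: str) -> str:
    """Sketch of steps 1) and 2): decompose, then expand cluster Jamos."""
    # Step 1: NFD performs the algorithmic decomposition of Hangul syllables.
    decomposed = unicodedata.normalize("NFD", text)
    # Step 2: replace each cluster Jamo in the table by its basic letters.
    return "".join(CLUSTER_TO_BASIC.get(ch, ch) for ch in decomposed)


def code_points(text: str) -> str:
    """Format a string as a list of U+XXXX code points for display."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    spellings = ["\uAE4C", "\u1101\u1161", "\u1100\u1100\u1161"]
    for s in spellings:
        nfkc = unicodedata.normalize("NFKC", s)
        print(code_points(s), "| NFKC:", code_points(nfkc),
              "| basic:", code_points(to_basic_jamo(s)))
```

With Python's unicodedata module, NFKC maps the first two spellings to U+AE4C but turns <U+1100, U+1100, U+1161> into <U+1100, U+AC00> (G followed by the syllable GA), so the three do not compare equal. After the sketched mapping, all three become <U+1100, U+1100, U+1161>.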
