Erik van der Poel wrote:
> Soobok Lee wrote:
>
>> U+1160 is a space-like char and even stringprep/nameprep does not
>> filter it out because the char is not for punctuational purpose.
>
> U+1160 is HANGUL JUNGSEONG FILLER and it is used to transform
> nonstandard syllables into standard ones (Unicode 3.0 section 3.11
> (RFC 3454 refers to Unicode 3.2.0)). However, this transformation is
> one of the additional transformations not considered part of Unicode
> normalization (3.2.0's UAX #15 Annex 10).
Exactly. U+1160 is not "touched" by Unicode normalization (NFC).

> So this character is not generated by Stringprep/Nameprep. However, it
> is not prohibited either, so it may occur in the input to (and output
> from) Stringprep/Nameprep.

Yes, it may occur.

> I read some of the sections on Hangul in the Unicode book and Web
> site, but I did not see any rules regarding repeated occurrences of
> U+1160 (as you had in your example, not quoted above). I also did not
> see any rules about what to do when a filler is not followed by a
> Hangul jamo. It would be nice to have these rules in Unicode or in
> Stringprep.

The U+1160 problem was raised 3.5 years ago (you can search the huge
idn-list archive for the keywords "1160" or "filler"), together with
some additional Hangul jamo problems. I submitted a draft (you may find
it at www.i-d-n.net) to filter out these invalid character sequences,
but the draft was discarded. Someone argued that such filtering
*complicates* the stringprep algorithms with context-sensitive
filtering/prohibiting, and that the problem is up to the UTC/NFC, not
the IETF. Of course, I couldn't accept that. Anyway, we can't go back
to December 2002 without giving up stringprep's promise of backward
compatibility.

> I tried U+1160 followed by a Latin character in MSIE with i-Nav and in
> Firefox with IDN turned on, and it was displayed as a wide space. It
> is unfortunate that both implementations chose to display it as a
> space instead of deleting it.

Yes. Plugins MUST filter out U+1160 from validated ToUnicode()ed
labels, whether or not IDNA requires that.

Soobok
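
For illustration only, here is a minimal Python sketch of the kind of
display-side filtering described above. The function names and the
"suspect filler" rule (a filler that is repeated, or that is not followed
by a character in the Hangul Jamo block U+1100..U+11FF) are assumptions
made for this example; they are not defined by Unicode, Stringprep, or
IDNA, and the sketch operates on a label that has already been through
ToUnicode.

```python
import unicodedata

JUNGSEONG_FILLER = "\u1160"  # HANGUL JUNGSEONG FILLER


def in_hangul_jamo_block(ch: str) -> bool:
    """True if ch lies in the Hangul Jamo block U+1100..U+11FF."""
    return 0x1100 <= ord(ch) <= 0x11FF


def has_suspect_filler(label: str) -> bool:
    """Flag a filler that is repeated or not followed by a Hangul jamo.

    Hypothetical rule for illustration; no standard specifies it.
    """
    for i, ch in enumerate(label):
        if ch != JUNGSEONG_FILLER:
            continue
        nxt = label[i + 1] if i + 1 < len(label) else ""
        if nxt == "" or nxt == JUNGSEONG_FILLER or not in_hangul_jamo_block(nxt):
            return True
    return False


def strip_filler(label: str) -> str:
    """Remove every U+1160 from a label before it is displayed."""
    return label.replace(JUNGSEONG_FILLER, "")


if __name__ == "__main__":
    label = "a" + JUNGSEONG_FILLER + "b"
    # NFC leaves the filler in place, so normalization alone does not help.
    print(unicodedata.normalize("NFC", label) == label)
    print(has_suspect_filler(label))
    print(strip_filler(label))
```

Stripping unconditionally is the simpler of the two approaches and needs
no context; the context-sensitive check is closer to what the discarded
draft aimed at, and shows why it was seen as adding complexity.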
