Erik van der Poel wrote:
I feel that we are still at the very beginning of the adoption of the particular Unicodes affected by this mistake. Most of them are for South Asian languages. Hangul is much further along, but not the particular Unicodes that are affected here (i.e. the Jamo).
It's not that easy. When you use the old algorithm, you get normal Hangul syllables, which would be allowed in IDNA. It's only that the sequence *before* the normalization should not be allowed.
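To make the distinction concrete, here is a minimal sketch using Python's standard unicodedata module. The specific code points are assumptions chosen for illustration, not taken from this thread: U+1100 and U+1161 are ordinary conjoining Jamo, and U+0300 stands in for a combining mark placed between them to form one of the unusual sequences. The helper names nfc and codepoints are hypothetical. The sketch only shows what a current normalizer does with such input; it does not reproduce the old algorithm.

```python
import unicodedata

# Ordinary conjoining Jamo: leading consonant U+1100 followed by vowel U+1161.
plain = "\u1100\u1161"

# Assumed example of an unusual sequence: a combining grave accent (U+0300)
# sits between the two Jamo.
odd = "\u1100\u0300\u1161"


def nfc(s: str) -> str:
    """Normalize s to NFC using the Unicode data bundled with Python."""
    return unicodedata.normalize("NFC", s)


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(codepoints(nfc(plain)))  # the two Jamo compose into the syllable U+AC00
print(codepoints(nfc(odd)))    # the interrupted sequence is left as three code points
```

The ordinary pair composes to a precomposed Hangul syllable, which is the kind of result that would be acceptable in IDNA; the question in this thread is only about inputs like the second one.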
No, these strange sequences should not be disallowed. The specs should be corrected so that the implementations can all treat these strange sequences the same way.
More importantly, this mistake only affects highly unusual, malformed data. I think that if IDNA decides not to follow Unicode's recommendation now or in the next couple of years, 10 or 20 years from now we would look back in time and regret this decision.
I don't think so. "We" could still change the decision in 20 years, and not a single registration would be affected. The sequences causing the behaviour change are *really* unusual - I don't know if software can visually render them in a meaningful way, and I guess a native speaker would just consider them mojibake. So it is unlikely that anybody would try to use them as input to IDNA in the next 20 years in a reasonable application.
If we do not correct the specs, more and more implementations will be created and deployed, some implementing it one way, the others the other way. It is hard to change something when a lot of implementations have been deployed. This is why we have to act now (or soon). We have to nip it in the bud.
Erik
