John,
I'm probably missing something, but if the apps do not currently warn the user when characters from blocks that ought to be banned appear, then people and/or tools may be generating references (e.g. links in HTML documents) that contain those characters, without realizing that the characters are being mapped to some other base characters, which then work OK in the DNS lookup.
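To make the mapping concrete: nameprep in IDNA2003 is built on NFKC, and NFKC alone is enough to show the effect in a couple of lines of Python (the hostname here is made up, of course):

    import unicodedata

    # U+1D5BA MATHEMATICAL SANS-SERIF SMALL A looks like 'a' but is a
    # different code point; nameprep folds it to plain 'a' via NFKC,
    # so the lookup works with no hint that anything was remapped.
    hostname = "ex\U0001D5BAmple.com"
    mapped = unicodedata.normalize("NFKC", hostname)
    print(mapped)                    # example.com
    print(mapped == "example.com")   # True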
Maybe it is unlikely that a lot of such references would come to exist, and maybe it wouldn't be such a burden for the user of a new app to see the occasional error, e.g. when clicking on such a link.
But how do you determine how many HTML documents contain bad characters in their links, and how do you decide that that number is low enough to make such a change to the spec?
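Per document, at least, the check itself is mechanical. A rough Python sketch -- extracting the hrefs and getting at a representative corpus are the parts I'm waving away, and NFKC only approximates what nameprep actually does:

    import unicodedata
    from urllib.parse import urlparse

    def link_has_mapped_chars(href):
        # True if the link's hostname contains a character that NFKC
        # (the core of nameprep) would silently fold to something else.
        host = urlparse(href).hostname or ""
        return any(unicodedata.normalize("NFKC", ch) != ch for ch in host)

    # count over some corpus of extracted hrefs:
    # bad_links = sum(link_has_mapped_chars(h) for h in hrefs)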
So I'm wondering whether you can really say that such a change "would largely impact what can be registered" and nothing else. How do you know it does not also impact users of new clients accessing existing documents?
Oh wait, I know! Just get Google to do a survey in their cache?
Erik
John C Klensin wrote:
(i) A change that would largely impact what can be registered needs to be reflected and implemented only in 250-odd registries. The registry operators are mostly on their toes, communicate with each other, and many of them are pretty early in their implementation of IDNs and conservative about what they are permitting. Getting them to make changes is an entirely different sort of problem than, e.g., trying to change already-installed browsers or client plugins or getting people to upgrade them.
(ii) The main things I've seen in observing and working with registries that I didn't understand well enough a couple of years ago to argue forcefully are things that we might be able to change because the impact of whether someone was running an old or new version would not be large.

For example, IDNA makes some mappings that are dubious, not in the technical sense of whether the characters are equivalent, but in the human-factors sense of whether treating them as equivalent leads to bad habits. To take a handy example from a Roman ("Latin")-based script, I now suspect that permitting all of those font-variant "mathematical" characters to map onto their lower-case ASCII equivalents is a bad idea, just because it encourages users to assume that, if something looks like a particular base character, it is that character. That, in turn, widens the perceptual window for these phishing attacks.

If, instead, we had simply banned those characters, creating an error if someone tried to use one rather than quietly mapping it into something else, we might have been better off. So I now think we should have banned them when IDNA and nameprep were defined, and I think I could have made that case very strongly had I understood the issues the way I do now.

Is it worth making that change today? I don't know. But I suggest that it would be possible to make it, for two reasons: (a) such a change would not change the number of strings or characters that can be registered at all -- only the base characters can actually appear in an IDNA string after the ToUnicode(ToASCII(char)) operation pair -- and (b) if I were a browser or other application producer, I'd be seriously considering warnings if any characters from those blocks appeared... something IDNA certainly does not prohibit.

Changes that increase the number of registerable characters are problematic, but not that problematic if they don't pick up a character that now maps and make it "real" (which is the problem with deciding that upper-case Omega is a good idea). Reducing the number of characters that can be registered --making a now-valid base character invalid-- would be a much harder problem.
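For what it's worth, the "ban rather than quietly map" check described above is trivial for a client to make. A rough Python sketch, using just the Mathematical Alphanumeric Symbols block (U+1D400..U+1D7FF) as a stand-in ban list -- a real list would have to come out of the spec:

    def check_label(label):
        # Ban-style policy: raise an error instead of quietly mapping
        # when a character from a suspect block appears in an IDN label.
        for ch in label:
            if 0x1D400 <= ord(ch) <= 0x1D7FF:
                raise ValueError("U+%04X is banned, not mapped" % ord(ch))
        return label

    # check_label("ex\U0001D5BAmple") raises an error here, where the
    # IDNA2003 mapping would have quietly produced "example".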
