This message argues both sides of the issue. :)

Soobok Lee <[EMAIL PROTECTED]> wrote:

> The latter can be as catastrophic as the former.

I assume you meant that false negatives (where names don't match when they should) can be as catastrophic as false positives (where names match when they shouldn't). Can you back up that claim?

> if each application vendor adopts its own different nameprep profile,
> applications behaviors may be unpredictable across applications for
> end users.

Do you have a suggestion? What should happen when an application encounters a name that uses code points newer than the application's version of nameprep?

If the application prohibits unassigned code points, then the name will never match anything, because ToASCII will fail.

If the application allows unassigned code points, then the name will never match the wrong thing, and might sometimes match the right thing (in practice, I think it usually will work).

Which is preferable? The conservative approach (never match) is more predictable, but the other approach (match if you're lucky) might make users happier.

Wait, I just realized why we needed to avoid comparing two strings that have both been prepared using loose stringprep. If they both use unassigned code points that turn out to be prohibited in future versions of nameprep, then they might match even though they are both invalid names. That's a false positive, which is bad. So we do indeed need to avoid such comparisons. Disregard my suggestion from my last message.

Perhaps the stringprep spec should say that applications may use loose stringprep only if they know for sure that the name will never be compared against a name that was also prepared using loose stringprep. If there's no way to know, then you must use strict stringprep.

In the case of DNS, if the IDNA spec requires authoritative servers to use strict nameprep, then clients are free to prepare queries using loose nameprep.

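To make that rule concrete, here is a rough Python sketch. It is not real nameprep: prepare() and safe_to_compare() are hypothetical names I made up for illustration, and the mapping step is a toy (lowercase plus NFKC). The only real piece is Python's stringprep.in_table_a1(), which tests whether a code point is unassigned in Unicode 3.2 (table A.1 of the stringprep spec).

```python
import stringprep
import unicodedata


class UnassignedCodePoint(ValueError):
    """Raised by strict preparation when a code point is unassigned."""


def prepare(name: str, allow_unassigned: bool) -> str:
    """Toy stand-in for nameprep (hypothetical helper, NOT the real profile).

    Strict mode (allow_unassigned=False) rejects code points that are
    unassigned in Unicode 3.2; loose mode lets them pass through.
    """
    if not allow_unassigned:
        for ch in name:
            if stringprep.in_table_a1(ch):
                raise UnassignedCodePoint(f"U+{ord(ch):04X} is unassigned")
    # Toy mapping step: real nameprep uses the RFC 3454 mapping tables.
    return unicodedata.normalize("NFKC", name.lower())


def safe_to_compare(a_is_strict: bool, b_is_strict: bool) -> bool:
    """Proposed rule: at least one side must have been prepared strictly."""
    return a_is_strict or b_is_strict


# U+0221 is unassigned in Unicode 3.2, so it stands in for a "new" character.
new_name = "a\u0221"

stored = prepare("Example", allow_unassigned=False)  # "server" side: strict
query = prepare(new_name, allow_unassigned=True)     # "client" side: loose

try:
    prepare(new_name, allow_unassigned=False)
    strict_accepts_new_name = True
except UnassignedCodePoint:
    strict_accepts_new_name = False
```

The strict call on new_name fails, which is the "never match" behavior; the loose call goes through, which is the "match if you're lucky" behavior. safe_to_compare() only encodes the proposed restriction that two loosely prepared names must not be compared with each other.
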
Other protocols could in principle use similar methods--requiring strict nameprep at "one end" (whatever that means for that protocol) so that the "other end" can use loose nameprep. But how practical is that? Take email headers for example. Who has any idea what will be done with domain names that appear in email headers?

Maybe it would be a lot simpler and safer just to prohibit unassigned code points always. If you want to use new characters, you'll just have to upgrade your software to the new nameprep, sorry.

Can we get some more people involved in this thread? I think Soobok is right that the existing wording in stringprep about "stored strings" and "query strings" is going to be very difficult to interpret in practice, and something needs to be done about it, but I don't know what.

AMC
