[Edmon, sorry you got two copies of this, it was an accident.]

Edmon Chung <[EMAIL PROTECTED]> wrote:

> non-ASCII requests are not "randomly misinterpreted", at least not in
> NeDNS, they get uniquely resolved into the intended domain name that a
> non-aware user is typing in.

Really? So if I type a name, and my IDN-unaware software blindly copies the name into a DNS request (which won't necessarily happen, but let's say it does), the server will match on the name I intended no matter whether my system uses iso-8859-1 or shift_jis or iso-2022-jp or euc-jp or UTF-8 or UTF-16be or UTF-16le or...? And it will match uppercase Latin letters with fullwidth lowercase Latin letters? And it will match precomposed characters with their decomposed equivalents?

Isn't it inevitable that there will be collisions between the various charsets, so that a request for one name using one charset will accidentally match an unrelated name stored under a different charset? That's what I mean by random misinterpretation.

In the current DNS protocol, the semantics of octets 80..FF are undefined, so the server cannot know what the client is really asking for--it cannot know what the user typing the name really saw on the screen. It can guess, and maybe guess right most of the time, but it can't really know.

> Adam, whether you like it or not, the reality is that non-ASCII
> request are reaching registry name servers and whether you resolve
> these domain names or not is the operator's choice.

That's true, I'm just saying it's risky. Whenever text is passed around without a charset tag (implicit or explicit), there is a risk of misinterpretation.

If RFC 1035 had said clearly "domain names in zones shall not contain 80..FF until the semantics have been defined", then we could decide tomorrow that 80..FF are nameprepped UTF-8 (or arbitrary UTF-8 if we require the server to perform Nameprep), and there would be no problem. Unfortunately, RFC 1035 contained no such clear prohibition, and various conflicting interpretations of 80..FF have been deployed, and now any standard definition will conflict with those deployments, and so people who want standard 8-bit names are turning to EDNS and/or new classes.

AMC
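
P.S. To make the charset problem above concrete, here is a minimal Python sketch. It is only an illustration: the helper names (wire_forms, readings, fold) and the sample labels are my own assumptions, the charset list is just the one from my question above, and fold() is a rough stand-in for the kind of mapping Nameprep performs, not Nameprep itself. It shows that one name yields different octets under each charset, that one octet string reads as different names under each charset, and that fullwidth or decomposed forms only match after some normalization step.

```python
import unicodedata

# Charsets a client might silently use when copying a typed name into a query.
CHARSETS = ["iso-8859-1", "shift_jis", "euc-jp", "iso-2022-jp",
            "utf-8", "utf-16-be", "utf-16-le"]


def wire_forms(label):
    """Map each charset that can encode `label` to the octets it would send."""
    forms = {}
    for cs in CHARSETS:
        try:
            forms[cs] = label.encode(cs)
        except UnicodeEncodeError:
            pass  # this charset cannot represent the label at all
    return forms


def readings(octets):
    """Map each charset that can decode `octets` to the text it would yield."""
    texts = {}
    for cs in CHARSETS:
        try:
            texts[cs] = octets.decode(cs)
        except UnicodeDecodeError:
            pass  # these octets are not valid in this charset
    return texts


def fold(label):
    """Rough stand-in for Nameprep-style mapping (NOT the real profile):
    compatibility-compose, then case-fold."""
    return unicodedata.normalize("NFKC", label).casefold()


if __name__ == "__main__":
    # One name (two kanji), several incompatible octet strings.
    for cs, octets in wire_forms("\u65e5\u672c").items():
        print(f"{cs:12} {octets.hex()}")

    # One untagged octet string, several different readings.
    for cs, text in readings(b"\xc3\xa9").items():
        print(f"{cs:12} {text!r}")

    # Fullwidth uppercase vs. plain lowercase, precomposed vs. decomposed:
    # unequal as raw strings, equal only after folding.
    print("\uff21\uff22\uff23" == "abc", fold("\uff21\uff22\uff23") == fold("abc"))
    print("e\u0301" == "\u00e9", fold("e\u0301") == fold("\u00e9"))
```

A server that receives the two octets C3 A9 with no charset tag has no way to tell which of the readings() results the user actually saw on the screen, which is the guessing I described above.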
