This is commentary on

   draft-hoffman-stringprep-00.txt
   draft-ietf-idn-nameprep-07.txt
   draft-ietf-idn-idna-06.txt


draft-hoffman-stringprep-00.txt:

| 4. Normalization

| [[[[[ Patrik asks: Is not addition of unassigned codepoints regarded
| an update of UAX15? This is relevant here. ]]]]]

Embedded comments need to be addressed or removed.

Overall, I think that stringprep needs to define a data structure for
profiles and also needs to discuss version negotiation, both of which
are needed to facilitate programmatic upgrades.

This means defining a standardized character notation scheme to be used
by all profiles, a file structure (version tags, comment notation, line
syntax, EOL markers, etc), and so forth.

This also means describing suggested common mechanisms for retrieving
profiles. One obvious method would include "will be made available via
FTP by IANA", for example. Encouraging vendors to "provide updates in
routine patch kits" is another. Protocols can extend that as needed as
well (it is feasible that DNS could even use zone files and the zone
transfer services to retrieve version 1.1 of the subordinate
stringpreps).

The point here would be to encourage rapid adoption of updated profiles,
through as many mechanisms as possible. No method will work 100% of the
time, so find five mechanisms that each work 60% of the time.

The hole I am concerned with here is future mappings, where 'A' is
currently assigned but is not currently mapped to 'a', and where a
future version creates such a mapping. In that case, the client will
generate 'A' as unmapped, the server will respond with an error or with
'a', either of which would trigger a failure on the client.

Consider ISO/IEC 8859-1 and 8859-15, where the former had only a
lowercase y with diaeresis and where the latter defines an uppercase y
with diaeresis. It seems likely that this kind of disjointed mapping
will happen again, so we need to be thinking about ways to minimize the
disjointedness.

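To make the mapping hole concrete, here is a minimal sketch in Python.
Everything in it is hypothetical and is not taken from the stringprep
draft: the Profile class, its version tuples, the one-entry mapping
table, and the negotiate() helper are all assumptions made purely for
illustration. It shows a client holding version 1.0 (no 'A' to 'a'
mapping) disagreeing with a server holding version 1.1, and how
agreeing on a common version first would avoid the mismatch.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical versioned profile; not a structure from any draft."""
    name: str
    version: tuple                                 # e.g. (1, 0)
    mappings: dict = field(default_factory=dict)   # character -> replacement

    def prepare(self, text):
        # Apply the mapping table; unmapped characters pass through as-is.
        return "".join(self.mappings.get(ch, ch) for ch in text)


def negotiate(client_versions, server_versions):
    """Return the highest version both sides hold, or None if none match."""
    common = set(client_versions) & set(server_versions)
    return max(common) if common else None


# Assumed example data: version 1.1 adds a mapping that 1.0 lacks.
V1_0 = Profile("example", (1, 0), {})
V1_1 = Profile("example", (1, 1), {"A": "a"})

client_profiles = {V1_0.version: V1_0}
server_profiles = {V1_0.version: V1_0, V1_1.version: V1_1}

# Without negotiation, each side applies its newest profile and they differ.
sent = V1_0.prepare("A")
stored = V1_1.prepare("A")
mismatch = sent != stored

# With negotiation, both sides fall back to the newest version they share.
agreed = negotiate(client_profiles, server_profiles)
server_view = server_profiles[agreed].prepare("A")
```

In this sketch the fallback only works because the server kept the older
table around; a real design would also have to say what happens when no
common version exists, which is the case where negotiate() returns None.
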
Secondarily, the IETF has had the lesson of versioning beaten into it
over the last few years, and it seems odd not to address this issue
head-on in the controlling spec.


draft-ietf-idn-nameprep-07.txt:

| Stringprep Profile for Internationalized Host Names

Since this profile is specifically for host names, all of the rules that
apply to hostnames should be placed here.

Consider that there are apps and protocols which may use these rules to
build internationalized host names but will not use IDNA (either because
they do not need IDNA conversion for their data-path or they use
something other than IDNA to encode the names). Those services will need
access to the complete set of rules.

Putting some of the rules here and some of the rules in IDNA essentially
means that every application that wants to build an i18n hostname has to
use both specifications, even though the "profile" is the only document
that a reasonable person should be expected to know about. As it stands,
this profile is incomplete without IDNA, but it does not reference IDNA
as a mandatory component (nor should it).

Furthermore, any subsequent work in this field will almost certainly be
required to obsolete this profile in order to clarify these issues,
which reinforces the need for versioning in stringprep.


draft-ietf-idn-idna-06.txt:

| Abstract

| This document defines internationalized domain names (IDNs)

That definition should be in a separate document, specifically either in
nameprep or in a stringprep profile that defines i18n "domain names"
rather than hostnames. IDNA should "define a mechanism for encoding
internationalized domain name labels into STD13 labels".

| 2 Terminology

| The "ACE prefix" is defined in this document to be a string of ASCII
| characters that appears at the beginning of every ACE label. It is
| specified in section 5.

I thought the idea behind adopting "punycode" was to avoid the use of
the generic "ACE" term?

| 3. Requirements

| IDNA conformance means adherence of the following three rules:

| 1) Whenever a domain name is put into a generic domain name slot,
| every label MUST contain only ASCII characters.

"generic domain name slot" should be substituted with "STD13 slot".
There is no reason whatsoever that application protocols cannot exchange
IDNs or IHNs in whatever form they wish.

| 2) ACE labels SHOULD be hidden from users whenever possible.

I have argued against this before and I will do so again now. Should
WHOIS servers perform this conversion on output (email addresses and
nameservers) since it is likely to be displayed to a user? Should an
email client perform conversion on a message? Every protocol and
data-format has unique considerations. Making blanket statements like
this is harmful.

CONVERSION MUST ONLY OCCUR WHERE A PROTOCOL OR DATA-FORMAT EXPLICITLY
DEFINES THE BEHAVIOR. MANDATORY CONVERSION WILL BREAK THINGS IF THERE IS
NOT AN AGREED-UPON METHOD FOR CONVERSION SPECIFIC TO THE APPLICATION.

| 4.1 ToASCII
| 4.2 ToUnicode

Add a disclaimer to the effect of ~"these routines MUST NOT be used
except where the data is known to contain an i18n domain name label".
Trying to pipe all Unicode data through ToASCII will be harmful, to say
the least. Trying to pipe an STD13 binary domain name through ToUnicode
may result in some false-positives.

| B. Design philosophy

What value does this section provide? This text is religion, not
science, seeing as how it omits key counter-arguments, namely that there
are hundreds if not thousands of applications for every DNS server, so
upgrading servers in addition to applications represents an incremental
cost (rather than the erroneous opinion that "The cost of implementing
IDN may thus be much lower, and the speed of implementation could be
much higher"). Other aspects (such as ~"no change to protocols") are
just as wrong (protocols MUST BE CHANGED FOR CONVERSION TO BE PROPERLY
MANAGED), if not argumentative.
This section is non-contributory to say the least, and possibly even a
hindrance to future developments, and has no place in a technical
specification such as this.


General comments:

In general, I think that there is a tremendous amount of architectural
work that needs to be done here. We need to strongly and clearly define
the different domain name data-types that are in use, make up stringprep
profiles for them, impose versioning controls on stringprep profiles,
and tighten up the IDNA spec.

I think at this point that IDNA is pretty close to where it needs to be,
but it is still far too grandiose. It should only define an encoding for
specific data-types, and it should not encourage any interpretative
behavior on the applications.

One specific item: there does not seem to be a prohibition against
certain characters in the first position of a domain name label. There
is a minor prohibition in IDNA, but there needs to be an exception list
in nameprep that covers the characters we discussed in the WG earlier,
which are punctuation, diacritical marks, or dashes. We discussed this
at length on the mailing list.

"Adam M. Costello" wrote in <[EMAIL PROTECTED]>:

> There are some code points that are prohibited in host names, but not
> in all textual domain names. The underscore is the best example.
> These prohibitions belong in ToASCII.

I disagree with this for a couple of reasons. First of all, these
exceptions are associated with protocol identifier labels, which are not
hostnames. Nameprep defines i18n hostnames, so these characters belong
in the nameprep prohibition.

Secondarily, "protocol identifiers" are generally considered as protocol
operators which do not need to be internationalized (therefore never
needing anything but ASCII). This is why I defined them as a separate
data-type (and why I think they need their own stringprep profile).

See BCP18:

| 2. Where to do internationalization
|
| Internationalization is for humans. This means that protocols are not
| subject to internationalization; text strings are. Where protocol
| elements look like text tokens, such as in many IETF application
| layer protocols, protocols MUST specify which parts are protocol and
| which are text. [WR 2.2.1.1]

SRV owner labels are protocol elements, not text. They should never go
through any conversion. A dedicated profile for this would help.

I agree with many of the other comments which have been made. I think
that a parent i18n "domain name" stringprep profile can describe certain
generic characteristics (such as mapping of full-stop characters) that
the label-centric profiles cannot.

DJ Bernstein is right on when he says that certs don't solve the
spoofing problem. There is nothing preventing the wily hacker from
getting a cert for [EMAIL PROTECTED], and there would be nothing about
that cert which would alert anybody that spoofing had occurred.

In general terms, this means that I agree with DJB that the initial
scope should be tightly controlled and slowly incremented. I think that
this opinion was also expressed by J Klensin's solicitation for a
"safe-set" and D Crocker's endorsement of a safe-set. This doesn't mean
we all agree with DJB, but I think that in this case, he is probably
correct. A stronger versioning system in Stringprep would make this a
feasible approach, I think.

--
Eric A. Hall                                   http://www.ehsco.com/
Internet Core Protocols     http://www.oreilly.com/catalog/coreprot/
