This message responds to things in the order they appeared in the previous message. It might be easier to follow if I reordered some of it, but composing it has left me too tired to do so. You might want to make two passes. :)

"Eric A. Hall" <[EMAIL PROTECTED]> wrote:

> And since the encoded representation of the original string has to be
> case-neutral for it to work in the legacy label, the input label has
> to be made case-specific.

I asked for a definition of "case-specific", but you didn't give me one. Now you've introduced a second term I'm unfamiliar with, "case-neutral". I honestly don't know what you mean by those terms.

> I agree with the requirement to make *some domain names* lowercase in
> order to facilitate simple comparisons.
>
> However, there are other domain names where this is not required.

The same could be said for domain names that use only ASCII characters. The case-insensitive comparison might not be required for some of them, but it's done for all of them, whether they need it or not. IDNA extends this philosophy to IDNs. The same comparison rules apply to all IDNs.

With ASCII domain names, we had the luxury that we could preserve case even while ignoring it; that is, ASCII domain names are case-insensitive/case-preserving. With IDNs in their ACE form, this is not possible (at least not officially); instead we have to choose case-insensitive/non-case-preserving or case-sensitive/case-preserving, and we chose the former, judging the case insensitivity to be more important than the case preservation.

If applications want to map case-sensitive identifiers directly onto domain labels in owner names of resource records, they can, as long as they avoid collisions. This is already true for ASCII domain names, and continues to be true for IDNs.

For example, in the ASCII world, we can store information under the name FoO.net, and when someone looks up FoO.net, they'll find the information. We can't store different information under fOo.net, even though fOo is a distinct identifier from FoO. Now reread that example but imagine an umlaut over the middle letter. With IDNA, it's all still true.
The only difference is that if you do a reverse query, you won't get the original capitalization, but reverse queries are almost never used, so it's no big deal.

If case-preservation is important to an application (which is more plausible if the application is mapping case-sensitive identifiers onto domain labels *inside* resource records), then it can define a mapping that preserves case. For example, it could use 8-bit labels (like unprepped UTF-8 or foo-prepped UTF-8), or it could use something similar to IDNA but with a different Stringprep profile and a different prefix.

> Microsoft already provides a direct encoding of NetBIOS names into
> UTF-8 and simply applies their own interpration to the RRs. Under
> your rules, they couldn't use i18n domain names as effectively as
> STD13 domain names since they would have to sacrifice capitalization
> in the process.

Right, IDNA sacrifices case-preservation in order to get compatibility with the ASCII infrastructure while continuing the tradition of case-insensitive domain names. Maybe NetBIOS doesn't like that tradeoff, and would rather have case-preserving reverse queries. Microsoft can continue to use 8-bit DNS names if they want.

Here's a stab at the relationship (or lack thereof) between 8-bit labels (which are allowed by RFC 1035) and IDN labels (which are defined by the IDNA spec). I'm sure some people (even IDNA supporters) might not agree with this interpretation, because it's a subtle question, but here goes:

8-bit labels in DNS (not EDNS) owner names, even if they are interpreted as UTF-8 by the end clients, do not contain non-ASCII characters from the point of view of the DNS infrastructure. RFC 1035 does not specify or allow any charset other than ASCII, therefore it's impossible for an owner name to contain non-ASCII characters; the octets 80..FF are allowed, but they don't represent any particular characters (from the point of view of DNS).
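
To illustrate what I mean by octets that "don't represent any particular characters", here is a small Python sketch. The label bytes and the two candidate charsets are my own choices for the example, not anything taken from RFC 1035; the point is only that the same octets yield different text depending on a charset the DNS never specifies.

```python
# The same four octets, as they might appear in an 8-bit owner-name label.
# (Hypothetical example value; RFC 1035 attaches no charset to 0x80..0xFF.)
raw_label = bytes([0x66, 0xC3, 0xB6, 0x6F])

# One end client might decide these octets are UTF-8 ...
as_utf8 = raw_label.decode("utf-8")      # three characters: f, o-umlaut, o

# ... while another might decide they are ISO 8859-1.
as_latin1 = raw_label.decode("latin-1")  # four characters, two of them non-ASCII

# Both decodings succeed, and they disagree about the text.
print(len(as_utf8), len(as_latin1))
print(as_utf8 == as_latin1)
```

The DNS infrastructure only ever sees `raw_label`; which of the two readings is "right" is decided entirely by the end clients.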
Since IDNA explicitly applies only to text labels (which I understand to mean labels composed entirely of characters), DNS labels containing 80..FF are completely outside the scope of IDNA; they are not IDN labels, but labels in an independent namespace that is not subject to any of the rules for IDNs. Applications can continue to use such labels under the rules of RFC 1035 without regard to IDNA. I personally think this namespace is under-specified and risky and should not be used, but it will still be as present after IDNA as it was before.

That analysis applies to DNS, not EDNS. EDNS could define entirely new semantics and comparison rules for its 8-bit labels; there is no need for EDNS labels to be compared in the same way as DNS labels. EDNS could even define multiple namespaces each with its own comparison rules, and tag labels according to which namespace they belong to.

> I think you are missing a key concept here, which is that all of the
> RRs are going to need to i18n definitions, and IDNA alone won't do it.

I'm not missing that; I know that. IDNA alone won't do it; IDNA is only for internationalized textual domain labels. RRs that map other data types onto domain names might find that IDNA is not suited to their needs. They are welcome to define a more suitable mapping directly from their non-domain-label data type to ASCII domain labels. They are welcome to borrow techniques from IDNA.

> When the RR rules are defined, they will get stringprep profiles
> assigned to them. At that point, the applications which create and
> interpret the unencoded labels are the only ones that need to know
> anything, and the infrastructure can store, transfer, compare and
> convert the opaque octets.

I'm finally starting to understand your vision.
In order to correctly convert labels between the non-ASCII form and the ACE form, or to compare two labels, you would need to EITHER know and perform the preparation algorithm for this particular label OR be guaranteed that the appropriate preparation has already been applied (in which case you don't need to know what it is).

That's interesting, but I see a security concern: What if the entity handing you the label is maliciously not applying the appropriate preparation? It can trick you into making errors when comparing labels, or trick you into converting a label into a non-equivalent label. If you don't know the appropriate preparation algorithm, you can't detect this or protect yourself from it.

I suppose it might be possible to address that concern. If you have an interest in the correctness of a comparison/conversion, then you need to know the preparation algorithm yourself. On the other hand, if you are performing comparisons/conversions without knowing the preparation algorithm, you had better not be relying on the correctness of the answers; you had better just be converting a name on behalf of the entity that prepared it, or comparing names on behalf of an entity that prepared one of them.

In your vision of the world (as I understand it), the ASCII infrastructure would use ACE, while the new improved infrastructure would use prepared-UTF-8, where the preparation is not the same for all labels. But I don't see what's so great about prepared-UTF-8 versus ACE. Both require the same amount of effort from all applications; they have to do something special at the boundaries, and the difference is in the details. If you want to sell me on a new infrastructure, it's got to offer something more. If it accepted unprepared-UTF-8, then I might find it interesting, because it would require less effort from applications that already use Unicode. But in order for the infrastructure to accept unprepared-UTF-8, the infrastructure would need to know the preparation algorithm...
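
To make the security concern above concrete, here is a rough Python sketch. It assumes the preparation in question is Nameprep (as exposed by `encodings.idna.nameprep` in the Python standard library) and uses the `xn--` prefix purely for illustration; the function names and the example labels are hypothetical, and the conversion is simplified to non-ASCII labels only.

```python
from encodings.idna import nameprep  # Nameprep profile from the standard library

ACE_PREFIX = b"xn--"  # assumed prefix, for illustration only


def to_ace_trusting(label: str) -> bytes:
    """Convert to ACE, trusting that the caller already prepared the label."""
    return ACE_PREFIX + label.encode("punycode")


def to_ace_checked(label: str) -> bytes:
    """Convert to ACE after applying the preparation ourselves."""
    return ACE_PREFIX + nameprep(label).encode("punycode")


def same_trusting(a: str, b: str) -> bool:
    """Compare labels as opaque strings, trusting both were prepared."""
    return a == b


def same_checked(a: str, b: str) -> bool:
    """Compare labels after preparing both of them ourselves."""
    return nameprep(a) == nameprep(b)


def is_prepared(label: str) -> bool:
    """Only possible if we know the preparation: is the label a fixed point?"""
    return nameprep(label) == label


honest = nameprep("B\u00fccher")  # properly prepared by its owner
hostile = "B\u00dcCHER"           # handed over as "already prepared", but isn't

print(same_trusting(honest, hostile))  # the trusting comparison is fooled
print(same_checked(honest, hostile))   # knowing the preparation fixes it
print(to_ace_trusting(hostile) == to_ace_trusting(honest))  # non-equivalent ACE
print(to_ace_checked(hostile) == to_ace_trusting(honest))   # equivalent ACE
print(is_prepared(honest), is_prepared(hostile))
```

The `*_trusting` functions are what an infrastructure that treats labels as opaque octets would be doing; the `*_checked` functions and `is_prepared` are only available to an entity that knows the preparation algorithm, which is the point of the paragraphs above.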
> (1) Anything which fits in the i18n namespace is a domain name,
> regardless of whether or not IDNA can handle it.

The i18n namespace doesn't exist until we define it. As I argued above, the 8-bit namespace that already exists under RFC 1035 is not an i18n namespace, because it cannot represent non-ASCII characters, because no charset other than ASCII is specified. It is the job of this working group to define the i18n namespace. The definition of IDN in the IDNA draft is a definition of the i18n namespace.

> > If you only care about the end applications, which know the special
> > semantics of the special labels, then just use a different prefix
> > to go with your different Stringprep profile. Then you can be sure
> > that entities that know IDNA but don't know about your special
> > labels won't accidentally muck with them.
>
> No, that won't work.

I don't see why not. At some point, someone needs to perform Stringprep with some profile. Why can't that entity go ahead and do the whole transformation at the same time (Unicode to ASCII, prepend the prefix)? What's the advantage of doing only the Stringprep part?

AMC
