"Eric A. Hall" <[EMAIL PROTECTED]> wrote: > > The tricky part is that some of these subtypes are already in > > wide use in a wide variety of protocols without having ever been > > formalized. > > > > my intent is that the host field of a URI, the exchangers listed in > > an MX record, and the domain field of an HTTP cookie are all of type > > "host name", but no such connection has never been formally drawn > > between these various protocol elements. > > I think I can speak for John when I say that this is what he and I > both want to see fixed.

It would be great to have that fixed, but when will it happen? Can IDNA
afford to wait for it? If not, should IDNA go ahead and limit the
applicability of Nameprep to the still-vague concepts of "host name
label" and "mail domain name label" in the hopes that those terms will
become rigorous? I pose these questions to the working group--would
this be comforting or disconcerting? In the past, any time Paul and
Patrik and I have tried to talk about "what is a host name" it's been a
mess.

> Again, I would make the applications do the stringprep management,
> since they have to do so anyway for basic security measures. All IDNA
> needs to be is a simple codec that converts inputs and outputs (unless
> I completely misunderstand its inner workings).

We bundle Stringprep and Punycode and the prefix into a single
operation for convenience of description, so we can say something
simple like "for any text label X, ToASCII(X) is an equivalent ASCII
label". We needed ToASCII and ToUnicode to serve as definitions of the
correct results for *any* input, not just already-prepared input. If
you have already-prepared input, you can optimize the Stringprep step
down to nothing.

IDNA never says that ToASCII needs to be performed all at once by a
single entity. For example, an IDN-aware interface can require its
non-ASCII input to be already prepared, in which case when it performs
ToASCII the Stringprep step has no effect and can be optimized away.
The thing calling the interface then has the burden of performing
Stringprep, but not the rest of ToASCII. In this scenario the work of
ToASCII has effectively been split across two entities.

> > A hypothetical newDNS protocol that allowed text labels to be
> > represented using UTF-8 while still supporting the RFC-1035
> > sequence-of-bytes labels would need an extra bit per label
> > indicating whether byte values 80..FF are UTF-8 text, or opaque
> > bytes like in DNS.
> > A hypothetical new resolver interface would
> > likewise need this extra bit per label if it wanted to support both
> > text-labels and byte-labels.
>
> The resolver doesn't have to keep track of this

Sure it does. If I pass an 8-bit label to a new-resolver, the resolver
needs to know whether the 80..FF values are UTF-8 text or opaque bytes,
so that it knows how to set that bit in the newDNS query, or so that it
knows whether to convert to ACE before sending a DNS query.

AMC
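
To make the "ToASCII split across two entities" scenario concrete, here is a minimal Python sketch. It leans on the standard library's `encodings.idna` module, which implements the operations as they were later standardized; the `xn--` prefix and the helper names `prepare` and `to_ascii_prepared` are assumptions for illustration and are not taken from this thread. Length checks and the check for an already-present ACE prefix are left out.

```python
from encodings import idna  # stdlib implementation of the later-standardized operations

ACE_PREFIX = b"xn--"  # assumption: the prefix as eventually standardized


def prepare(label: str) -> str:
    """Caller side (hypothetical helper): perform only the Stringprep step."""
    return idna.nameprep(label)


def to_ascii_prepared(label: str) -> bytes:
    """Interface side (hypothetical helper): finish ToASCII for a label
    that the caller has already prepared, so Stringprep is skipped here."""
    try:
        return label.encode("ascii")  # all-ASCII labels pass through unchanged
    except UnicodeEncodeError:
        return ACE_PREFIX + label.encode("punycode")


raw = "B\u00fccher"                          # unprepared user input
prepared = prepare(raw)                      # done by the thing calling the interface
split_result = to_ascii_prepared(prepared)   # done by the IDN-aware interface
one_shot = idna.ToASCII(raw)                 # the same work done by a single entity
```

For this non-ASCII label the two-entity path and the one-shot path should agree, which is the point of defining ToASCII for any input while letting implementations skip a step that has no effect.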

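The per-label bit argued for above can be sketched the same way. Everything here is hypothetical: `Label`, its `is_text` flag, and `to_legacy_dns_label` are invented names standing in for a new-resolver interface that does not exist, and the ACE conversion again borrows Python's `encodings.idna.ToASCII` as a stand-in for whatever the final encoding turns out to be.

```python
from dataclasses import dataclass
from encodings import idna


@dataclass(frozen=True)
class Label:
    """Hypothetical 8-bit label handed to a new-resolver."""
    data: bytes
    is_text: bool  # the extra bit: True means bytes 80..FF are UTF-8 text


def to_legacy_dns_label(label: Label) -> bytes:
    """Decide what goes into an ordinary DNS query for this label."""
    if label.is_text and any(b >= 0x80 for b in label.data):
        # Text label with non-ASCII content: convert to ACE before sending.
        return idna.ToASCII(label.data.decode("utf-8"))
    # Opaque bytes (or plain ASCII text): send the bytes as they are.
    return label.data


same_bytes = "b\u00fccher".encode("utf-8")
as_text = to_legacy_dns_label(Label(same_bytes, is_text=True))
as_bytes = to_legacy_dns_label(Label(same_bytes, is_text=False))
```

The same byte sequence yields two different queries depending on the flag, so the resolver cannot make the choice without being told which kind of label it was given.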