On Wed, November 18, 2015 8:56 am, Peter Bowen wrote:
> On Wed, Nov 18, 2015 at 2:22 AM, Rob Stradling <rob.stradl...@comodo.com>
> wrote:
> > I would also like to get clarification on if/when the underscore character
> > may be used in each of the name types. Your report seems to flag
> > underscores as always prohibited (I think), but I expect that some CAs
> > would be surprised by that.
>
> Here is a set of rules that are functionally equivalent to the ones
> I'm using to check dNSNames in GeneralNames:
>
> LABEL = "((?!-)[A-Za-z0-9-]{1,63}(?<!-))"
> FQDN = "(#{LABEL}\\.)*#{LABEL}"
> WILDCARD_DN = "\\*\\.#{FQDN}"
> DNSNAME = "(#{FQDN}|#{WILDCARD_DN})"
>
> dNSName =~ /\A#{DNSNAME}\z/
>
> The FQDN rule is based on RFC 5280 section 4.2.1.6, which in turn
> references RFCs 1123 and 1034. There is no allowance for underscores
> in domain names in these RFCs.
>
> Thanks,
> Peter
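For anyone who wants to poke at the quoted rules directly, here is a rough self-contained Ruby sketch of them. The helper name `ldh_dns_name?` is hypothetical. One detail worth flagging: inside a double-quoted Ruby string the dot separator has to be written `"\\."` so the regex sees `\.`; a bare `"\."` collapses to `"."` and would match any character, including `_`.

```ruby
# Sketch of the LDH-only dNSName rules: labels of letters, digits and
# hyphens (1-63 chars, no leading/trailing hyphen), optionally preceded
# by a single "*." wildcard label.
LABEL       = "((?!-)[A-Za-z0-9-]{1,63}(?<!-))"
# "\\." puts the regex escape \. (a literal dot) into the string.
FQDN        = "(#{LABEL}\\.)*#{LABEL}"
WILDCARD_DN = "\\*\\.#{FQDN}"
DNSNAME     = "(#{FQDN}|#{WILDCARD_DN})"
DNSNAME_RE  = /\A#{DNSNAME}\z/

# Hypothetical helper: true if +name+ matches the rules above.
def ldh_dns_name?(name)
  DNSNAME_RE.match?(name)
end
```

Under these rules `www.example.com` and `*.example.com` pass, while an SRV-style name such as `_sip._tcp.example.com` is rejected outright.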
You've entered a special hell. It is dark and scary. You are likely to be eaten by a grue. The world is an awful place. Hostnames, doubly so.

A big part of this is due to how MSFT originally implemented their resolver code, although arguably it can affect non-MSFT platforms as well, depending on the name server setup.

Recall that DNS labels are full 8-bit; however, for practical purposes (read: compatibility), it's best to treat them as 7-bit ASCII. This is somewhat touched upon in RFC 1034 ("By convention, domain names can be stored with arbitrary case ...") and in RFC 1123 ("The DNS defines domain name syntax very generally -- a string of labels each containing up to 63 8-bit octets").

Terminology-wise, let's call those "labels". A series of labels, terminated by the empty label, makes up a domain name. One type of domain name is the host name (cf. RFC 1034, "For hosts, the mapping depends on the existing syntax for host names which is a subset of the usual text representation for domain names"), which corresponds to the A record type (or AAAA, as later added by the IPv6 specs).

OK, so we're clear so far? Recap:

label = 0-63 octets
domain name = a series of labels, terminated by an empty label, not to exceed 255 octets (counting the label length octets as well)
host name = a subset of domain names that, in DNS, corresponds to the A/AAAA record

Now let's get messier still. RFC 1034 introduces the "Preferred Name Syntax", which is a recommendation for how to encode names. For example, one part is that it suggests every label start with a letter. This avoids ambiguity when parsing IPs: if labels could be all-numeric (10.0.0.1), it would be ambiguous whether to parse the string as a host name or as an IP address. However, RFC 1123, Section 2.1, relaxed this to allow the first character to be a digit, on the presumption that the highest-level label (the TLD) would always be alphabetic.
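To make the recap concrete, here is a rough Ruby sketch contrasting the two notions. The helpers `domain_name?` and `host_name?` are hypothetical, they work on the dotted text form and assume no escaped dots inside labels, and the final check substitutes "the top-level label is not all digits" for RFC 1123's "will be alphabetic" presumption.

```ruby
# Hypothetical helpers (not from any library). Input is the dotted text
# form; labels are assumed to contain no escaped dots.

# Host-name label: letters, digits, hyphens; 1-63 chars; no leading or
# trailing hyphen. A leading digit is allowed (RFC 1123, Section 2.1).
HOST_LABEL = /\A(?!-)[A-Za-z0-9-]{1,63}(?<!-)\z/

# Domain name: non-empty labels of at most 63 octets each (any octets,
# including "_"), with a total encoded length of at most 255 octets.
def domain_name?(name)
  labels = name.delete_suffix(".").split(".", -1)
  return false if labels.empty? || labels.any?(&:empty?)
  return false if labels.any? { |label| label.bytesize > 63 }
  # One length octet per label, plus one for the terminating empty label.
  encoded_length = labels.sum { |label| label.bytesize + 1 } + 1
  encoded_length <= 255
end

# Host name: a domain name whose labels all follow the host-name rules.
def host_name?(name)
  return false unless domain_name?(name)
  labels = name.delete_suffix(".").split(".")
  return false unless labels.all? { |label| HOST_LABEL.match?(label) }
  # Stand-in for RFC 1123's "highest-level label will be alphabetic":
  # reject an all-numeric top-level label so "10.0.0.1" can't pass.
  !labels.last.match?(/\A[0-9]+\z/)
end
```

With these, `_sip._tcp.example.com` is a domain name but not a host name, `3com.com` passes only because of the RFC 1123 relaxation, and `10.0.0.1` is a well-formed domain name that the host-name check rejects.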
That presumption about TLDs wasn't enshrined anywhere, as far as I've been able to tell, but it was the practice of the gTLDs at the time and continues to be ICANN's practice (thus far).

So now the question is: where do the underscores come from? Two places:

1) The URI spec (RFC 2396) permitted them, because it didn't couple a URL to the underlying name resolution system (DNS), but instead permitted a variety of naming and name resolution schemes. The ABNF from this spec diverged from RFC 1123, and RFC 3986 tried to bring them back into alignment, but the 'damage' of permissiveness was done.

2) Microsoft's host resolution API, which supported a variety of name types (DNS, NetBIOS, WINS, etc.) and looked up the incoming string against each of those name resolution services. Their DNS resolver adhered to 'any 8-bit octet is a good octet' and '7-bit ASCII is good', and since '_' is 7-bit ASCII, it let it through.

Further, it's important to consider that names containing '_' are valid domain names, and ARE valid URL host names, even if they're not valid DNS host names. Consider, for example, SRV names (RFC 2782), such as _ldap._tcp.example.com.

Do you hate everything yet? Because I sure do.

I captured some of these thoughts in https://code.google.com/p/chromium/issues/detail?id=496472 and https://code.google.com/p/chromium/issues/detail?id=496468, largely because no browser I've looked at 'does the right thing' and rejects underscores.

I mention all of this to say that I actually find it 'not clear cut' what's expected, and I have spent several day-long dives into specs and other implementations looking for any common consistency, especially for https://url.spec.whatwg.org/.

On a pragmatic level, I'd like to be a hard-liner, with one clear interpretation, but in the real world, I can't find anyone who has consistently followed or implemented that guidance.

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy