This has been the subject of some debate; see Lisa's rejection of errata IDs 1081 and 1353...
https://www.rfc-editor.org/errata/rfc1123

On Sep 16, 2017 10:00, "Eric Covener" <cove...@gmail.com> wrote:

On Sat, Sep 16, 2017 at 9:48 AM, Yann Ylavic <ylavic....@gmail.com> wrote:
> On Sat, Sep 16, 2017 at 3:37 AM, Eric Covener <cove...@gmail.com> wrote:
>> On Sat, Dec 29, 2012 at 8:23 PM, <s...@apache.org> wrote:
>>>
>>> +/*
>>> + * If strict mode ever becomes the default, this should be folded into
>>> + * fix_hostname_non_v6()
>>> + */
>>> +static apr_status_t strict_hostname_check(request_rec *r, char *host,
>>> +                                          int logonly)
>>> +{
>>> +    char *ch;
>>> +    int is_dotted_decimal = 1, leading_zeroes = 0, dots = 0;
>>> +
>>> +    for (ch = host; *ch; ch++) {
>>> +        if (!apr_isascii(*ch)) {
>>> +            goto bad;
>>> +        }
>>> +        else if (apr_isalpha(*ch) || *ch == '-') {
>>> +            is_dotted_decimal = 0;
>>> +        }
>>> +        else if (ch[0] == '.') {
>>> +            dots++;
>>> +            if (ch[1] == '0' && apr_isdigit(ch[2]))
>>> +                leading_zeroes = 1;
>>> +        }
>>> +        else if (!apr_isdigit(*ch)) {
>>> +            /* also takes care of multiple Host headers by denying commas */
>>> +            goto bad;
>>> +        }
>>> +    }
>>> +    if (is_dotted_decimal) {
>>> +        if (host[0] == '.' || (host[0] == '0' && apr_isdigit(host[1])))
>>> +            leading_zeroes = 1;
>>> +        if (leading_zeroes || dots != 3) {
>>> +            /* RFC 3986 7.4 */
>>> +            goto bad;
>>> +        }
>>> +    }
>>> +    else {
>>> +        /* The top-level domain must start with a letter (RFC 1123 2.1) */
>>> +        while (ch > host && *ch != '.')
>>> +            ch--;
>>> +        if (ch[0] == '.' && ch[1] != '\0' && !apr_isalpha(ch[1]))
>>> +            goto bad;
>>> +    }
>>> +    return APR_SUCCESS;
>>> +
>>> +bad:
>>> +    ap_log_rerror(APLOG_MARK, APLOG_DEBUG, 0, r, APLOGNO()
>>> +                  "[strict] Invalid host name '%s'%s%.6s",
>>> +                  host, *ch ? ", problem near: " : "", ch);
>>> +    if (logonly)
>>> +        return APR_SUCCESS;
>>> +    return APR_EINVAL;
>>> +}
>>
>> (sorry for the necromancy of this very old commit)
>>
>> Re: the 1123 2.1 reference a dozen lines from the end of the function:
>> RFC 1123 2.1 seems to say the opposite. Just a bug or something over
>> my head?
>>
>>    2.1  Host Names and Numbers
>>
>>       The syntax of a legal Internet host name was specified in RFC-952
>>       [DNS:4].  One aspect of host name syntax is hereby changed: the
>>       restriction on the first character is relaxed to allow either a
>>       letter or a digit.  Host software MUST support this more liberal
>>       syntax.
>
> RFC 1123 2.1 seems to be about the first character of the host, while
> the code checks the first one of the TLD. Are there TLDs starting with
> a digit?

I see, thanks. The basis in 1123 is a bit later in 2.1 but doesn't
really seem normative:

      If a dotted-decimal number can be entered without such
      identifying delimiters, then a full syntactic check must be
      made, because a segment of a host domain name is now allowed to
      begin with a digit and could legally be entirely numeric (see
      Section 6.1.2.4).  However, a valid host name can never have
      the dotted-decimal form #.#.#.#, since at least the
      highest-level component label will be alphabetic.

The 6.1.2.4 reference is likely an error because that is about
compression.

It seems like we'd reject "1foo" but accept "1foo.com", but i am not
sure if this warrants an exception or reconsidering the check.

(In the case that had me looking, a high TCP port was used as the
hostname AND port in the Host header so it is clearly someone elses
bug at the core)

-- 
Eric Covener
cove...@gmail.com
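
For anyone who wants to probe edge cases of the quoted check outside of httpd, here is a minimal standalone sketch of the same logic. It is an approximation, not the committed code: the name strict_hostname_ok() is hypothetical, the APR character macros are swapped for <ctype.h> calls, the request_rec and debug logging are dropped, and it returns 1/0 instead of an apr_status_t.

```c
#include <ctype.h>

/* Hypothetical standalone port of the quoted check.
 * Returns 1 if the name passes, 0 if it would be rejected. */
static int strict_hostname_ok(const char *host)
{
    const char *ch;
    int is_dotted_decimal = 1, leading_zeroes = 0, dots = 0;

    for (ch = host; *ch; ch++) {
        unsigned char c = (unsigned char)*ch;
        if (c > 0x7f) {
            return 0;                   /* non-ASCII byte */
        }
        else if (isalpha(c) || c == '-') {
            is_dotted_decimal = 0;
        }
        else if (c == '.') {
            dots++;
            if (ch[1] == '0' && isdigit((unsigned char)ch[2]))
                leading_zeroes = 1;
        }
        else if (!isdigit(c)) {
            return 0;                   /* e.g. ',' or ':' */
        }
    }
    if (is_dotted_decimal) {
        if (host[0] == '.'
            || (host[0] == '0' && isdigit((unsigned char)host[1])))
            leading_zeroes = 1;
        if (leading_zeroes || dots != 3)
            return 0;                   /* not a clean #.#.#.# */
    }
    else {
        /* ch sits on the terminating NUL; walk back to the last dot, if any */
        while (ch > host && *ch != '.')
            ch--;
        if (ch[0] == '.' && ch[1] != '\0' && !isalpha((unsigned char)ch[1]))
            return 0;                   /* last label starts with a non-letter */
    }
    return 1;
}
```

If this port is faithful, "1foo", "1foo.com" and "10.0.0.1" pass, while "foo.1com" and an all-digit name such as "54321" are rejected: a name with no dot never reaches the TLD test, and an all-digit name fails the dots != 3 test instead.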