> > Are people actually using such constructs? According to RFC 2396 (and
> > 1738), neither the scheme nor the hostname is allowed to contain escaped
> > characters:
> >
> > RFC 2396, Appendix A:
> > |
> > |     scheme        = alpha *( alpha | digit | "+" | "-" | "." )
> > |
> > |     [...]
> > |
> > |     host          = hostname | IPv4address
> > |     hostname      = *( domainlabel "." ) toplabel [ "." ]
> > |     domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
> > |     toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
> > |     IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit

Ah, yes, but he missed the curve ball:

      absoluteURI   = scheme ":" ( hier_part | opaque_part )
      hier_part     = ( net_path | abs_path ) [ "?" query ]
      net_path      = "//" authority [ abs_path ]
      authority     = server | reg_name
      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

The authority component is what comes between the double-slash and
the next slash, and it may contain escapes.  In any case, since % is
not a valid character in a hostname, it is safer to treat it as an
escape than to not treat it as such and hope nobody else does.  But
the BNF for hostname will be modified in the next revision of the
spec, so we might as well handle it now.
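
A minimal sketch of that unescape-then-validate step, in Python rather
than Apache's C, and not the actual change under discussion.  The names
unescape_host and _HOSTNAME are hypothetical; the pattern is transcribed
from the RFC 2396 hostname BNF quoted above.

```python
import re
from urllib.parse import unquote

# Transcribed from the RFC 2396 Appendix A rules for hostname.
_DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
_HOSTNAME = re.compile(r"(?:%s\.)*%s\.?" % (_DOMAINLABEL, _TOPLABEL))


def unescape_host(raw_host):
    """Decode %XX escapes, then check the result against the hostname BNF.

    Returns the decoded hostname, or None if the decoded value is not a
    valid RFC 2396 hostname (hypothetical helper, for illustration only).
    """
    decoded = unquote(raw_host)  # malformed escapes such as "%zz" are left as-is
    if _HOSTNAME.fullmatch(decoded):
        return decoded
    return None
```

With this, "%77ww.example.com" compares equal to "www.example.com", while
a host that decodes to something containing "/" or a leftover "%" is
refused instead of being passed along for someone else to decode.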

> i asked Roy Fielding about this before i did the change -- and he
> indicated that it was correct to unescape.  i'm not sure anyone is
> presently doing it, but apparently for DNS I18N this type of escaping is
> expected.  (as such it'd be nice for apache to do the right thing.)
> 
> i admit, rfc2396 doesn't allow hostname escaping.

I expect that there will be hundreds of pieces of software that will
puke on i18n hostnames.  It's a lame idea to begin with, but some people
just have that i18n bug so far up their ass that no amount of reason
will set them straight.

I think we should do the unescape, but more so because leaving it
escaped might lead to a security problem than because I think i18n dns
will be deployed.

OTOH, I'd be just as happy to simply reject with 403 any hostname
that contains a %.
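
The reject-outright alternative is even shorter.  Again a hypothetical
Python sketch (check_host is an assumed name, not an Apache function):

```python
def check_host(raw_host):
    """Return 403 if the host contains a '%', else None to keep processing."""
    # Assumption: the caller turns a non-None result into the HTTP response.
    if "%" in raw_host:
        return 403
    return None
```

This never decodes anything, so there is no decoded value that could
disagree with what a downstream component sees.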

....Roy
