PR 41798 and many related ones (eg 39746, 38980 - both of which I've closed today) show a history of incorrect URL-unescaping in mod_proxy.
For PR41798, the attached patch looks like a fix: it just uses r->unparsed_uri (escaped) instead of r->uri (unescaped) in proxy_trans. I'm wondering if using unparsed_uri here risks breaking something or has security implications we need to consider, bearing in mind we already unescaped it and thus verified it is well-formed. Any thoughts? Whoever wrote the comment about the existing logic breaking RFC1945 presumably didn't see it as being that simple. -- Nick Kew Application Development with Apache - the Apache Modules Book http://www.apachetutor.org/
Index: modules/proxy/mod_proxy.c =================================================================== --- modules/proxy/mod_proxy.c (revision 573827) +++ modules/proxy/mod_proxy.c (working copy) @@ -523,6 +523,16 @@ const char *real; ap_regmatch_t regm[AP_MAX_REG_MATCH]; char *found = NULL; + const char *local_uri = r->unparsed_uri; + /* r->uri has been unescaped at this point. + * Using it in a proxy breaks us for backends that expect + * escaped sequences - at least slashes. We can fix that + * using r->unparsed_uri. But does that itself break something + * else, or have security implications? + * + * We're too early to use per-dir config. Should this be + * a per-vhost config option, or universal, or ??? + */ if (r->proxyreq) { /* someone has already set up the proxy, it was possibly ourselves @@ -531,11 +541,6 @@ return OK; } - /* XXX: since r->uri has been manipulated already we're not really - * compliant with RFC1945 at this point. But this probably isn't - * an issue because this is a hybrid proxy/origin server. - */ - for (i = 0; i < conf->aliases->nelts; i++) { if (dconf->interpolate_env == 1) { fake = proxy_interpolate(r, ent[i].fake); @@ -546,11 +551,11 @@ real = ent[i].real; } if (ent[i].regex) { - if (!ap_regexec(ent[i].regex, r->uri, AP_MAX_REG_MATCH, regm, 0)) { + if (!ap_regexec(ent[i].regex, local_uri, AP_MAX_REG_MATCH, regm, 0)) { if ((real[0] == '!') && (real[1] == '\0')) { return DECLINED; } - found = ap_pregsub(r->pool, real, r->uri, AP_MAX_REG_MATCH, + found = ap_pregsub(r->pool, real, local_uri, AP_MAX_REG_MATCH, regm); /* Note: The strcmp() below catches cases where there * was no regex substitution. This is so cases like: @@ -569,13 +574,13 @@ found = apr_pstrcat(r->pool, "proxy:", found, NULL); } else { - found = apr_pstrcat(r->pool, "proxy:", real, r->uri, + found = apr_pstrcat(r->pool, "proxy:", real, local_uri, NULL); } } } else { - len = alias_match(r->uri, fake); + len = alias_match(local_uri, fake); if (len != 0) { if ((real[0] == '!') && (real[1] == '\0')) { @@ -583,11 +588,12 @@ } found = apr_pstrcat(r->pool, "proxy:", real, - r->uri + len, NULL); + local_uri + len, NULL); } } if (found) { + r->uri = local_uri; /* the backend should get it escaped */ r->filename = found; r->handler = "proxy-server"; r->proxyreq = PROXYREQ_REVERSE;