PR 41798 and many related ones (eg 39746, 38980 - both of which I've
closed today) show a history of incorrect URL-unescaping in mod_proxy.

For PR41798, the attached patch looks like a fix: it just uses
r->unparsed_uri (escaped) instead of r->uri (unescaped) in
proxy_trans.  I'm wondering if using unparsed_uri here risks
breaking something or has security implications we need to
consider, bearing in mind we already unescaped it and thus
verified it is well-formed.

Any thoughts?  Whoever wrote the comment about the existing logic
breaking RFC1945 presumably didn't see it as being that simple.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
Index: modules/proxy/mod_proxy.c
===================================================================
--- modules/proxy/mod_proxy.c	(revision 573827)
+++ modules/proxy/mod_proxy.c	(working copy)
@@ -523,6 +523,16 @@
     const char *real;
     ap_regmatch_t regm[AP_MAX_REG_MATCH];
     char *found = NULL;
+    const char *local_uri = r->unparsed_uri;
+    /* r->uri has been unescaped at this point.
+     * Using it in a proxy breaks us for backends that expect
+     * escaped sequences - at least slashes.  We can fix that
+     * using r->unparsed_uri.  But does that itself break something
+     * else, or have security implications?
+     *
+     * We're too early to use per-dir config.  Should this be
+     * a per-vhost config option, or universal, or ???
+     */
 
     if (r->proxyreq) {
         /* someone has already set up the proxy, it was possibly ourselves
@@ -531,11 +541,6 @@
         return OK;
     }
 
-    /* XXX: since r->uri has been manipulated already we're not really
-     * compliant with RFC1945 at this point.  But this probably isn't
-     * an issue because this is a hybrid proxy/origin server.
-     */
-
     for (i = 0; i < conf->aliases->nelts; i++) {
         if (dconf->interpolate_env == 1) {
             fake = proxy_interpolate(r, ent[i].fake);
@@ -546,11 +551,11 @@
             real = ent[i].real;
         }
         if (ent[i].regex) {
-            if (!ap_regexec(ent[i].regex, r->uri, AP_MAX_REG_MATCH, regm, 0)) {
+            if (!ap_regexec(ent[i].regex, local_uri, AP_MAX_REG_MATCH, regm, 0)) {
                 if ((real[0] == '!') && (real[1] == '\0')) {
                     return DECLINED;
                 }
-                found = ap_pregsub(r->pool, real, r->uri, AP_MAX_REG_MATCH,
+                found = ap_pregsub(r->pool, real, local_uri, AP_MAX_REG_MATCH,
                                    regm);
                 /* Note: The strcmp() below catches cases where there
                  * was no regex substitution. This is so cases like:
@@ -569,13 +574,13 @@
                     found = apr_pstrcat(r->pool, "proxy:", found, NULL);
                 }
                 else {
-                    found = apr_pstrcat(r->pool, "proxy:", real, r->uri,
+                    found = apr_pstrcat(r->pool, "proxy:", real, local_uri,
                                         NULL);
                 }
             }
         }
         else {
-            len = alias_match(r->uri, fake);
+            len = alias_match(local_uri, fake);
 
             if (len != 0) {
                 if ((real[0] == '!') && (real[1] == '\0')) {
@@ -583,11 +588,12 @@
                 }
 
                 found = apr_pstrcat(r->pool, "proxy:", real,
-                                    r->uri + len, NULL);
+                                    local_uri + len, NULL);
 
             }
         }
         if (found) {
+	    r->uri = local_uri;  /* the backend should get it escaped */
             r->filename = found;
             r->handler = "proxy-server";
             r->proxyreq = PROXYREQ_REVERSE;

Reply via email to