Hi,

can you please check the attached patch? I think it should be closer to the proper way how to fix this than the previous one.

It works properly for me even for URLs with '*' and '?'.

Regards,
Jan Kaluza

On 03/21/2014 03:27 PM, Yann Ylavic wrote:
Yes, selecting the right worker is all in ap_proxy_get_worker(), but
probably also add_pass() and proxysection() would need something like
ap_proxy_define_wildcard_worker() to register this kind of worker
(save the original name, ...).

On Fri, Mar 21, 2014 at 3:15 PM, Jim Jagielski <j...@jagunet.com> wrote:
Just a thought, but wouldn't the better place to "fix" this
be in ap_proxy_get_worker()??

On Mar 21, 2014, at 9:13 AM, Yann Ylavic <ylavic....@gmail.com> wrote:

oups, sorry for the numbering.

On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic <ylavic....@gmail.com> wrote:
On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža <jkal...@redhat.com> wrote:
On 03/18/2014 02:46 PM, Yann Ylavic wrote:

On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic <ylavic....@gmail.com> wrote:

Wouldn't it be possible to define wildcard workers when the URL is
known to be a regexp substitution?
For these workers' URLs, the dollars (plus the following digit) could
be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then
use ap_strcasecmp_match() against the requested URL.


I meant ap_strcmp_match(), this has to be case sensitive...


I've implemented your idea. Can you check the attached patch please? It
fixes the original PR and also ProxyPassMatch with UDS for me.

If it's OK, I will commit it.


I took a deeper look into this but I'm afraid it is a more complex
story actually...

1. Workers can be defined in <Proxy[Match] ...> sections too.
The same should probably be done in proxysection().

2. Both '*' and '?' are legitimate URL chars.
We need a way to escape the original ones in the configured worker's
URL, but ap_strcmp_match doesn't handle (un)escaping.
So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch()
functions should be implemented.

5. When no $ is used in the worker's URL, an implicit $1 is appended
by proxy_trans().
Hence
   ProxyPassMatch /some/path/with(/capture) http://some.host
is equivalent to
   ProxyPassMatch /some/path/with(/capture) http://some.host$1
So an implicit '*' should be appended to the worker's wildcard URL in
the first case (maybe only when the match has a capture but this is
just an optimization, * would match empty).

3. I think we should mark the worker as wildcard when it is defined by
ProxyPassMatch or a <ProxyMatch> section.
So that ap_proxy_get_worker() won't use the costly ap_strcmp_match()
for "normal" workers, and won't either trigger false positives due to
'*' or '?' in the configured URL (which is not escaped).

4. What about the same URL used both with ProxyPass and
ProxyPassMatch, same worker or different ones?
IMHO, they should be different workers, and by rewritting the URL to a
wildcard one as done in the patch they will be, but we lose the
configured name, and eg. the balancer-manager's param "w" would now
work only with the rewritten name (which is, at least, non intuitive).
A new balancer-manager's param could be added to access wildcard
workers (eg. "ww"), since they are different they may be accessed
separately, hence I think we need to save the original URL for such
case(s).
So maybe for point 2.'s mark we could use the configured (original)
URL as a new member of the proxy_worker struct (I don't see any reason
for it to be shared in proxy_worker_shared but one may object...).
This new field would be NULL for "normal" workers.

5. What about wildcard balancers (ProxyPassMatch /foo(/.*)
balancer://mycluster/bar$1)?
Since the balancers are registered/selected solely by their names
(path and everything after is ignored), and then their workers are
selected according to the balancer's method (the path still does not
matter), I think there is no matching issue here (the requested path,
eventually rewritten by proxy_trans(), will finally be appended to the
worker's URL as is).
Hence the same balancer's URL used both in ProxyPass and
ProxyPassMatch can refer to the same balancer (IMHO).
Nothing to be done therefore, but I may be missing something...

6. ProxyPassReverseMatch would be welcome too, but that's probably
another story...

Regards,
Yann.



Index: modules/proxy/mod_proxy.c
===================================================================
--- modules/proxy/mod_proxy.c	(revision 1585929)
+++ modules/proxy/mod_proxy.c	(working copy)
@@ -1631,10 +1631,23 @@
         new->balancer = balancer;
     }
     else {
-        proxy_worker *worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, de_socketfy(cmd->pool, r));
+        proxy_worker *worker;
         int reuse = 0;
+        char *r_orig = r;
+        if (use_regex) {
+            r = ap_proxy_escape_wildcard_url(cmd->temp_pool, r);
+        }
+        worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, r);
         if (!worker) {
-            const char *err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf, r, 0);
+            const char *err = NULL;
+            if (use_regex) {
+                err = ap_proxy_define_wildcard_worker(cmd->pool, &worker,
+                                                      NULL, conf, r_orig, 0);
+            }
+            else {
+                err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf,
+                                             r_orig, 0);
+            }
             if (err)
                 return apr_pstrcat(cmd->temp_pool, "ProxyPass ", err, NULL);
 
@@ -2254,6 +2267,7 @@
     char *word, *val;
     proxy_balancer *balancer = NULL;
     proxy_worker *worker = NULL;
+    int use_regex = 0;
 
     const char *err = ap_check_cmd_context(cmd, NOT_IN_DIR_LOC_FILE);
     proxy_server_conf *sconf =
@@ -2292,6 +2306,7 @@
         if (!r) {
             return "Regex could not be compiled";
         }
+        use_regex = 1;
     }
 
     /* initialize our config and fetch it */
@@ -2335,11 +2350,22 @@
             }
         }
         else {
+            char *escaped_p = de_socketfy(cmd->temp_pool, (char*)conf->p);
+            if (use_regex) {
+                escaped_p = ap_proxy_escape_wildcard_url(cmd->temp_pool, escaped_p);
+            }
             worker = ap_proxy_get_worker(cmd->temp_pool, NULL, sconf,
-                                         de_socketfy(cmd->temp_pool, (char*)conf->p));
+                                         escaped_p);
             if (!worker) {
-                err = ap_proxy_define_worker(cmd->pool, &worker, NULL,
-                                          sconf, conf->p, 0);
+                if (use_regex) {
+                    err = ap_proxy_define_wildcard_worker(cmd->pool, &worker,
+                                                          NULL, sconf,
+                                                          conf->p, 0);
+                }
+                else {
+                    err = ap_proxy_define_worker(cmd->pool, &worker, NULL,
+                                                 sconf, conf->p, 0);
+                }
                 if (err)
                     return apr_pstrcat(cmd->temp_pool, thiscmd->name,
                                        " ", err, NULL);
Index: modules/proxy/mod_proxy.h
===================================================================
--- modules/proxy/mod_proxy.h	(revision 1585929)
+++ modules/proxy/mod_proxy.h	(working copy)
@@ -352,6 +352,7 @@
     char      redirect[PROXY_WORKER_MAX_ROUTE_SIZE];  /* temporary balancing redirection route */
     char      flusher[PROXY_WORKER_MAX_SCHEME_SIZE];  /* flush provider used by mod_proxy_fdpass */
     char      uds_path[PROXY_WORKER_MAX_NAME_SIZE];   /* path to worker's unix domain socket if applicable */
+    char      wildcard_name[PROXY_WORKER_MAX_NAME_SIZE]; /* name for regexp matching */
     int             lbset;      /* load balancer cluster set */
     int             retries;    /* number of retries on this worker */
     int             lbstatus;   /* Current lbstatus */
@@ -636,6 +637,34 @@
                                              int do_malloc);
 
 /**
+ * Escapes the URL containing regexp substitutions ($N) to be used as
+ * wildcard_name in proxy_worker structure.
+ * @param p         memory pool to allocate the string from
+ * @param url       url containing worker name
+ * @return          escaped url
+ */
+PROXY_DECLARE(char *) ap_proxy_escape_wildcard_url(apr_pool_t *p,
+                                                   const char *url);
+
+/**
+ * Define and Allocate space for the worker with the name using regexp
+ * to proxy configuration.
+ * @param p         memory pool to allocate worker from
+ * @param worker    the new worker
+ * @param balancer  the balancer that the worker belongs to
+ * @param conf      current proxy server configuration
+ * @param url       url containing worker name
+ * @param do_malloc true if shared struct should be malloced
+ * @return          error message or NULL if successful (*worker is new worker)
+ */
+PROXY_DECLARE(char *) ap_proxy_define_wildcard_worker(apr_pool_t *p,
+                                             proxy_worker **worker,
+                                             proxy_balancer *balancer,
+                                             proxy_server_conf *conf,
+                                             const char *url,
+                                             int do_malloc);
+
+/**
  * Share a defined proxy worker via shm
  * @param worker  worker to be shared
  * @param shm     location of shared info
Index: modules/proxy/proxy_util.c
===================================================================
--- modules/proxy/proxy_util.c	(revision 1585929)
+++ modules/proxy/proxy_util.c	(working copy)
@@ -1508,6 +1508,36 @@
     return apr_pstrcat(p, "unix:", worker->s->uds_path, "|", worker->s->name, NULL);
 }
 
+static int ap_proxy_wildcard_match(const char *str, const char *expected)
+{
+    int x, y;
+
+    for (x = 0, y = 0; expected[y]; ++y, ++x) {
+        if ((!str[x]) && (expected[y] != '*'))
+            return -1;
+        if (expected[y] == '*') {
+            while (expected[++y] == '*');
+            if (!expected[y])
+                return 0;
+            while (str[x]) {
+                int ret;
+                if ((ret = ap_strcmp_match(&str[x++], &expected[y])) != 1)
+                    return ret;
+            }
+            return -1;
+        }
+        else if ((expected[y] != '?') && (str[x] != expected[y])) {
+            if ((str[x] == '*' && strncasecmp("%2A", expected + y, 3) == 0) ||
+                (str[x] == '?' && strncasecmp("%3F", expected + y, 3) == 0)) {
+                y += 2;
+                continue;
+            }
+            return 1;
+        }
+    }
+    return (str[x] != '\0');
+}
+
 PROXY_DECLARE(proxy_worker *) ap_proxy_get_worker(apr_pool_t *p,
                                                   proxy_balancer *balancer,
                                                   proxy_server_conf *conf,
@@ -1568,7 +1598,8 @@
             if ( ((worker_name_length = strlen(worker->s->name)) <= url_length)
                 && (worker_name_length >= min_match)
                 && (worker_name_length > max_match)
-                && (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) {
+                && ((worker->s->wildcard_name && ap_proxy_wildcard_match(url_copy, worker->s->wildcard_name) == 0)
+                || (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) {
                 max_worker = worker;
                 max_match = worker_name_length;
             }
@@ -1580,7 +1611,8 @@
             if ( ((worker_name_length = strlen(worker->s->name)) <= url_length)
                 && (worker_name_length >= min_match)
                 && (worker_name_length > max_match)
-                && (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) {
+                && ((worker->s->wildcard_name && ap_proxy_wildcard_match(url_copy, worker->s->wildcard_name) == 0)
+                || (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) {
                 max_worker = worker;
                 max_match = worker_name_length;
             }
@@ -1721,6 +1753,96 @@
     return NULL;
 }
 
+PROXY_DECLARE(char *) ap_proxy_escape_wildcard_url(apr_pool_t *p,
+                                                   const char *url)
+{
+    char *r, *x;
+    const char *y, *z;
+    apr_size_t length, escapes = 0;
+
+    y = ap_strstr_c(url, "://");
+    if (y == NULL) {
+        return NULL;
+    }
+
+    /* Compute how many characters need to be escaped */
+    for (z = y; *z; ++z) {
+        if (*z == '*' || *z == '?')
+            escapes++;
+    }
+
+    /* Compute the length of the input string, including NULL */
+    length = z - url + 1;
+
+    /* Each escaped character needs 3 extra bytes (? --> %3F) */
+    r = apr_palloc(p, length + 3 * escapes);
+
+    /* When the scheme or the hostname contains '$', skip.
+    * When the path contains '$N', replace '$N' with wildcard. */
+    escapes = 0;
+    strncpy(r, url, y - url);
+    x = r + (y - url);
+    for (; *y; ++x, ++y) {
+        if (*y != '$') {
+            /* Escape '*' and '?' in URL, so we can distinguish between
+            * the original characters with the ones we added later. */
+            if (*y == '*') {
+                *x++ = '%';
+                *x++ = '2';
+                *x = 'A';
+            }
+            else if (*y == '?') {
+                *x++ = '%';
+                *x++ = '3';
+                *x = 'F';
+            }
+            else {
+                *x = *y;
+            }
+        }
+        else if (apr_isdigit(*(y + 1))) {
+            *x = '*';
+            escapes++;
+            y += 1;
+        }
+    }
+
+    /* When no $ is used in the worker's URL, an implicit $1 is appended
+     * by proxy_trans(). So append '*' to stay consistent with proxy_trans. */
+    if (escapes == 0) {
+        *x++ = '*';
+    }
+
+    *x = '\0';
+    return r;
+}
+
+PROXY_DECLARE(char *) ap_proxy_define_wildcard_worker(apr_pool_t *p,
+                                             proxy_worker **worker,
+                                             proxy_balancer *balancer,
+                                             proxy_server_conf *conf,
+                                             const char *url,
+                                             int do_malloc)
+{
+    char *err;
+    char *escaped_url;
+
+    err = ap_proxy_define_worker(p, worker, balancer, conf, url, do_malloc);
+    if (err) {
+        return err;
+    }
+
+    escaped_url = ap_proxy_escape_wildcard_url(p, (*worker)->s->name);
+    if (PROXY_STRNCPY((*worker)->s->wildcard_name, escaped_url)
+        != APR_SUCCESS) {
+        return apr_psprintf(p, "worker escaped URL (%s) too long",
+                            escaped_url);
+    }
+    strcpy((*worker)->s->wildcard_name, escaped_url);
+
+    return NULL;
+}
+
 /*
  * Create an already defined worker and free up memory
  */

Reply via email to