Re: [PATCH] Fix settings options with ProxyPassMatch
On 07/10/2014 03:57 PM, Yann Ylavic wrote: On Thu, Jul 10, 2014 at 9:12 AM, Jan Kaluža wrote: On 07/09/2014 04:26 PM, Yann Ylavic wrote: I forgot proxysection(), why not handle the ap_proxy_define_match_worker() case there too? I'm not sure I see what you mean. There's match_worker created in proxysection(), or do you mean something else? Indeed I missed the code in proxysection()... That's +1 for me. Committed in r1609680. Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On Thu, Jul 10, 2014 at 9:12 AM, Jan Kaluža wrote: > On 07/09/2014 04:26 PM, Yann Ylavic wrote: >> I forgot proxysection(), why not handle the >> ap_proxy_define_match_worker() case there too? >> > > I'm not sure I see what you mean. There's match_worker created in > proxysection(), or do you mean something else? Indeed I missed the code in proxysection()... That's +1 for me.
Re: [PATCH] Fix settings options with ProxyPassMatch
On 07/09/2014 04:26 PM, Yann Ylavic wrote: On Wed, Jul 9, 2014 at 4:14 PM, Yann Ylavic wrote: On Wed, Jul 9, 2014 at 3:03 PM, Jan Kaluža wrote: Hi, could you please check the patch I've attached to this email? Looks good to me. It changes following parts of Yann's patch: 1. keep only single name of the worker stored in shared memory. 2. when ProxyPassMatch is used, wshared->is_name_matchable = 1. 3. if is_name_matchable == 1, ap_proxy_get_worker() uses ap_proxy_strcmp_ematch() which treats "$N" as '*'. Much simpler, no need to store the pattern. Would it be possible to handle some escaping character (eg. \), so that $ can be expressed without being interpolated? (There is still the comment in ap_proxy_strcmp_ematch(), but not the code anymore). AFAICT, $ is a legitimate URL character that need not be %-escaped. Done in attached patch. I forgot proxysection(), why not handle the ap_proxy_define_match_worker() case there too? I'm not sure I see what you mean. There's match_worker created in proxysection(), or do you mean something else? Jan Kaluza httpd-trunk-proxy-matchv3.patch Description: application/download
Re: [PATCH] Fix settings options with ProxyPassMatch
On Wed, Jul 9, 2014 at 4:14 PM, Yann Ylavic wrote: > On Wed, Jul 9, 2014 at 3:03 PM, Jan Kaluža wrote: >> Hi, >> >> could you please check the patch I've attached to this email? > > Looks good to me. > >> >> It changes following parts of Yann's patch: >> >> 1. keep only single name of the worker stored in shared memory. >> >> 2. when ProxyPassMatch is used, wshared->is_name_matchable = 1. >> >> 3. if is_name_matchable == 1, ap_proxy_get_worker() uses >> ap_proxy_strcmp_ematch() which treats "$N" as '*'. > > Much simpler, no need to store the pattern. > > Would it be possible to handle some escaping character (eg. \), so > that $ can be expressed without being interpolated? > (There is still the comment in ap_proxy_strcmp_ematch(), but not the > code anymore). > AFAICT, $ is a legitimate URL character that need not be %-escaped. I forgot proxysection(), why not handle the ap_proxy_define_match_worker() case there too?
Re: [PATCH] Fix settings options with ProxyPassMatch
On Wed, Jul 9, 2014 at 3:03 PM, Jan Kaluža wrote: > Hi, > > could you please check the patch I've attached to this email? Looks good to me. > > It changes following parts of Yann's patch: > > 1. keep only single name of the worker stored in shared memory. > > 2. when ProxyPassMatch is used, wshared->is_name_matchable = 1. > > 3. if is_name_matchable == 1, ap_proxy_get_worker() uses > ap_proxy_strcmp_ematch() which treats "$N" as '*'. Much simpler, no need to store the pattern. Would it be possible to handle some escaping character (eg. \), so that $ can be expressed without being interpolated? (There is still the comment in ap_proxy_strcmp_ematch(), but not the code anymore). AFAICT, $ is a legitimate URL character that need not be %-escaped. Regards, Yann.
Re: [PATCH] Fix settings options with ProxyPassMatch
On 04/29/2014 03:51 PM, Jim Jagielski wrote: On Apr 29, 2014, at 8:41 AM, Jan Kaluža wrote: Because later we have to match the URL of request with some proxy_worker. If you configure ProxyPassMatch like this: ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg Then the proxy_worker name would be "http://x/$1/foo.jpg";. If you receive request with URL "http://x/something/foo.jpg";, ap_proxy_get_worker() will have to find out the worker with name "http://x/$1/foo.jpg";. The question here is how it would do that? The answer used in the patch is "we change the worker name to http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in our case) matches that worker. If we store the original name with $N, we will have to find out different way how to match the worker (probably emulating wildcard pattern matching) It would be possible to store only the original name (with "$N" variables), store the flag that the proxy worker is using regex and change ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any real advantage here. In Yann's suggested patch we don't store match_name where it belongs; so we'd need to put it in shm, which means more memory. Instead, we store as is and add a simple char flag which sez if the stored name is a regex. Much savings. Hi, could you please check the patch I've attached to this email? It changes following parts of Yann's patch: 1. keep only single name of the worker stored in shared memory. 2. when ProxyPassMatch is used, wshared->is_name_matchable = 1. 3. if is_name_matchable == 1, ap_proxy_get_worker() uses ap_proxy_strcmp_ematch() which treats "$N" as '*'. I've tested this patch a bit and it seems to work for me. Regards, Jan Kaluza And I have no idea why storing with $1 -> * somehow makes things easier or implies a "different way how to match the worker". Finally, let's think about this deeper... Assume we do have ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg ProxyPassMatch ^/zippy/(\d+)/bar.jpg http://x/$1/omar/propjoe.gif is the intent/desire to have 2 workers or 1? A worker is, in some ways, simply a nickname for the socket related to a host and port. Maybe, in the interests of efficiency and speed, since regexes are slow as it is, a condition could be specified (a limitation, as it were), that when using PPM, only everything up to the 1st potential substitution is considered a unique worker. httpd-trunk-proxy-matchv2.patch Description: application/download
Re: [PATCH] Fix settings options with ProxyPassMatch
On Wed, Apr 30, 2014 at 12:53 AM, Yann Ylavic wrote: > Still another solution for these workers would be to reuse the > ap_regmatch_t vector from proxy_trans() to exact match the worker's > name (with its zero or more $N replaced with strings offsets from > vector[N], like ap_expr_str_exec_re() does). This solution is complete nonsense, I should have though about it more deeply :-/ There is no such thing as an exact match with a wildcard name. The place where the regexp (worker's alias) is exectuted and the resulting ap_regmatch_t vector is used to build the requested URL is proxy_trans(). Why would mod_proxy have to find (again) the worker later? I guess things may have been rewritten in between, or the like, but it looks like a duplicate search for some (marginal/too simple?) cases.
Re: [PATCH] Fix settings options with ProxyPassMatch
On Tue, Apr 29, 2014 at 3:51 PM, Jim Jagielski wrote: > On Apr 29, 2014, at 8:41 AM, Jan Kaluža wrote: >> >> Because later we have to match the URL of request with some proxy_worker. >> >> If you configure ProxyPassMatch like this: >> ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg >> >> Then the proxy_worker name would be "http://x/$1/foo.jpg";. >> >> If you receive request with URL "http://x/something/foo.jpg";, >> ap_proxy_get_worker() will have to find out the worker with name >> "http://x/$1/foo.jpg";. The question here is how it would do that? >> >> The answer used in the patch is "we change the worker name to >> http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in >> our case) matches that worker. >> >> If we store the original name with $N, we will have to find out different >> way how to match the worker (probably emulating wildcard pattern matching) >> >> It would be possible to store only the original name (with "$N" variables), >> store the flag that the proxy worker is using regex and change >> ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any >> real advantage here. >> > > In Yann's suggested patch we don't store match_name where it > belongs; so we'd need to put it in shm, which means more > memory. Agreed, plus this is not balancer-manager aware. BTW, what's the difference between alias_match() used by proxy_trans() and ap_proxy_get_worker()? Longest match? Can an entry matched by proxy_trans() *not* belong to the worker got(ten) later from ap_proxy_get_worker()? If no, another solution would be to backref the worker in (all) its struct proxy_alias(es) entries. That way the worker would be already known at proxy_trans() time (when the entry is matched), and a new ap_proxy_get_worker_for_request(r) could do the association later. AFAICT, we don't use ap_proxy_get_worker() at runtime without a request_rec available. At least that could work for the *Match workers, for which the only relevent requested-URL's match is from proxy_trans(), imo. Still another solution for these workers would be to reuse the ap_regmatch_t vector from proxy_trans() to exact match the worker's name (with its zero or more $N replaced with strings offsets from vector[N], like ap_expr_str_exec_re() does). That would also require a request_rec available at ap_proxy_get_worker()'s (run)time though. > Instead, we store as is and add a simple char flag > which sez if the stored name is a regex. Much savings. > > And I have no idea why storing with $1 -> * somehow makes > things easier or implies a "different way how to match the worker". Do we need to provide a way to escape (application/legitimate) $N in the worker name or simply document on the limitation? In the latter case this is indeed much simpler. > > Finally, let's think about this deeper... > > Assume we do have > > ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg > ProxyPassMatch ^/zippy/(\d+)/bar.jpg http://x/$1/omar/propjoe.gif > > is the intent/desire to have 2 workers or 1? A worker is, in > some ways, simply a nickname for the socket related to a host and port. For which connections can be reused, different parameters apply... > Maybe, in the interests of efficiency and speed, since regexes > are slow as it is, a condition could be specified (a limitation, > as it were), that when using PPM, only everything up to > the 1st potential substitution is considered a unique worker. That could be (another) limitation. But one may want to apply different parameters to these somehow different URLs, since they may be different backends/applications too.
Re: [PATCH] Fix settings options with ProxyPassMatch
On Apr 29, 2014, at 8:41 AM, Jan Kaluža wrote: > > Because later we have to match the URL of request with some proxy_worker. > > If you configure ProxyPassMatch like this: > ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg > > Then the proxy_worker name would be "http://x/$1/foo.jpg";. > > If you receive request with URL "http://x/something/foo.jpg";, > ap_proxy_get_worker() will have to find out the worker with name > "http://x/$1/foo.jpg";. The question here is how it would do that? > > The answer used in the patch is "we change the worker name to > http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in our > case) matches that worker. > > If we store the original name with $N, we will have to find out different way > how to match the worker (probably emulating wildcard pattern matching) > > It would be possible to store only the original name (with "$N" variables), > store the flag that the proxy worker is using regex and change > ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any > real advantage here. > In Yann's suggested patch we don't store match_name where it belongs; so we'd need to put it in shm, which means more memory. Instead, we store as is and add a simple char flag which sez if the stored name is a regex. Much savings. And I have no idea why storing with $1 -> * somehow makes things easier or implies a "different way how to match the worker". Finally, let's think about this deeper... Assume we do have ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg ProxyPassMatch ^/zippy/(\d+)/bar.jpg http://x/$1/omar/propjoe.gif is the intent/desire to have 2 workers or 1? A worker is, in some ways, simply a nickname for the socket related to a host and port. Maybe, in the interests of efficiency and speed, since regexes are slow as it is, a condition could be specified (a limitation, as it were), that when using PPM, only everything up to the 1st potential substitution is considered a unique worker.
Re: [PATCH] Fix settings options with ProxyPassMatch
On 04/29/2014 03:29 PM, Jim Jagielski wrote: On Apr 29, 2014, at 8:41 AM, Jan Kaluža wrote: Because later we have to match the URL of request with some proxy_worker. If you configure ProxyPassMatch like this: ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg Then the proxy_worker name would be "http://x/$1/foo.jpg";. If you receive request with URL "http://x/something/foo.jpg";, ap_proxy_get_worker() will have to find out the worker with name "http://x/$1/foo.jpg";. The question here is how it would do that? The answer used in the patch is "we change the worker name to http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in our case) matches that worker. If we store the original name with $N, we will have to find out different way how to match the worker (probably emulating wildcard pattern matching) It would be possible to store only the original name (with "$N" variables), store the flag that the proxy worker is using regex and change ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any real advantage here. Huh??? I thought we were talking about this simple, streamlined patch that does JUST matching/awareness of regex's, not the one which seems to fold in a BUNCH of other semi-related stuff. Sorry, I was talking about the latest Yann's patch all the time :(. The version I looked at had some 'use_regex' logic outside of ap_proxy_get_worker(), right before we call it, in fact. Maybe I'm seeing an older version of the patch? I think that's the correct patch. It has use_regex only for creating the proxy_worker to distinguish between "normal" worker and "regex" worker. Note that "use_regex" does not influence the call of "ap_proxy_get_worker()". As I said, it's used only to distinguish whether the newly created worker is "normal" worker or "regex" worker. Regards, Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On Apr 29, 2014, at 8:41 AM, Jan Kaluža wrote: > > Because later we have to match the URL of request with some proxy_worker. > > If you configure ProxyPassMatch like this: > ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg > > Then the proxy_worker name would be "http://x/$1/foo.jpg";. > > If you receive request with URL "http://x/something/foo.jpg";, > ap_proxy_get_worker() will have to find out the worker with name > "http://x/$1/foo.jpg";. The question here is how it would do that? > > The answer used in the patch is "we change the worker name to > http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in our > case) matches that worker. > > If we store the original name with $N, we will have to find out different way > how to match the worker (probably emulating wildcard pattern matching) > > It would be possible to store only the original name (with "$N" variables), > store the flag that the proxy worker is using regex and change > ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any > real advantage here. Huh??? I thought we were talking about this simple, streamlined patch that does JUST matching/awareness of regex's, not the one which seems to fold in a BUNCH of other semi-related stuff. httpd-trunk-proxypassmatch.patch Description: Binary data > >> The version I looked at had some 'use_regex' logic outside of >> ap_proxy_get_worker(), right before we call it, in fact. Maybe >> I'm seeing an older version of the patch? >> > > I think that's the correct patch. It has use_regex only for creating the > proxy_worker to distinguish between "normal" worker and "regex" worker. > > Note that "use_regex" does not influence the call of "ap_proxy_get_worker()". > As I said, it's used only to distinguish whether the newly created worker is > "normal" worker or "regex" worker. > > Regards, > Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On 04/29/2014 02:22 PM, Jim Jagielski wrote: On Apr 29, 2014, at 7:41 AM, Jan Kaluža wrote: That's what we do with current patch I think, don't we? In the patch, we create "char *match_name" which is NULL when the worker_name is not regex and contains the escaped name if regex is used (with "$N" replaced by '*'). ap_proxy_get_worker() later checks the match_name and if it's not NULL, it tries regex matching using ap_proxy_strcmp_ematch(). The actual problem is that ap_proxy_get_worker() cannot find out proper proxy_worker, because this method is not regex aware. To make it regex aware, we have to distinguish regex workers during creation, right, hence the flag. replace the regex variables ($N -> *) in some safe manner and allow ap_proxy_get_worker() to find the right worker using this new info. And that's what the patch does. Why replace? Why can't we store the name as is? Because later we have to match the URL of request with some proxy_worker. If you configure ProxyPassMatch like this: ProxyPassMatch ^/test/(\d+)/foo.jpg http://x/$1/foo.jpg Then the proxy_worker name would be "http://x/$1/foo.jpg";. If you receive request with URL "http://x/something/foo.jpg";, ap_proxy_get_worker() will have to find out the worker with name "http://x/$1/foo.jpg";. The question here is how it would do that? The answer used in the patch is "we change the worker name to http://x/*/foo.jpg"; and check if the URL ("http://x/something/foo.jpg"; in our case) matches that worker. If we store the original name with $N, we will have to find out different way how to match the worker (probably emulating wildcard pattern matching) It would be possible to store only the original name (with "$N" variables), store the flag that the proxy worker is using regex and change ap_proxy_strcmp_ematch() function to treat "$N" as "*", but I don't see any real advantage here. The version I looked at had some 'use_regex' logic outside of ap_proxy_get_worker(), right before we call it, in fact. Maybe I'm seeing an older version of the patch? I think that's the correct patch. It has use_regex only for creating the proxy_worker to distinguish between "normal" worker and "regex" worker. Note that "use_regex" does not influence the call of "ap_proxy_get_worker()". As I said, it's used only to distinguish whether the newly created worker is "normal" worker or "regex" worker. Regards, Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On Apr 29, 2014, at 7:41 AM, Jan Kaluža wrote: > > That's what we do with current patch I think, don't we? In the patch, we > create "char *match_name" which is NULL when the worker_name is not regex and > contains the escaped name if regex is used (with "$N" replaced by '*'). > > ap_proxy_get_worker() later checks the match_name and if it's not NULL, it > tries regex matching using ap_proxy_strcmp_ematch(). > > The actual problem is that ap_proxy_get_worker() cannot find out proper > proxy_worker, because this method is not regex aware. To make it regex aware, > we have to distinguish regex workers during creation, right, hence the flag. > replace the regex variables ($N -> *) in some safe manner and allow > ap_proxy_get_worker() to find the right worker using this new info. And > that's what the patch does. Why replace? Why can't we store the name as is? The version I looked at had some 'use_regex' logic outside of ap_proxy_get_worker(), right before we call it, in fact. Maybe I'm seeing an older version of the patch?
Re: [PATCH] Fix settings options with ProxyPassMatch
On 04/29/2014 01:04 PM, Jim Jagielski wrote: On Apr 24, 2014, at 8:57 PM, Yann Ylavic wrote: Hi Jan, sorry for the late. On Tue, Apr 22, 2014 at 3:39 PM, Jan Kaluža wrote: Hi again, the patch has been here for some time already. I hesitate to commit it to trunk without any review, because it changes the core code in mod_proxy and I'm afraid that there could exist more corner-cases I'm not aware of. On the other side (and to motivate someone with deeper knowledge of mod_proxy :) ), it would be great to have UDS support also for ProxyPassMatch and fix some old bugs related to setting options together with ProxyPassMatch (like PR 43513 and PR 54973). So, anyone who would have time to review this patch, please? :) I slightly modified your patch with the following changes : 1. Use \-escaping instead of %-escaping for wildcard_match so to consume less space and unescape quickly. Wouldn't this cause confusion/issue with normal escaping? 2. Fix the recursion in ap_proxy_wildcard_match(), which was calling the legacy ap_strcmp_match(), and unescape according to 1. +1 2'. Shouldn't this function be made iterative (it is not final recursive currently)? +1 3. Always try to get the worker with its de_socketfy()ed name, and only if not found use ap_proxy_define[_wildcard]_worker() with the configured URL (add_pass() and proxysection() were not consistent about ap_proxy_get_worker()'s given name/URL). Makes sense. 4. Don't match both the "normal" name with strcnmp() and the wildcard name with strcmp_match() in ap_proxy_get_worker(), since the worker is either wildcard or not (this enforcement is also added to add_pass() and proxysection() so that it is not possible to declare the same worker name once with a Match, again without, or vice versa). 4'. Does it makes sense to use the longest match_name or should we go for the first matching wildcard worker (if any) wins? Longest match name, imo. 5. Rename ap_proxy_wildcard_match() as ap_proxy_strcmp_ematch() to suggest it could go to server/util.ch (if/when the associated ap_matchexp_escape() and ap_proxy_strcasecmp_ematch() are coded). Why? 6. Move proxy_worker_shared { char wildcard_name[PROXY_WORKER_MAX_NAME_SIZE]; } to proxy_worker { char *match_name; } to save shm space and avoid length issues (I don't see how this could be updated at runtime, for now, balancer-manager can't declare Match workers). Future effort, so -1 on the moving. 6. Keep ap_proxy_escape_wildcard_url() static (private) since it is not used outside proxy_util (now). Sounds good. This is not very tested (yet), I'll come back if I detect issues, but first I'd like your feedbacks on this modifications... Patch attached. Just a comment: I am really concerned that with such a major change, we will likely break things in ways we are unaware of... We should focus on fixing the actual problem w/o also combining it with other "fixes" that are tangentially involved. For example, maybe a simply flag when creating the worker that sez "worker name is a regex" is the way to approach it, and thus ap_proxy_get_worker() could add additional checks for finding the "right" worker when it knows it needs to allow for regex substitutions. That creates a additional logic path which is self-contained and isolated, instead of monkeying around with changing current paths... That's what we do with current patch I think, don't we? In the patch, we create "char *match_name" which is NULL when the worker_name is not regex and contains the escaped name if regex is used (with "$N" replaced by '*'). ap_proxy_get_worker() later checks the match_name and if it's not NULL, it tries regex matching using ap_proxy_strcmp_ematch(). The actual problem is that ap_proxy_get_worker() cannot find out proper proxy_worker, because this method is not regex aware. To make it regex aware, we have to distinguish regex workers during creation, replace the regex variables ($N -> *) in some safe manner and allow ap_proxy_get_worker() to find the right worker using this new info. And that's what the patch does. Regards, Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On Apr 24, 2014, at 8:57 PM, Yann Ylavic wrote: > Hi Jan, > > sorry for the late. > > On Tue, Apr 22, 2014 at 3:39 PM, Jan Kaluža wrote: >> Hi again, >> >> the patch has been here for some time already. I hesitate to commit it to >> trunk without any review, because it changes the core code in mod_proxy and >> I'm afraid that there could exist more corner-cases I'm not aware of. >> >> On the other side (and to motivate someone with deeper knowledge of >> mod_proxy :) ), it would be great to have UDS support also for >> ProxyPassMatch and fix some old bugs related to setting options together >> with ProxyPassMatch (like PR 43513 and PR 54973). >> >> So, anyone who would have time to review this patch, please? :) > > I slightly modified your patch with the following changes : > 1. Use \-escaping instead of %-escaping for wildcard_match so to > consume less space and unescape quickly. Wouldn't this cause confusion/issue with normal escaping? > 2. Fix the recursion in ap_proxy_wildcard_match(), which was calling > the legacy ap_strcmp_match(), and unescape according to 1. +1 > 2'. Shouldn't this function be made iterative (it is not final > recursive currently)? +1 > 3. Always try to get the worker with its de_socketfy()ed name, and > only if not found use ap_proxy_define[_wildcard]_worker() with the > configured URL (add_pass() and proxysection() were not consistent > about ap_proxy_get_worker()'s given name/URL). Makes sense. > 4. Don't match both the "normal" name with strcnmp() and the wildcard > name with strcmp_match() in ap_proxy_get_worker(), since the worker is > either wildcard or not (this enforcement is also added to add_pass() > and proxysection() so that it is not possible to declare the same > worker name once with a Match, again without, or vice versa). > 4'. Does it makes sense to use the longest match_name or should we go > for the first matching wildcard worker (if any) wins? Longest match name, imo. > 5. Rename ap_proxy_wildcard_match() as ap_proxy_strcmp_ematch() to > suggest it could go to server/util.ch (if/when the associated > ap_matchexp_escape() and ap_proxy_strcasecmp_ematch() are coded). Why? > 6. Move proxy_worker_shared { char > wildcard_name[PROXY_WORKER_MAX_NAME_SIZE]; } to proxy_worker { char > *match_name; } to save shm space and avoid length issues (I don't see > how this could be updated at runtime, for now, balancer-manager can't > declare Match workers). Future effort, so -1 on the moving. > 6. Keep ap_proxy_escape_wildcard_url() static (private) since it is > not used outside proxy_util (now). Sounds good. > > This is not very tested (yet), I'll come back if I detect issues, but > first I'd like your feedbacks on this modifications... > > Patch attached. > Just a comment: I am really concerned that with such a major change, we will likely break things in ways we are unaware of... We should focus on fixing the actual problem w/o also combining it with other "fixes" that are tangentially involved. For example, maybe a simply flag when creating the worker that sez "worker name is a regex" is the way to approach it, and thus ap_proxy_get_worker() could add additional checks for finding the "right" worker when it knows it needs to allow for regex substitutions. That creates a additional logic path which is self-contained and isolated, instead of monkeying around with changing current paths...
Re: [PATCH] Fix settings options with ProxyPassMatch
On 04/25/2014 02:57 AM, Yann Ylavic wrote: Hi Jan, sorry for the late. No problem :). On Tue, Apr 22, 2014 at 3:39 PM, Jan Kaluža wrote: Hi again, the patch has been here for some time already. I hesitate to commit it to trunk without any review, because it changes the core code in mod_proxy and I'm afraid that there could exist more corner-cases I'm not aware of. On the other side (and to motivate someone with deeper knowledge of mod_proxy :) ), it would be great to have UDS support also for ProxyPassMatch and fix some old bugs related to setting options together with ProxyPassMatch (like PR 43513 and PR 54973). So, anyone who would have time to review this patch, please? :) I slightly modified your patch with the following changes : 1. Use \-escaping instead of %-escaping for wildcard_match so to consume less space and unescape quickly. Hm, I'm not sure right now if that's right. The corner-case here is following configuration: ScriptAlias /cgi?bin/ "/usr/local/apache2/cgi-bin/" ProxyPassMatch ^/x/(.*\.x)$ http://127.0.0.1/cgi?bin/$1 timeout=1 I'm not sure how valid this configuration is, but it leads to following request: "http://127.0.0.1/cgi%3Fbin/printenv.x"; while the proxy_worker has following escaped string: "http://127.0.0.1/cgi\\?bin/*"; So the right proxy_worker is not matched in this case (not sure if "right" is really "right" in this case :) ). 2. Fix the recursion in ap_proxy_wildcard_match(), which was calling the legacy ap_strcmp_match(), and unescape according to 1. 2'. Shouldn't this function be made iterative (it is not final recursive currently)? 3. Always try to get the worker with its de_socketfy()ed name, and only if not found use ap_proxy_define[_wildcard]_worker() with the configured URL (add_pass() and proxysection() were not consistent about ap_proxy_get_worker()'s given name/URL). +1 to this. 4. Don't match both the "normal" name with strcnmp() and the wildcard name with strcmp_match() in ap_proxy_get_worker(), since the worker is either wildcard or not (this enforcement is also added to add_pass() and proxysection() so that it is not possible to declare the same worker name once with a Match, again without, or vice versa). 4'. Does it makes sense to use the longest match_name or should we go for the first matching wildcard worker (if any) wins? Hm, I think it should be the longest match_name, if you mean that "http://127.0.0.1/cgi-bin/*"; should be matched if there exist two workers like "http://127.0.0.1/*"; and "http://127.0.0.1/cgi-bin/*";. But if I'm right, in this case, there will be just one *shared* worker. Can you write an example of two non-shared workers with matching wirldcard name? 5. Rename ap_proxy_wildcard_match() as ap_proxy_strcmp_ematch() to suggest it could go to server/util.ch (if/when the associated ap_matchexp_escape() and ap_proxy_strcasecmp_ematch() are coded). 6. Move proxy_worker_shared { char wildcard_name[PROXY_WORKER_MAX_NAME_SIZE]; } to proxy_worker { char *match_name; } to save shm space and avoid length issues (I don't see how this could be updated at runtime, for now, balancer-manager can't declare Match workers). 6. Keep ap_proxy_escape_wildcard_url() static (private) since it is not used outside proxy_util (now). +1 to all above. This is not very tested (yet), I'll come back if I detect issues, but first I'd like your feedbacks on this modifications... I've tested the patch and generally it works for me (except the things I wrote above). Patch attached. Regards, Yann. Regards, Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
Hi Jan, sorry for the late. On Tue, Apr 22, 2014 at 3:39 PM, Jan Kaluža wrote: > Hi again, > > the patch has been here for some time already. I hesitate to commit it to > trunk without any review, because it changes the core code in mod_proxy and > I'm afraid that there could exist more corner-cases I'm not aware of. > > On the other side (and to motivate someone with deeper knowledge of > mod_proxy :) ), it would be great to have UDS support also for > ProxyPassMatch and fix some old bugs related to setting options together > with ProxyPassMatch (like PR 43513 and PR 54973). > > So, anyone who would have time to review this patch, please? :) I slightly modified your patch with the following changes : 1. Use \-escaping instead of %-escaping for wildcard_match so to consume less space and unescape quickly. 2. Fix the recursion in ap_proxy_wildcard_match(), which was calling the legacy ap_strcmp_match(), and unescape according to 1. 2'. Shouldn't this function be made iterative (it is not final recursive currently)? 3. Always try to get the worker with its de_socketfy()ed name, and only if not found use ap_proxy_define[_wildcard]_worker() with the configured URL (add_pass() and proxysection() were not consistent about ap_proxy_get_worker()'s given name/URL). 4. Don't match both the "normal" name with strcnmp() and the wildcard name with strcmp_match() in ap_proxy_get_worker(), since the worker is either wildcard or not (this enforcement is also added to add_pass() and proxysection() so that it is not possible to declare the same worker name once with a Match, again without, or vice versa). 4'. Does it makes sense to use the longest match_name or should we go for the first matching wildcard worker (if any) wins? 5. Rename ap_proxy_wildcard_match() as ap_proxy_strcmp_ematch() to suggest it could go to server/util.ch (if/when the associated ap_matchexp_escape() and ap_proxy_strcasecmp_ematch() are coded). 6. Move proxy_worker_shared { char wildcard_name[PROXY_WORKER_MAX_NAME_SIZE]; } to proxy_worker { char *match_name; } to save shm space and avoid length issues (I don't see how this could be updated at runtime, for now, balancer-manager can't declare Match workers). 6. Keep ap_proxy_escape_wildcard_url() static (private) since it is not used outside proxy_util (now). This is not very tested (yet), I'll come back if I detect issues, but first I'd like your feedbacks on this modifications... Patch attached. Regards, Yann. trunk-mod_proxy-Match.patch Description: application/download
Re: [PATCH] Fix settings options with ProxyPassMatch
Hi again, the patch has been here for some time already. I hesitate to commit it to trunk without any review, because it changes the core code in mod_proxy and I'm afraid that there could exist more corner-cases I'm not aware of. On the other side (and to motivate someone with deeper knowledge of mod_proxy :) ), it would be great to have UDS support also for ProxyPassMatch and fix some old bugs related to setting options together with ProxyPassMatch (like PR 43513 and PR 54973). So, anyone who would have time to review this patch, please? :) Thanks, Jan Kaluza On 04/10/2014 10:25 AM, Jan Kaluža wrote: Hi, can you please check the attached patch? I think it should be closer to the proper way how to fix this than the previous one. It works properly for me even for URLs with '*' and '?'. Regards, Jan Kaluza On 03/21/2014 03:27 PM, Yann Ylavic wrote: Yes, selecting the right worker is all in ap_proxy_get_worker(), but probably also add_pass() and proxysection() would need something like ap_proxy_define_wildcard_worker() to register this kind of worker (save the original name, ...). On Fri, Mar 21, 2014 at 3:15 PM, Jim Jagielski wrote: Just a thought, but wouldn't the better place to "fix" this be in ap_proxy_get_worker()?? On Mar 21, 2014, at 9:13 AM, Yann Ylavic wrote: oups, sorry for the numbering. On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic wrote: On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. If it's OK, I will commit it. I took a deeper look into this but I'm afraid it is a more complex story actually... 1. Workers can be defined in sections too. The same should probably be done in proxysection(). 2. Both '*' and '?' are legitimate URL chars. We need a way to escape the original ones in the configured worker's URL, but ap_strcmp_match doesn't handle (un)escaping. So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() functions should be implemented. 5. When no $ is used in the worker's URL, an implicit $1 is appended by proxy_trans(). Hence ProxyPassMatch /some/path/with(/capture) http://some.host is equivalent to ProxyPassMatch /some/path/with(/capture) http://some.host$1 So an implicit '*' should be appended to the worker's wildcard URL in the first case (maybe only when the match has a capture but this is just an optimization, * would match empty). 3. I think we should mark the worker as wildcard when it is defined by ProxyPassMatch or a section. So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() for "normal" workers, and won't either trigger false positives due to '*' or '?' in the configured URL (which is not escaped). 4. What about the same URL used both with ProxyPass and ProxyPassMatch, same worker or different ones? IMHO, they should be different workers, and by rewritting the URL to a wildcard one as done in the patch they will be, but we lose the configured name, and eg. the balancer-manager's param "w" would now work only with the rewritten name (which is, at least, non intuitive). A new balancer-manager's param could be added to access wildcard workers (eg. "ww"), since they are different they may be accessed separately, hence I think we need to save the original URL for such case(s). So maybe for point 2.'s mark we could use the configured (original) URL as a new member of the proxy_worker struct (I don't see any reason for it to be shared in proxy_worker_shared but one may object...). This new field would be NULL for "normal" workers. 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) balancer://mycluster/bar$1)? Since the balancers are registered/selected solely by their names (path and everything after is ignored), and then their workers are selected according to the balancer's method (the path still does not matter), I think there is no matching issue here (the requested path, eventually rewritten by proxy_trans(), will finally be appended to the worker's URL as is). Hence the same balancer's URL used both in ProxyPass and ProxyPassMatch can refer to the same balancer (IMHO). Nothing to be done therefore, but I may be missing something... 6. ProxyPassReverseMatch would be welcome too, but that's probably another story... Regards, Yann.
Re: [PATCH] Fix settings options with ProxyPassMatch
Hi, can you please check the attached patch? I think it should be closer to the proper way how to fix this than the previous one. It works properly for me even for URLs with '*' and '?'. Regards, Jan Kaluza On 03/21/2014 03:27 PM, Yann Ylavic wrote: Yes, selecting the right worker is all in ap_proxy_get_worker(), but probably also add_pass() and proxysection() would need something like ap_proxy_define_wildcard_worker() to register this kind of worker (save the original name, ...). On Fri, Mar 21, 2014 at 3:15 PM, Jim Jagielski wrote: Just a thought, but wouldn't the better place to "fix" this be in ap_proxy_get_worker()?? On Mar 21, 2014, at 9:13 AM, Yann Ylavic wrote: oups, sorry for the numbering. On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic wrote: On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. If it's OK, I will commit it. I took a deeper look into this but I'm afraid it is a more complex story actually... 1. Workers can be defined in sections too. The same should probably be done in proxysection(). 2. Both '*' and '?' are legitimate URL chars. We need a way to escape the original ones in the configured worker's URL, but ap_strcmp_match doesn't handle (un)escaping. So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() functions should be implemented. 5. When no $ is used in the worker's URL, an implicit $1 is appended by proxy_trans(). Hence ProxyPassMatch /some/path/with(/capture) http://some.host is equivalent to ProxyPassMatch /some/path/with(/capture) http://some.host$1 So an implicit '*' should be appended to the worker's wildcard URL in the first case (maybe only when the match has a capture but this is just an optimization, * would match empty). 3. I think we should mark the worker as wildcard when it is defined by ProxyPassMatch or a section. So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() for "normal" workers, and won't either trigger false positives due to '*' or '?' in the configured URL (which is not escaped). 4. What about the same URL used both with ProxyPass and ProxyPassMatch, same worker or different ones? IMHO, they should be different workers, and by rewritting the URL to a wildcard one as done in the patch they will be, but we lose the configured name, and eg. the balancer-manager's param "w" would now work only with the rewritten name (which is, at least, non intuitive). A new balancer-manager's param could be added to access wildcard workers (eg. "ww"), since they are different they may be accessed separately, hence I think we need to save the original URL for such case(s). So maybe for point 2.'s mark we could use the configured (original) URL as a new member of the proxy_worker struct (I don't see any reason for it to be shared in proxy_worker_shared but one may object...). This new field would be NULL for "normal" workers. 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) balancer://mycluster/bar$1)? Since the balancers are registered/selected solely by their names (path and everything after is ignored), and then their workers are selected according to the balancer's method (the path still does not matter), I think there is no matching issue here (the requested path, eventually rewritten by proxy_trans(), will finally be appended to the worker's URL as is). Hence the same balancer's URL used both in ProxyPass and ProxyPassMatch can refer to the same balancer (IMHO). Nothing to be done therefore, but I may be missing something... 6. ProxyPassReverseMatch would be welcome too, but that's probably another story... Regards, Yann. Index: modules/proxy/mod_proxy.c === --- modules/proxy/mod_proxy.c (revision 1585929) +++ modules/proxy/mod_proxy.c (working copy) @@ -1631,10 +1631,23 @@ new->balancer = balancer; } else { -proxy_worker *worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, de_socketfy(cmd->pool, r)); +proxy_worker *worker; int reuse = 0; +char *r_orig = r; +if (use_regex) { +r = ap_proxy_escape_wildcard_url(cmd->temp_pool, r); +} +worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, r); if (!worker) { -const char *err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf, r, 0); +const char *err = NULL; +
Re: [PATCH] Fix settings options with ProxyPassMatch
Yes, selecting the right worker is all in ap_proxy_get_worker(), but probably also add_pass() and proxysection() would need something like ap_proxy_define_wildcard_worker() to register this kind of worker (save the original name, ...). On Fri, Mar 21, 2014 at 3:15 PM, Jim Jagielski wrote: > Just a thought, but wouldn't the better place to "fix" this > be in ap_proxy_get_worker()?? > > On Mar 21, 2014, at 9:13 AM, Yann Ylavic wrote: > >> oups, sorry for the numbering. >> >> On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic wrote: >>> On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: On 03/18/2014 02:46 PM, Yann Ylavic wrote: > > On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: >> >> Wouldn't it be possible to define wildcard workers when the URL is >> known to be a regexp substitution? >> For these workers' URLs, the dollars (plus the following digit) could >> be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then >> use ap_strcasecmp_match() against the requested URL. > > > I meant ap_strcmp_match(), this has to be case sensitive... > I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. If it's OK, I will commit it. >>> >>> I took a deeper look into this but I'm afraid it is a more complex >>> story actually... >>> >>> 1. Workers can be defined in sections too. >>> The same should probably be done in proxysection(). >>> >>> 2. Both '*' and '?' are legitimate URL chars. >>> We need a way to escape the original ones in the configured worker's >>> URL, but ap_strcmp_match doesn't handle (un)escaping. >>> So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() >>> functions should be implemented. >>> >>> 5. When no $ is used in the worker's URL, an implicit $1 is appended >>> by proxy_trans(). >>> Hence >>> ProxyPassMatch /some/path/with(/capture) http://some.host >>> is equivalent to >>> ProxyPassMatch /some/path/with(/capture) http://some.host$1 >>> So an implicit '*' should be appended to the worker's wildcard URL in >>> the first case (maybe only when the match has a capture but this is >>> just an optimization, * would match empty). >>> >>> 3. I think we should mark the worker as wildcard when it is defined by >>> ProxyPassMatch or a section. >>> So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() >>> for "normal" workers, and won't either trigger false positives due to >>> '*' or '?' in the configured URL (which is not escaped). >>> >>> 4. What about the same URL used both with ProxyPass and >>> ProxyPassMatch, same worker or different ones? >>> IMHO, they should be different workers, and by rewritting the URL to a >>> wildcard one as done in the patch they will be, but we lose the >>> configured name, and eg. the balancer-manager's param "w" would now >>> work only with the rewritten name (which is, at least, non intuitive). >>> A new balancer-manager's param could be added to access wildcard >>> workers (eg. "ww"), since they are different they may be accessed >>> separately, hence I think we need to save the original URL for such >>> case(s). >>> So maybe for point 2.'s mark we could use the configured (original) >>> URL as a new member of the proxy_worker struct (I don't see any reason >>> for it to be shared in proxy_worker_shared but one may object...). >>> This new field would be NULL for "normal" workers. >>> >>> 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) >>> balancer://mycluster/bar$1)? >>> Since the balancers are registered/selected solely by their names >>> (path and everything after is ignored), and then their workers are >>> selected according to the balancer's method (the path still does not >>> matter), I think there is no matching issue here (the requested path, >>> eventually rewritten by proxy_trans(), will finally be appended to the >>> worker's URL as is). >>> Hence the same balancer's URL used both in ProxyPass and >>> ProxyPassMatch can refer to the same balancer (IMHO). >>> Nothing to be done therefore, but I may be missing something... >>> >>> 6. ProxyPassReverseMatch would be welcome too, but that's probably >>> another story... >>> >>> Regards, >>> Yann. >> >
Re: [PATCH] Fix settings options with ProxyPassMatch
Just a thought, but wouldn't the better place to "fix" this be in ap_proxy_get_worker()?? On Mar 21, 2014, at 9:13 AM, Yann Ylavic wrote: > oups, sorry for the numbering. > > On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic wrote: >> On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: >>> On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: > > Wouldn't it be possible to define wildcard workers when the URL is > known to be a regexp substitution? > For these workers' URLs, the dollars (plus the following digit) could > be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then > use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... >>> >>> I've implemented your idea. Can you check the attached patch please? It >>> fixes the original PR and also ProxyPassMatch with UDS for me. >>> >>> If it's OK, I will commit it. >>> >> >> I took a deeper look into this but I'm afraid it is a more complex >> story actually... >> >> 1. Workers can be defined in sections too. >> The same should probably be done in proxysection(). >> >> 2. Both '*' and '?' are legitimate URL chars. >> We need a way to escape the original ones in the configured worker's >> URL, but ap_strcmp_match doesn't handle (un)escaping. >> So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() >> functions should be implemented. >> >> 5. When no $ is used in the worker's URL, an implicit $1 is appended >> by proxy_trans(). >> Hence >> ProxyPassMatch /some/path/with(/capture) http://some.host >> is equivalent to >> ProxyPassMatch /some/path/with(/capture) http://some.host$1 >> So an implicit '*' should be appended to the worker's wildcard URL in >> the first case (maybe only when the match has a capture but this is >> just an optimization, * would match empty). >> >> 3. I think we should mark the worker as wildcard when it is defined by >> ProxyPassMatch or a section. >> So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() >> for "normal" workers, and won't either trigger false positives due to >> '*' or '?' in the configured URL (which is not escaped). >> >> 4. What about the same URL used both with ProxyPass and >> ProxyPassMatch, same worker or different ones? >> IMHO, they should be different workers, and by rewritting the URL to a >> wildcard one as done in the patch they will be, but we lose the >> configured name, and eg. the balancer-manager's param "w" would now >> work only with the rewritten name (which is, at least, non intuitive). >> A new balancer-manager's param could be added to access wildcard >> workers (eg. "ww"), since they are different they may be accessed >> separately, hence I think we need to save the original URL for such >> case(s). >> So maybe for point 2.'s mark we could use the configured (original) >> URL as a new member of the proxy_worker struct (I don't see any reason >> for it to be shared in proxy_worker_shared but one may object...). >> This new field would be NULL for "normal" workers. >> >> 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) >> balancer://mycluster/bar$1)? >> Since the balancers are registered/selected solely by their names >> (path and everything after is ignored), and then their workers are >> selected according to the balancer's method (the path still does not >> matter), I think there is no matching issue here (the requested path, >> eventually rewritten by proxy_trans(), will finally be appended to the >> worker's URL as is). >> Hence the same balancer's URL used both in ProxyPass and >> ProxyPassMatch can refer to the same balancer (IMHO). >> Nothing to be done therefore, but I may be missing something... >> >> 6. ProxyPassReverseMatch would be welcome too, but that's probably >> another story... >> >> Regards, >> Yann. >
Re: [PATCH] Fix settings options with ProxyPassMatch
oups, sorry for the numbering. On Fri, Mar 21, 2014 at 2:11 PM, Yann Ylavic wrote: > On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: >> On 03/18/2014 02:46 PM, Yann Ylavic wrote: >>> >>> On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. >>> >>> >>> I meant ap_strcmp_match(), this has to be case sensitive... >>> >> >> I've implemented your idea. Can you check the attached patch please? It >> fixes the original PR and also ProxyPassMatch with UDS for me. >> >> If it's OK, I will commit it. >> > > I took a deeper look into this but I'm afraid it is a more complex > story actually... > > 1. Workers can be defined in sections too. > The same should probably be done in proxysection(). > > 2. Both '*' and '?' are legitimate URL chars. > We need a way to escape the original ones in the configured worker's > URL, but ap_strcmp_match doesn't handle (un)escaping. > So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() > functions should be implemented. > > 5. When no $ is used in the worker's URL, an implicit $1 is appended > by proxy_trans(). > Hence >ProxyPassMatch /some/path/with(/capture) http://some.host > is equivalent to >ProxyPassMatch /some/path/with(/capture) http://some.host$1 > So an implicit '*' should be appended to the worker's wildcard URL in > the first case (maybe only when the match has a capture but this is > just an optimization, * would match empty). > > 3. I think we should mark the worker as wildcard when it is defined by > ProxyPassMatch or a section. > So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() > for "normal" workers, and won't either trigger false positives due to > '*' or '?' in the configured URL (which is not escaped). > > 4. What about the same URL used both with ProxyPass and > ProxyPassMatch, same worker or different ones? > IMHO, they should be different workers, and by rewritting the URL to a > wildcard one as done in the patch they will be, but we lose the > configured name, and eg. the balancer-manager's param "w" would now > work only with the rewritten name (which is, at least, non intuitive). > A new balancer-manager's param could be added to access wildcard > workers (eg. "ww"), since they are different they may be accessed > separately, hence I think we need to save the original URL for such > case(s). > So maybe for point 2.'s mark we could use the configured (original) > URL as a new member of the proxy_worker struct (I don't see any reason > for it to be shared in proxy_worker_shared but one may object...). > This new field would be NULL for "normal" workers. > > 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) > balancer://mycluster/bar$1)? > Since the balancers are registered/selected solely by their names > (path and everything after is ignored), and then their workers are > selected according to the balancer's method (the path still does not > matter), I think there is no matching issue here (the requested path, > eventually rewritten by proxy_trans(), will finally be appended to the > worker's URL as is). > Hence the same balancer's URL used both in ProxyPass and > ProxyPassMatch can refer to the same balancer (IMHO). > Nothing to be done therefore, but I may be missing something... > > 6. ProxyPassReverseMatch would be welcome too, but that's probably > another story... > > Regards, > Yann.
Re: [PATCH] Fix settings options with ProxyPassMatch
On Wed, Mar 19, 2014 at 9:59 AM, Jan Kaluža wrote: > On 03/18/2014 02:46 PM, Yann Ylavic wrote: >> >> On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: >>> >>> Wouldn't it be possible to define wildcard workers when the URL is >>> known to be a regexp substitution? >>> For these workers' URLs, the dollars (plus the following digit) could >>> be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then >>> use ap_strcasecmp_match() against the requested URL. >> >> >> I meant ap_strcmp_match(), this has to be case sensitive... >> > > I've implemented your idea. Can you check the attached patch please? It > fixes the original PR and also ProxyPassMatch with UDS for me. > > If it's OK, I will commit it. > I took a deeper look into this but I'm afraid it is a more complex story actually... 1. Workers can be defined in sections too. The same should probably be done in proxysection(). 2. Both '*' and '?' are legitimate URL chars. We need a way to escape the original ones in the configured worker's URL, but ap_strcmp_match doesn't handle (un)escaping. So maybe new ap_matchexp_[un]escape() and ap_str[case]cmp_ematch() functions should be implemented. 5. When no $ is used in the worker's URL, an implicit $1 is appended by proxy_trans(). Hence ProxyPassMatch /some/path/with(/capture) http://some.host is equivalent to ProxyPassMatch /some/path/with(/capture) http://some.host$1 So an implicit '*' should be appended to the worker's wildcard URL in the first case (maybe only when the match has a capture but this is just an optimization, * would match empty). 3. I think we should mark the worker as wildcard when it is defined by ProxyPassMatch or a section. So that ap_proxy_get_worker() won't use the costly ap_strcmp_match() for "normal" workers, and won't either trigger false positives due to '*' or '?' in the configured URL (which is not escaped). 4. What about the same URL used both with ProxyPass and ProxyPassMatch, same worker or different ones? IMHO, they should be different workers, and by rewritting the URL to a wildcard one as done in the patch they will be, but we lose the configured name, and eg. the balancer-manager's param "w" would now work only with the rewritten name (which is, at least, non intuitive). A new balancer-manager's param could be added to access wildcard workers (eg. "ww"), since they are different they may be accessed separately, hence I think we need to save the original URL for such case(s). So maybe for point 2.'s mark we could use the configured (original) URL as a new member of the proxy_worker struct (I don't see any reason for it to be shared in proxy_worker_shared but one may object...). This new field would be NULL for "normal" workers. 5. What about wildcard balancers (ProxyPassMatch /foo(/.*) balancer://mycluster/bar$1)? Since the balancers are registered/selected solely by their names (path and everything after is ignored), and then their workers are selected according to the balancer's method (the path still does not matter), I think there is no matching issue here (the requested path, eventually rewritten by proxy_trans(), will finally be appended to the worker's URL as is). Hence the same balancer's URL used both in ProxyPass and ProxyPassMatch can refer to the same balancer (IMHO). Nothing to be done therefore, but I may be missing something... 6. ProxyPassReverseMatch would be welcome too, but that's probably another story... Regards, Yann.
Re: [PATCH] Fix settings options with ProxyPassMatch
On 03/19/2014 10:13 AM, Jan Kaluža wrote: On 03/19/2014 09:59 AM, Jan Kaluža wrote: On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. Ignore this patch. It breaks ProxyPass. Attached is better version. Not my day today. s/apr_isxdigit/apr_isdigit. If it's OK, I will commit it. Regards, Jan Kaluza Regards, Jan Kaluza Index: modules/proxy/mod_proxy.c === --- modules/proxy/mod_proxy.c (revision 1579155) +++ modules/proxy/mod_proxy.c (working copy) @@ -1631,8 +1631,29 @@ new->balancer = balancer; } else { -proxy_worker *worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, de_socketfy(cmd->pool, r)); +proxy_worker *worker; int reuse = 0; +/* When the scheme or the hostname contains '$', skip. + * When the path contains '$N', replace '$N' with wildcard. */ +if (use_regex) { +char *r2 = apr_pstrdup(cmd->temp_pool, r); +char *x, *y; +y = ap_strstr(r2, "://"); +if (y != NULL) { +for (x = y; *y; ++x, ++y) { +if (*y != '$') { +*x = *y; +} +else if (apr_isdigit(*(y + 1))) { +*x = '*'; +y += 1; +} +} +*x = '\0'; +} +r = r2; +} +worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, r); if (!worker) { const char *err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf, r, 0); if (err) Index: modules/proxy/proxy_util.c === --- modules/proxy/proxy_util.c (revision 1579155) +++ modules/proxy/proxy_util.c (working copy) @@ -1568,7 +1568,8 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0 +|| (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) { max_worker = worker; max_match = worker_name_length; } @@ -1580,7 +1581,8 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0 +|| (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) { max_worker = worker; max_match = worker_name_length; }
Re: [PATCH] Fix settings options with ProxyPassMatch
On 03/19/2014 09:59 AM, Jan Kaluža wrote: On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. Ignore this patch. It breaks ProxyPass. Attached is better version. If it's OK, I will commit it. Regards, Jan Kaluza Regards, Jan Kaluza Index: modules/proxy/mod_proxy.c === --- modules/proxy/mod_proxy.c (revision 1579155) +++ modules/proxy/mod_proxy.c (working copy) @@ -1631,8 +1631,29 @@ new->balancer = balancer; } else { -proxy_worker *worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, de_socketfy(cmd->pool, r)); +proxy_worker *worker; int reuse = 0; +/* When the scheme or the hostname contains '$', skip. + * When the path contains '$N', replace '$N' with wildcard. */ +if (use_regex) { +char *r2 = apr_pstrdup(cmd->temp_pool, r); +char *x, *y; +y = ap_strstr(r2, "://"); +if (y != NULL) { +for (x = y; *y; ++x, ++y) { +if (*y != '$') { +*x = *y; +} +else if (apr_isxdigit(*(y + 1))) { +*x = '*'; +y += 1; +} +} +*x = '\0'; +} +r = r2; +} +worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, r); if (!worker) { const char *err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf, r, 0); if (err) Index: modules/proxy/proxy_util.c === --- modules/proxy/proxy_util.c (revision 1579155) +++ modules/proxy/proxy_util.c (working copy) @@ -1568,7 +1568,8 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0 +|| (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) { max_worker = worker; max_match = worker_name_length; } @@ -1580,7 +1581,8 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0 +|| (strncmp(url_copy, worker->s->name, worker_name_length) == 0)) ) { max_worker = worker; max_match = worker_name_length; }
Re: [PATCH] Fix settings options with ProxyPassMatch
On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... I've implemented your idea. Can you check the attached patch please? It fixes the original PR and also ProxyPassMatch with UDS for me. If it's OK, I will commit it. Regards, Jan Kaluza Index: modules/proxy/mod_proxy.c === --- modules/proxy/mod_proxy.c (revision 1579155) +++ modules/proxy/mod_proxy.c (working copy) @@ -1631,8 +1631,29 @@ new->balancer = balancer; } else { -proxy_worker *worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, de_socketfy(cmd->pool, r)); +proxy_worker *worker; int reuse = 0; +/* When the scheme or the hostname contains '$', skip. + * When the path contains '$N', replace '$N' with wildcard. */ +if (use_regex) { +char *r2 = apr_pstrdup(cmd->temp_pool, r); +char *x, *y; +y = ap_strstr(r2, "://"); +if (y != NULL) { +for (x = y; *y; ++x, ++y) { +if (*y != '$') { +*x = *y; +} +else if (apr_isxdigit(*(y + 1))) { +*x = '*'; +y += 1; +} +} +*x = '\0'; +} +r = r2; +} +worker = ap_proxy_get_worker(cmd->temp_pool, NULL, conf, r); if (!worker) { const char *err = ap_proxy_define_worker(cmd->pool, &worker, NULL, conf, r, 0); if (err) Index: modules/proxy/proxy_util.c === --- modules/proxy/proxy_util.c (revision 1579155) +++ modules/proxy/proxy_util.c (working copy) @@ -1568,7 +1568,7 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0) ) { max_worker = worker; max_match = worker_name_length; } @@ -1580,7 +1580,7 @@ if ( ((worker_name_length = strlen(worker->s->name)) <= url_length) && (worker_name_length >= min_match) && (worker_name_length > max_match) -&& (strncmp(url_copy, worker->s->name, worker_name_length) == 0) ) { +&& (ap_strcmp_match(url_copy, worker->s->name) == 0) ) { max_worker = worker; max_match = worker_name_length; }
Re: [PATCH] Fix settings options with ProxyPassMatch
On 03/18/2014 02:46 PM, Yann Ylavic wrote: On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive... That looks like good way too. I will give it a try. Maybe it will look better than this original patch (with which I'm not so sure...) Regards, Jan Kaluza
Re: [PATCH] Fix settings options with ProxyPassMatch
On Tue, Mar 18, 2014 at 2:38 PM, Yann Ylavic wrote: > Wouldn't it be possible to define wildcard workers when the URL is > known to be a regexp substitution? > For these workers' URLs, the dollars (plus the following digit) could > be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then > use ap_strcasecmp_match() against the requested URL. I meant ap_strcmp_match(), this has to be case sensitive...
Re: [PATCH] Fix settings options with ProxyPassMatch
Wouldn't it be possible to define wildcard workers when the URL is known to be a regexp substitution? For these workers' URLs, the dollars (plus the following digit) could be replaced by a wildcard (ie. *) and ap_proxy_get_worker() could then use ap_strcasecmp_match() against the requested URL. Regards, Yann. On Tue, Mar 18, 2014 at 12:26 PM, Jan Kaluza wrote: > This is also needed to fix ProxyPassMatch with UDS. Without this patch > following configuration does not work: > > ProxyPassMatch ^/web/(.*\.php)$ > unix:/run/php-fpm/fpm.sock|fcgi://127.0.0.1/var/www/html/$1 > > If nobody is against, I will commit it to trunk later this week. > > Regards, > Jan Kaluza > > > On 09/16/2013 02:44 PM, Jan Kaluža wrote: >> >> Hi, >> >> I have found out that if you use ProxyPassMatch with regexp and "$1" >> (see PR 43513 and PR 54973), the right proxy_worker is not used and >> "reverse" proxy_worker is used instead. The result is situation where no >> options like "timeout" can be set in this case. >> >> The real problem here is that proxy_worker is chosen according to its >> name which contains "$1" sequence in our case. >> >> I have found out that there is already patch fixing ProxyPassMatch >> behaviour with "$1" in PR 43513. I have rebased the patch for trunk and >> changed its style a bit. >> >> The patch strips everything after "$", so it for example changes the >> name of proxy_worker from "http://127.0.0.1/cgi-bin/$1"; to >> "http://127.0.0.1/cgi-bin/";. Later when request arrives, proper >> proxy_worker is chosen. Without this change, proxy_worker's name with >> "$1" would be compared against real request URL, but real request URL >> does not contain this "$1" sequence and therefore this correct >> proxy_worker wouldn't be chosen. >> >> Do you see any problems with this patch? >> >> Regards, >> Jan Kaluza > >
Re: [PATCH] Fix settings options with ProxyPassMatch
This is also needed to fix ProxyPassMatch with UDS. Without this patch following configuration does not work: ProxyPassMatch ^/web/(.*\.php)$ unix:/run/php-fpm/fpm.sock|fcgi://127.0.0.1/var/www/html/$1 If nobody is against, I will commit it to trunk later this week. Regards, Jan Kaluza On 09/16/2013 02:44 PM, Jan Kaluža wrote: Hi, I have found out that if you use ProxyPassMatch with regexp and "$1" (see PR 43513 and PR 54973), the right proxy_worker is not used and "reverse" proxy_worker is used instead. The result is situation where no options like "timeout" can be set in this case. The real problem here is that proxy_worker is chosen according to its name which contains "$1" sequence in our case. I have found out that there is already patch fixing ProxyPassMatch behaviour with "$1" in PR 43513. I have rebased the patch for trunk and changed its style a bit. The patch strips everything after "$", so it for example changes the name of proxy_worker from "http://127.0.0.1/cgi-bin/$1"; to "http://127.0.0.1/cgi-bin/";. Later when request arrives, proper proxy_worker is chosen. Without this change, proxy_worker's name with "$1" would be compared against real request URL, but real request URL does not contain this "$1" sequence and therefore this correct proxy_worker wouldn't be chosen. Do you see any problems with this patch? Regards, Jan Kaluza