> On 25 May 2018, at 17:43, Micha Lenk <mi...@lenk.info> wrote:
> 
>     s_from = strlen(m->from.c);
>     if (!strncasecmp(ctx->buf, m->from.c, s_from)) {
>         ... do the string replacement ...
> 
> 
> ... where ctx->buf is the URL found in the HTML document, and m->from.c is 
> the first configured argument of ProxyHTMLURLMap. So, if the latter is a 
> prefix of the former, this condition should be true and the string replacement 
> should happen. When the expected string replacement doesn't happen, the 
> condition is false and the values of the variables are:
> 
> ctx->buf  = http://internal/!%22%23$/
> m->from.c = http://internal/!"#$/
> 
> So, the strings don't match and are not replaced for that reason.

Yep.  mod_proxy_html takes what it sees.  That's why it relies on another module
(mod_xml2enc) for i18n, which is kind-of what I expected to see from your
subject line!

> Going forward I am not interested in finding a work around for this, but more 
> how to approach a fix (if this is a bug at all).
> 
> Is it reasonable to expect mod_proxy_html to rewrite URL encoded URLs as well?

I think it's reasonable to put the escaped URL, exactly as it appears in the
HTML, in your ProxyHTMLURLMap. If we have mod_proxy_html unescape characters,
it adds complexity to the code, and (perhaps more to the point) presents a
mirror image of your problem to anyone with the opposite expectations.
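
To illustrate, here is a minimal standalone sketch of the comparison quoted
above. It is not the module's code: the helper name prefix_matches is made up
for this example, and it only mirrors the strlen/strncasecmp check on plain
C strings.

```c
#include <string.h>
#include <strings.h>  /* strncasecmp (POSIX) */

/* Hypothetical helper, for illustration only: returns 1 when `from` is a
 * case-insensitive prefix of `url`, mirroring the quoted check. No
 * unescaping is done, so the bytes are compared exactly as given. */
static int prefix_matches(const char *url, const char *from)
{
    size_t s_from = strlen(from);
    return strncasecmp(url, from, s_from) == 0;
}
```

With the URL from your report, "http://internal/!%22%23$/", this returns 0
for the unescaped pattern http://internal/!"#$/ and 1 for the escaped pattern
http://internal/!%22%23$/, which is why configuring the escaped form matches.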

> Let's assume this needs to be fixed. To make the strings match, we could 
> either URL-escape the value from the Apache directive ProxyHTMLURLMap, or 
> temporarily URL-decode the string found in the HTML document just for the 
> purpose of the string comparison. What is the right thing to do?

I prefer to leave it to server admins to find the match that works for them.
I don't recollect this particular question ever arising in 15 years, which
kind-of suggests users are not confused by it!

-- 
Nick Kew
