Hello,

I am currently prototyping a module that does the equivalent of this
mod_rewrite incantation:

    RewriteCond %{REQUEST_URI} !-U
    RewriteRule (.*) http://another.host$1 [P,NS]

...in other words, "if a resource cannot be found on the local server,
reverse-proxy the request to another.host." This incantation, of
course, won't work as expected, because mod_rewrite evaluates
RewriteCond *after* RewriteRule, hence the need for a custom module.
Moreover, whichever response handler is installed for the URI, even
the core handler, typically has to be run in order to discover whether
or not it returns a 404, so even if that incantation *did* work
properly, it still wouldn't produce the desired effect.

What this means is that the requested resource has to be run
end-to-end in a subrequest, and, in the event of a 404, its output
discarded before the request is handed off to the proxy.
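
As a rough illustration only, the control flow described above can be
modelled in plain Python. This is not Apache or mod_perl API and not the
code in the module; `conditional_proxy`, `local_site` and `another_host`
are hypothetical names, and a "handler" is assumed to be any callable
that takes a path and a body and returns a status and a response body.

```python
from typing import Callable, Tuple

# Hypothetical handler shape: (path, request_body) -> (status, response_body).
Handler = Callable[[str, bytes], Tuple[int, bytes]]


def conditional_proxy(path: str, body: bytes,
                      local: Handler, proxy: Handler) -> Tuple[int, bytes]:
    """Run the local handler end-to-end; fall back to the proxy on a 404."""
    status, output = local(path, body)
    if status != 404:
        return status, output  # keep the local response as it is
    # The local output is discarded; the same request goes to the proxy.
    return proxy(path, body)


def local_site(path: str, body: bytes) -> Tuple[int, bytes]:
    """Stand-in for the local server: knows exactly one resource."""
    pages = {"/index.html": b"local index"}
    if path in pages:
        return 200, pages[path]
    return 404, b"Not Found"


def another_host(path: str, body: bytes) -> Tuple[int, bytes]:
    """Stand-in for the reverse-proxied host."""
    return 200, b"proxied " + path.encode()


found = conditional_proxy("/index.html", b"", local_site, another_host)
missing = conditional_proxy("/old/page", b"", local_site, another_host)
```

In this model the body is a byte string that can be passed around
freely; in the real server it is a stream that can be read only once.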

That's all fine. I can do all that. However, a) dealing with request
bodies and b) handling interactions with other parts of the system
(specifically filters) open up quite the can of worms.

Running the subrequest locally will consume the input brigade. I'm
counteracting that by installing a filter that tucks the input into a
temporary file, and then another filter which replays the input back
into the proxy request, if there is any to replay. (Aside: it is worth
noting that the proxy request itself must not be a subrequest, as
mod_proxy_http discards subrequest bodies.)
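
A minimal sketch of the save-and-replay idea, again in plain Python
rather than as real input filters: `tee_body` and `replay_body` are
hypothetical names, and the request body is assumed to arrive as a
one-shot iterator of byte chunks.

```python
import tempfile
from typing import BinaryIO, Iterable, Iterator


def tee_body(chunks: Iterable[bytes], spool: BinaryIO) -> Iterator[bytes]:
    """Pass chunks through unchanged while saving a copy in the spool."""
    for chunk in chunks:
        spool.write(chunk)
        yield chunk


def replay_body(spool: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Rewind the spool and yield what was saved, chunk by chunk."""
    spool.seek(0)
    while True:
        chunk = spool.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Hypothetical request body arriving as a one-shot stream.
incoming = iter([b"name=", b"dorian"])
spool = tempfile.TemporaryFile()

# The local attempt consumes the stream through the tee ...
seen_locally = b"".join(tee_body(incoming, spool))
# ... and the saved copy is then available for the proxy request.
seen_by_proxy = b"".join(replay_body(spool))
spool.close()
```

Note that in this model the spool only ever contains what the local
consumer actually read, so a handler that ignores the body would leave
nothing to replay.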

If, however, the subrequest response is successful, I need to be able
to hang on to the response content and somehow promote it into the main
request, so that the configured filters will run against it (I have
filters that won't run except for main requests).

What I don't know a lot about is how the I/O brigade system *itself*
works. Am I right in understanding there's only one pair (one input,
one output) per connection?

I should also mention this prototype is in mod_perl, which is more or
less irrelevant to the problem, except for the fact that
modperl_response_handler's priority is APR_HOOK_MIDDLE whereas
proxy_handler is APR_HOOK_FIRST. About all that means is the main
logic can't manifest as a response handler because by then the proxy
handler will already have been run. (It's also undesirable to run this
logic in a response handler because that will get in the way of any
other response handler that's been configured.)

A background writeup on why I want this thing to exist is here:
http://doriantaylor.com/the-redesign-dissolved
The code is here:
https://github.com/doriantaylor/p5-apache2-condproxy/blob/master/lib/Apache2/CondProxy.pm

I'd be grateful if anybody could either explain or point me in the
direction of a definitive summary of how request phases, subrequests
and the bucket brigades interact.

Thanks,

--
Dorian Taylor
http://doriantaylor.com/
