Wanted: Orientation on interaction between filters and subrequests
Hello,

Apologies in advance for the potential repeat; I initially posted to modules-dev but realized this is more of an internals problem.

I'm trying to write a module that does what one would intuitively expect from the following mod_rewrite incantation:

    RewriteCond %{REQUEST_URI} !-U
    RewriteRule (.*) http://another.host$1 [P,NS]

In other words, if the resource (powered by an arbitrary back-end) can't be found on this server, reverse-proxy the request to another.host. Due to its design, mod_rewrite can't do this.

What I want this functionality for is a sort of scaffolding through which a website can be replaced incrementally, resource by resource, without having to know anything about the middleware(s) of either the old or the new server (see http://doriantaylor.com/the-redesign-dissolved for context).

I'm hoping somebody can spell out for me, or point me to resources on, how to better understand the design of the bucket brigade I/O and how it interacts with subrequests.

Here is my strategy so far:

1) Start with a fixup handler rigged to run only on main requests.
2) The fixup handler performs a subrequest on the same URI as the main request.
3) If the lookup response is 404, set the response handler to proxy-handler and r->filename to proxy:... and return OK.
4) Otherwise, attach an output filter to the subrequest that (somehow) blocks writes to the network, and run it.
5) Repeat step 3.
6) If we're still here, set the response handler to a dummy handler with its own output filter that unblocks the output, and return OK.

(Note: for simplicity's sake I am leaving out dealing with request bodies for the moment, but the plan is tentatively to 'tee' them to a temporary file and then replay them into the proxy request if applicable.)

(I should also note that unless there is a particularly compelling alternative I have overlooked, I have good reason to be confident that this conditional reverse proxy should manifest as an Apache module.)
Where I'm fuzzy is the "somehow" of pausing the subrequest's output to the network and resuming it in the response handler of the main request.

I have a prototype which currently runs the local request twice: first in the subrequest, which I discard, and again (unless it is diverted to the reverse proxy) in the main request, which I leave untouched (recall, I am performing this operation in the fixup phase). Inefficiency aside, this solution is inadequate because of its potential to corrupt application state in dynamic resources. And of course, the only reason I have to run the subrequest at all is that the 404 status appears to be set, more often than not, by a module's response handler.

(I am also not clear about what happens under the hood with respect to the subrequest's header set/protocol data, and whether or not I have to manually pull it up into the main request.)

I considered putting an EOS bucket at the front of the subrequest's output brigade and then popping it off in an output filter in the main request, but I'm not sure how nasty a hack that would be and/or what side effects might arise from a trick like that. Alternatively, I suppose I could write the subrequest's content to another temporary file and then just replay it in the response handler, but I'd prefer to avoid creating any unnecessary I/O.

Thanks in advance for any orientation, existing code that behaves similarly, or any other advice anybody can share.

Regards,

--
Dorian Taylor
http://doriantaylor.com/
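As a rough illustration of the strategy in the message above (run the local resource exactly once, hold its output back, and either discard it on a 404 or release it), here is a minimal Python simulation. It is only a sketch of the control flow, not Apache code: `HeldOutput`, `conditional_proxy`, `local_site` and `old_site` are hypothetical names, and a handler is assumed to be a plain function returning a status and a list of body chunks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a response handler: URI -> (status, body chunks).
Handler = Callable[[str], Tuple[int, List[bytes]]]


@dataclass
class HeldOutput:
    """Collects output chunks instead of writing them to the network."""
    chunks: List[bytes] = field(default_factory=list)

    def write(self, chunk: bytes) -> None:
        self.chunks.append(chunk)

    def discard(self) -> None:
        """Throw away everything that was held back."""
        self.chunks.clear()

    def release(self) -> bytes:
        """Hand over everything that was held back and empty the buffer."""
        data = b"".join(self.chunks)
        self.chunks.clear()
        return data


def conditional_proxy(uri: str, local: Handler, proxy: Handler) -> Tuple[int, bytes]:
    """Run the local handler once; fall back to the proxy only on a 404."""
    held = HeldOutput()
    status, chunks = local(uri)          # the local resource runs exactly once
    for chunk in chunks:
        held.write(chunk)                # nothing reaches the client yet
    if status == 404:
        held.discard()                   # drop the local error page
        proxy_status, proxy_chunks = proxy(uri)
        return proxy_status, b"".join(proxy_chunks)
    return status, held.release()        # promote the held output


# Hypothetical back-ends, for illustration only.
def local_site(uri: str) -> Tuple[int, List[bytes]]:
    if uri == "/new-page":
        return 200, [b"new ", b"content"]
    return 404, [b"not found"]


def old_site(uri: str) -> Tuple[int, List[bytes]]:
    return 200, [b"legacy content for " + uri.encode()]
```

The trade-off this makes visible is the one raised in the message: holding the whole local response in memory avoids running the resource twice, at the cost of buffering it before anything is sent.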
Re: Need orientation around bucket brigades, subrequests, etc.
On Sat, Jul 20, 2013 at 6:41 AM, Vincenzo D'Amore v.dam...@gmail.com wrote:

> I have implemented just same thing using rewritemap and a little of PHP.
> If you are interested to evaluate a different approach take a look at this:
> http://stackoverflow.com/questions/13030557/apache-just-rewrite-if-external-ressource-exists/16647345#16647345

Thanks, though unfortunately I'm fairly certain the only way to reliably support request bodies, as well as not call local requests twice, is to do some acrobatics with filters. I'd be grateful for any light you could shed on those issues.

--
Dorian Taylor
http://doriantaylor.com/
Need orientation around bucket brigades, subrequests, etc.
Hello,

I am currently prototyping a module that does the equivalent of this mod_rewrite incantation:

    RewriteCond %{REQUEST_URI} !-U
    RewriteRule (.*) http://another.host$1 [P,NS]

...in other words, if a resource cannot be found on the local server, reverse-proxy the request to another.host.

This incantation, of course, won't work as expected, because mod_rewrite evaluates RewriteCond *after* RewriteRule, hence the need for a custom module. Moreover, whichever response handler is installed for the URI, even core, typically has to be run in order to discover whether or not it returns a 404, so even if that incantation *did* work properly, it still wouldn't produce the desired effect.

What this means is that the requested resource has to be run end-to-end in a subrequest, and, in the event of a 404, its output discarded before the request is handed off to the proxy.

That's all fine. I can do all that. However, a) dealing with request bodies and b) handling interactions with other parts of the system (specifically filters) open up quite the can of worms.

Running the subrequest locally will consume the input brigade. I'm counteracting that by installing a filter that tucks the input into a temporary file, and then another filter which replays the input back into the proxy request, if there is any of either. (Aside: it is worth noting that the proxy request itself must not be a subrequest, as mod_proxy_http discards subrequest bodies.)

If, however, the subrequest response is successful, I need to be able to hang on to the response content and somehow promote it into the main request, so that the configured filters will run against it (I have filters that won't run except for main requests).

What I don't know a lot about is how the I/O brigade system *itself* works. Am I right in understanding that there's only one pair per connection?
I should also mention this prototype is in mod_perl, which is more or less irrelevant to the problem, except for the fact that modperl_response_handler's priority is APR_HOOK_MIDDLE whereas proxy_handler is APR_HOOK_FIRST. About all that means is the main logic can't manifest as a response handler, because by then the proxy handler will already have been run. (It's also undesirable to run this logic in a response handler because that will get in the way of any other response handler that's been configured.)

A background writeup on why I want this thing to exist is here:

http://doriantaylor.com/the-redesign-dissolved

The code is here:

https://github.com/doriantaylor/p5-apache2-condproxy/blob/master/lib/Apache2/CondProxy.pm

I'd be grateful if anybody could either explain or point me in the direction of a definitive summary of how request phases, subrequests and the bucket brigades interact.

Thanks,

--
Dorian Taylor
http://doriantaylor.com/
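The request-body handling described in this message (tuck the input into a temporary file while the local request consumes it, then replay it into the proxy request) can be sketched in Python as follows. This is a simplified model, not the mod_perl filters in the linked module: `tee_body`, `replay_body` and `run_with_replay` are hypothetical names, and the body is assumed to arrive as an iterable of byte chunks.

```python
import tempfile
from typing import BinaryIO, Iterable, Iterator, Tuple


def tee_body(chunks: Iterable[bytes], spool: BinaryIO) -> Iterator[bytes]:
    """Pass each chunk through to the consumer while also saving it to spool.

    Only chunks the consumer actually pulls are saved, so a consumer that
    stops reading early leaves the spool incomplete.
    """
    for chunk in chunks:
        spool.write(chunk)
        yield chunk


def replay_body(spool: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Rewind the spool and yield its contents again in fixed-size chunks."""
    spool.seek(0)
    while True:
        chunk = spool.read(chunk_size)
        if not chunk:
            break
        yield chunk


def run_with_replay(body_chunks: Iterable[bytes]) -> Tuple[bytes, bytes]:
    """Consume the body once locally, then replay it as if for the proxy."""
    with tempfile.TemporaryFile() as spool:
        # First pass: the local request reads the body through the tee.
        seen_locally = b"".join(tee_body(body_chunks, spool))
        # Second pass: the proxy request reads the saved copy.
        replayed = b"".join(replay_body(spool))
    return seen_locally, replayed
```

One caveat the sketch makes explicit: the saved copy is only as complete as what the local request actually read, so a real filter would presumably need to drain any unread remainder before the replay.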