Wanted: Orientation on interaction between filters and subrequests

2013-07-22 Thread dorian taylor
Hello,

Apologies in advance for the potential repeat; I initially posted to
modules-dev but realized this is more of an internals problem.

I'm trying to write a module that does what one would intuitively
expect from the following mod_rewrite incantation:

RewriteCond %{REQUEST_URI} !-U
RewriteRule (.*) http://another.host$1 [P,NS]

In other words, if the resource (powered by an arbitrary back-end)
can't be found on this server, reverse-proxy the request to
another.host, which, due to its design, mod_rewrite can't do.
What I want this functionality for is a sort of scaffolding through
which a website can be replaced incrementally, resource by resource,
without having to know anything about the middleware(s) of either the
old or the new server (see
http://doriantaylor.com/the-redesign-dissolved for context).

I'm hoping somebody can spell out for me, or point me in the direction
of resources on how to better understand the design of the bucket
brigade I/O and how it interacts with subrequests.

Here is my strategy so far:

1) Start with a fixup handler rigged to run only on main requests
2) The fixup handler performs a subrequest on the same URI as the main request
3) If the lookup response is 404, set the response handler to
proxy-handler and r->filename to proxy:... and return OK
4) Otherwise, attach an output filter to the subrequest that (somehow)
blocks writes to the network and run it
5) Repeat the check from step 3, this time against the status the
subrequest returned after running.
6) If we're still here, set the response handler to a dummy handler
with its own output filter that unblocks the output and return OK.
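
The control flow of the steps above can be sketched as a toy model. This
is plain Python, not httpd or mod_perl code: dispatch(), Outcome and
run_local are hypothetical names made up for illustration, and the
"handler" strings merely stand in for whatever the real module would
assign. It only shows that the local handler runs at most once and that
its output is held until the status is known.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

NOT_FOUND = 404


@dataclass
class Outcome:
    handler: str        # "proxy" or "replay" (illustrative labels only)
    body: bytes = b""   # held-back local output; empty when proxying


def dispatch(lookup_status: int,
             run_local: Callable[[], Tuple[int, List[bytes]]]) -> Outcome:
    """Choose between the local response and the reverse proxy."""
    # Step 3: the lookup already reports 404, so proxy without running anything.
    if lookup_status == NOT_FOUND:
        return Outcome("proxy")
    # Step 4: run the local handler once, collecting output instead of sending it.
    status, chunks = run_local()
    # Step 5: the handler itself may be the one that reports 404.
    if status == NOT_FOUND:
        return Outcome("proxy")  # the collected output is dropped
    # Step 6: hand the held output to a replay handler.
    return Outcome("replay", b"".join(chunks))
```

For example, dispatch(200, lambda: (200, [b"he", b"llo"])) yields a
"replay" outcome carrying b"hello", while a 404 at either stage yields
"proxy" with an empty body.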

(Note, for simplicity's sake I am leaving out dealing with request
bodies for the moment, but the plan is tentatively to 'tee' them to a
temporary file and then replay them into the proxy request if
applicable.)

(I should also note that unless there is a particularly compelling
alternative I have overlooked, I have good reason to be confident that
this conditional reverse proxy should manifest as an Apache module.)

Where I'm fuzzy is the "somehow" of pausing the output to the
network from the subrequest and resuming it in the response handler in
the main request. I have a prototype which currently runs the local
request twice: first in the subrequest which I discard, and again
(notwithstanding being diverted to the reverse proxy) in the main
request which I leave untouched (recall, I am performing this
operation in the fixup phase). Inefficiency aside, this solution is
inadequate because of its potential to corrupt application state in
dynamic resources. And of course, the only reason I have to run
the subrequest at all is that it appears the 404 status is, more
often than not, set by a module's response handler.

(I am also not clear about what happens under the hood with respect to
the subrequest's header set/protocol data, and whether or not I have to
manually pull it up into the main request.)

I considered putting an EOS bucket at the front of the subrequest's
output brigade, and then popping it off in an output filter in the
main request, but I'm not sure how nasty a hack that would be and/or
what side effects might arise from a trick like that. Alternatively, I
suppose I could write the subrequest's content to another temporary
file and then just replay it in the response handler, but I'd prefer
to avoid creating any unnecessary I/O.
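
As a way of picturing the in-memory alternative to a temporary file,
here is a toy "hold then release" filter. It is a Python model only:
HoldingFilter and its methods are hypothetical names, chunks of bytes
stand in for buckets, and nothing here is the httpd filter API (the
nearest real facility is probably ap_save_brigade(), but whether it
suits this case is an open question). The sketch assumes the held
output fits in memory.

```python
from typing import Callable, List, Optional


class HoldingFilter:
    """Toy output filter: holds chunks until released, then passes through."""

    def __init__(self, downstream: Callable[[bytes], None]) -> None:
        self._downstream = downstream
        # A list while output is blocked; None once release() has been called.
        self._held: Optional[List[bytes]] = []

    def write(self, chunk: bytes) -> None:
        if self._held is None:
            self._downstream(chunk)    # released: plain pass-through
        else:
            self._held.append(chunk)   # still blocked: keep in memory

    def discard(self) -> None:
        """Drop whatever was held (the 404 case); output stays blocked."""
        if self._held is not None:
            self._held.clear()

    def release(self) -> None:
        """Send held chunks downstream in order, then stop buffering."""
        held, self._held = self._held or [], None
        for chunk in held:
            self._downstream(chunk)
```

In this model the subrequest writes into the filter, the fixup logic
calls discard() on a 404, and the main request's handler calls release()
otherwise.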

Thanks in advance for any orientation, existing code that behaves
similarly, or any other advice anybody can share.

Regards,

--
Dorian Taylor
http://doriantaylor.com/


Re: Need orientation around bucket brigades, subrequests, etc.

2013-07-21 Thread dorian taylor
On Sat, Jul 20, 2013 at 6:41 AM, Vincenzo D'Amore v.dam...@gmail.com wrote:

> I have implemented just the same thing using RewriteMap and a little PHP.
> If you are interested in evaluating a different approach, take a look at this:
>
> http://stackoverflow.com/questions/13030557/apache-just-rewrite-if-external-ressource-exists/16647345#16647345

Thanks, though unfortunately I'm fairly certain the only way to
reliably support request bodies, as well as not call local requests
twice, is to do some acrobatics with filters. I'd be grateful for any
light you could shed on those issues.

--
Dorian Taylor
http://doriantaylor.com/


Need orientation around bucket brigades, subrequests, etc.

2013-07-19 Thread dorian taylor
Hello,

I am currently prototyping a module that does the equivalent of this
mod_rewrite incantation:

RewriteCond %{REQUEST_URI} !-U
RewriteRule (.*) http://another.host$1 [P,NS]

...in other words, if a resource cannot be found on the local server,
reverse-proxy the request to another.host. This incantation, of
course, won't work as expected, because mod_rewrite evaluates
RewriteCond *after* RewriteRule, hence the need for a custom module.
Moreover, whichever response handler is installed for the URI, even
core, typically has to be run in order to discover whether or not it
returns a 404, so even if that incantation *did* work properly, it
still wouldn't produce the desired effect.

What this means is that the requested resource has to be run
end-to-end in a subrequest, and, in the event of a 404, its output
discarded before the request is handed off to the proxy.

That's all fine. I can do all that. However, a) dealing with request
bodies and b) handling interactions with other parts of the system
(specifically filters) open up quite the can of worms.

Running the subrequest locally will consume the input brigade. I'm
counteracting that by installing a filter that tucks the input into a
temporary file, and then another filter which replays the input back
into the proxy request, if there is a body and the request is proxied. (Aside: it is worth
noting that the proxy request itself must not be a subrequest, as
mod_proxy_http discards subrequest bodies.)
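
The tee-and-replay idea can be modelled outside the server as follows.
This is a Python sketch, not the input filter code itself: tee_body()
and replay_body() are hypothetical helpers, an iterable of byte chunks
stands in for the input brigade, and an ordinary temporary file stands
in for the spool.

```python
import tempfile
from typing import BinaryIO, Iterable, Iterator


def tee_body(chunks: Iterable[bytes], spool: BinaryIO) -> Iterator[bytes]:
    """Yield each input chunk unchanged while also copying it to the spool."""
    for chunk in chunks:
        spool.write(chunk)
        yield chunk


def replay_body(spool: BinaryIO, size: int = 8192) -> Iterator[bytes]:
    """Rewind the spool and yield the saved body in chunks of up to `size`."""
    spool.seek(0)
    while True:
        chunk = spool.read(size)
        if not chunk:
            break
        yield chunk


if __name__ == "__main__":
    with tempfile.TemporaryFile() as spool:
        # The local subrequest consumes the body through the tee...
        consumed = b"".join(tee_body([b"a=1&", b"b=2"], spool))
        # ...and the proxy request later reads the same bytes back.
        replayed = b"".join(replay_body(spool))
        print(consumed == replayed)
```

The spool is only complete once the local handler has read the whole
body, so a handler that stops reading early would leave the replay
truncated; the model does not address that case.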

If, however, the subrequest response is successful, I need to be able
to hang on to the response content and somehow promote it into the main
request, so the configured filters will run against it (I have filters
that won't run except for main requests).

What I don't know a lot about is how the I/O brigade system *itself*
works. Am I right in understanding there's only one pair per
connection?

I should also mention this prototype is in mod_perl, which is more or
less irrelevant to the problem, except for the fact that
modperl_response_handler's priority is APR_HOOK_MIDDLE whereas
proxy_handler is APR_HOOK_FIRST. About all that means is the main
logic can't manifest as a response handler because by then the proxy
handler will already have been run. (It's also undesirable to run this
logic in a response handler because that will get in the way of any
other response handler that's been configured.)

A background writeup on why I want this thing to exist is here:
http://doriantaylor.com/the-redesign-dissolved
The code is here:
https://github.com/doriantaylor/p5-apache2-condproxy/blob/master/lib/Apache2/CondProxy.pm

I'd be grateful if anybody could either explain or point me in the
direction of a definitive summary of how request phases, subrequests
and the bucket brigades interact.

Thanks,

--
Dorian Taylor
http://doriantaylor.com/