Tim Watts wrote:
Hi,
Is it in theory possible to insert a perl output filter between
mod_proxy and mod_cache?
Or at least between mod_proxy and the client?
...
mod_headers and mod_proxy don't seem to play well together and mod-cache
doesn't either (probably due to lack of cache control headers in the
tomcat response, though I haven't proved this is actually the case).
...
Back to the main issue.
Take this as some more general information on what you could do to solve your problem,
apart from the other suggestions already submitted.
1) I am not sure about mod_perl I/O filters, because I have never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify request/response
headers, you can also write your own perl handler, and by choosing the appropriate type of
PerlHandler, you can have it run at just about any point in the request/response cycle.
The real power of mod_perl (if you haven't yet discovered that aspect), is that it allows
you to insert your own code at just about any point of the Apache request processing
cycle, and to do just about anything you want with any aspect of the request/response.
That includes "interfering" with anything that other, non-perl, Apache modules
do.
See the following page for a good overview of the Apache request processing cycle, and
what you can do with such PerlHandlers :
http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
You are probably more interested in the "HTTP Protocol" section. By clicking on each item
in that list, you get an explanation of /when/ that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).
Such handlers are usually easy to write and configure, and the code to play with HTTP
headers is also quite simple, if you know what to put in the header(s).
2) about mod_headers and mod_proxy playing together :
The trouble is that (unlike the mod_perl documentation above) the documentation of the
Apache modules usually does not make it clear at all during which exact phase of the
Apache request processing each module runs.
But I seem to remember something in mod_headers about an "early" attribute or
parameter.
Maybe that tells you more of when it runs (or can run), compared to mod_proxy.
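From what I can tell from the mod_headers documentation, "early" is a keyword that you can
add at the end of a Header directive. A minimal sketch (the header names and values below
are made up for illustration only, and I have not tested this together with mod_proxy or
mod_cache):

```
# Default ("late") processing: the header is set on the response
# shortly before it is sent to the client.
Header set Cache-Control "max-age=300"

# "early" processing: the header is set right at the start of request
# processing, so other modules may still change it afterwards.
# Only usable in server or virtual host context, not inside <Location>.
Header set X-Example "set-early" early
```

The documentation describes early mode mainly as a testing/debugging aid, so it may well
not be the right tool here.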
3) According to the mod_proxy documentation, it should be possible to configure it inside
a <Location(Match)> section, instead of "globally" (outside of any section).
That forces you to decide more finely which URLs should or should not be proxied/forwarded
to Tomcat, but it also (in my view) makes it more straightforward to combine the proxying
instruction with other modules, like perl filters or handlers.
In effect, from Apache's point of view, mod_proxy must be the equivalent of a
"content-generating handler" (like a PerlResponseHandler), because for Apache, passing a
request to mod_proxy for processing is not much different than passing it to any other
internal response-generating handler.
Apache in fact knows nothing of Tomcat. It passes a request to mod_proxy, and expects the
response (or an error status) back from mod_proxy. It has no idea that behind mod_proxy
is another server.
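As a sketch of what I mean, assuming Tomcat listens on localhost:8080 and the application
lives under /app (both made up for the example), and with MyApache2::TweakHeaders being a
hypothetical filter module that you would have to write yourself:

```
<Location /app>
    # Inside <Location>, ProxyPass takes only the target URL;
    # the local path comes from the <Location> itself.
    ProxyPass        http://localhost:8080/app
    ProxyPassReverse http://localhost:8080/app

    # Hypothetical mod_perl output filter, applied to this location only
    PerlOutputFilterHandler MyApache2::TweakHeaders
</Location>
```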
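4) strictly according to the HTTP protocol, a "GET" request should be "safe" and
"idempotent", which means (roughly) that it should not change anything on the server, so
that running it twice or more should give the same answer as long as nothing else has
changed the data in the meantime.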
Which in theory means that even if the GET request goes to a database, the response should
be cacheable under most circumstances.
Unfortunately, in practice the GET request is much overused, and this does not always
hold.
But if caching the response creates problems, you can always tell your application
developers that it is their fault because they are misusing the protocol.
(In really strict terms, a GET /could/ provide a different response; but it should not
modify the state of the server).
5) despite what I am saying in (4), a GET response can very validly be different from a
previous GET response with the same URL (for example, if in-between the data has been
modified by a POST). So if you are forcing headers on the responses, you should at least
be a bit careful not to do this indiscriminately.
That is also why I personally have doubts about the effectiveness of another caching
proxy front-end, like the couple that were mentioned earlier. If the Tomcat web
applications themselves do not provide headers to indicate whether their response can be
cached or not, how is the front-end going to determine that this response /is/ the same
as a previous one?
It seems to me that such a determination would require information that such a proxy does
not have, no?
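For what it is worth, as far as I understand the mod_cache documentation, Apache's own
mod_cache can only be told to apply a blanket policy to such responses. A sketch, where
the /app path, the cache directory and the one-hour value are assumptions for illustration
only (the disk storage module is mod_cache_disk in Apache 2.4, mod_disk_cache in 2.2):

```
CacheRoot   /var/cache/apache2/mod_cache_disk
CacheEnable disk /app
# Cache responses even when they carry no Last-Modified/ETag/Expires header
CacheIgnoreNoLastMod On
# Lifetime (in seconds) to use when the response says nothing about expiry
CacheDefaultExpire 3600
```

This caches everything under /app for an hour whether or not the content has changed,
which is exactly the kind of indiscriminate caching one should be careful with.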
Now if you are still there, one more question :
Are we talking here of a configuration where one Apache front-end sits in front of several
Tomcats, possibly on different machines?
Or does each Tomcat have its own personal Apache front-end on the same machine?
Or something in-between?
(*) Considering the name "filter", however, I would think that
- an "input filter" should always run /before/ any module which generates content (of
which mod_proxy is one)
- an "output filter" should always run /after/ any modules which generate
content.
So, it is probably difficult to have a filter which runs /in-between/ other
Apache modules.