Tim Watts wrote:
Hi,

Is it in theory possible to insert a perl output filter between mod_proxy and mod_cache?

Or at least between mod_proxy and the client?

...


mod_headers and mod_proxy don't seem to play well together and mod_cache doesn't either (probably due to lack of cache control headers in the tomcat response, though I haven't proved this is actually the case).

...

Back to the main issue.

Take this as some more generic information on how you might go about solving your problem, in addition to the other suggestions already submitted.

1) I am not sure about mod_perl I/O filters, because I never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify request/response headers, you can also write your own perl handler, and by choosing the appropriate type of PerlHandler, you can have it run at just about any point in the request/response cycle.

The real power of mod_perl (if you haven't yet discovered that aspect), is that it allows you to insert your own code at just about any point of the Apache request processing cycle, and to do just about anything you want with any aspect of the request/response.
That includes "interfering" with anything that other, non-perl, Apache modules do.

See the following page for a good overview of the Apache request processing cycle, and what you can do with such PerlHandlers :
http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
You are probably more interested in the "HTTP Protocol" section. By clicking on each item in that list, you get an explanation of /when/ that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to play with HTTP headers is also quite simple, if you know what to put in the header(s).

2) about mod_headers and mod_proxy playing together :
The trouble is that (unlike the mod_perl documentation above) the documentation of most Apache modules does not make it clear during which exact phase of the Apache request processing the module runs.

But I seem to remember something in mod_headers about an "early" attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to mod_proxy.
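
To make the "early" remark a little more concrete, here is a minimal sketch of the two forms of the mod_headers "Header" directive. The header names and values are only illustrations (X-Debug-Phase is made up), and I have not checked how either form interacts with mod_proxy in your setup.

```apache
# Assumption: mod_headers is loaded; names and values below are illustrative.

# Normal ("late") mode: the response header is set just as the
# response is sent to the client.
Header set Cache-Control "max-age=300"

# "early" mode: the directive is applied right at the beginning of
# request processing. The mod_headers documentation describes it as a
# test/debugging aid, and other modules may still change the header later.
Header set X-Debug-Phase "early" early
```

Comparing what reaches the client in each case may at least tell you whether something running later is replacing the header.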

3) In the documentation of mod_proxy, there should be a possibility to configure it inside a <Location(Match)> section, instead of "globally" (outside of any section). That forces you to decide more finely which URLs should or should not be proxied/forwarded to Tomcat, but it also (in my view) makes it easier to see how to combine the proxying instruction with other modules, like perl filters or handlers.

In effect, from Apache's point of view, mod_proxy must be the equivalent of a "content-generating handler" (like a PerlResponseHandler), because for Apache, passing a request to mod_proxy for processing is not much different than passing it to any other internal response-generating handler. Apache in fact knows nothing of Tomcat. It passes a request to mod_proxy, and expects the response (or an error status) back from mod_proxy. It has no idea that behind mod_proxy is another server.
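
As a rough sketch of what point 3 could look like: the Tomcat address, the /app/ path and the module name My::HeaderFilter are all hypothetical, and the Cache-Control value is only an example, not a recommendation.

```apache
# Assumptions: mod_proxy, mod_proxy_http and mod_headers are loaded;
# Tomcat listens on localhost:8080 and the application lives under /app/.
<Location "/app/">
    # Inside <Location>, ProxyPass takes only the backend URL;
    # the local path comes from the section itself.
    ProxyPass "http://localhost:8080/app/"
    ProxyPassReverse "http://localhost:8080/app/"

    # Other modules can then be scoped to exactly the same URLs.
    Header set Cache-Control "max-age=300"

    # With mod_perl 2 loaded, an output filter could be attached here too
    # (My::HeaderFilter is a hypothetical module you would have to write):
    # PerlOutputFilterHandler My::HeaderFilter
</Location>
```

Keeping everything in one section makes it obvious which requests are affected, and lets you leave other URLs alone.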


4) strictly according to the HTTP protocol, a "GET" request should be "idempotent", which means (roughly) that running it twice or more should always give the same answer. Which in theory means that even if the GET request goes to a database, the response should be cacheable under most circumstances. Unfortunately, in practice GET is much overused, and this does not always hold. But if caching the response creates problems, you can always tell your application developers that it is their fault because they are misusing the protocol.

(In really strict terms, a GET /could/ provide a different response; but it should not modify the state of the server).

5) despite what I am saying in (4), a GET response can very validly be different from a previous GET response with the same URL (for example, if in-between the data has been modified by a POST). So if you are forcing headers on the responses, you should at least be a bit careful not to do this indiscriminately.

That is also why I personally have doubts about the effectiveness of another caching proxy front-end, like the couple that were mentioned earlier. If the Tomcat web applications themselves do not provide headers to indicate whether their response can be cached or not, how is the front-end going to determine that this response /is/ the same as a previous one? It seems to me that such a determination would require information that such a proxy does not have, no?


Now if you are still there, one more question:
Are we talking here of a configuration where one Apache front-end serves several Tomcats, possibly on different machines?
Or does each Tomcat have its own personal Apache front-end on the same machine?
Or something in-between?


(*) considering the name of "filter" however, I would think that
- an "input filter" should always run /before/ any module which generates content (of which mod_proxy is one)
- an "output filter" should always run /after/ any modules which generate content.
So, it is probably difficult to have a filter which runs /in-between/ other Apache modules.
