Re: mod_perl output filter and mod_proxy, mod_cache

Tim Watts Thu, 14 Jul 2011 22:54:13 -0700

Hi Andre,

Thanks for such a detailed reply:


On 14/07/11 21:07, André Warnier wrote:


Back to the main issue.

See this as just a bit more generic information, as to what/how you
could think of solving your problem, apart from the other suggestions
already submitted.

1) I am not sure about mod_perl I/O filters, because I never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify
request/response headers, you can also write your own perl handler, and
by choosing the appropriate type of PerlHandler, you can have it run at
just about any point in the request/response cycle.

The real power of mod_perl (if you haven't yet discovered that aspect),
is that it allows you to insert your own code at just about any point of
the Apache request processing cycle, and to do just about anything you
want with any aspect of the request/response.
That includes "interfering" with anything that other, non-perl, Apache
modules do.

I've written auth handlers in mod_perl before - I did get the impression
then that the possibilities were extensive to do other things.

See the following page for a good overview of the Apache request
processing cycle, and what you can do with such PerlHandlers :
http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories

You are probably more interested in the "HTTP Protocol" section. By
clicking on each item in that list, you get an explanation of /when/
that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to
play with HTTP headers is also quite simple, if you know what to put in
the header(s).


ah - that is very useful - I shall read that.

2) about mod_headers and mod_proxy playing together :
The trouble is that (contrary to the mod_perl documentation above) it
is not usually clear at all from an Apache module's documentation
during which exact phase of the Apache request processing each
module runs.

But I seem to remember something in mod_headers about an "early"
attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to
mod_proxy.
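
For reference, a minimal sketch of the "early" keyword as the mod_headers
documentation describes it. The header value and the /webapp path are
assumptions for illustration. The documentation presents early mode mainly
as a test/debugging aid, so whether it orders usefully against mod_proxy
would need testing:

```apache
# Assumed example: header value and path are illustrative only.

# Default (late) mode: response headers are applied just as the
# response is sent, and the directive may sit in a <Location>.
<Location /webapp>
    Header set Cache-Control "max-age=3600"
</Location>

# Early mode: processed at the very start of request handling.
# Early directives cannot depend on the request path, so this one
# has to live in server or virtual host context, not in <Location>.
Header set Cache-Control "max-age=3600" early
```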

Hmm - I did read the web page several times, must have missed that - I
was nearly at the point of reading the source.

3) In the documentation of mod_proxy, there should be a possibility to
configure it inside of a <Location(Match)> section, instead of
"globally" (outside of any section).
That forces you to decide more finely which URLs should or should not be
proxied/forwarded to Tomcat, but it also (in my view) makes it more
evident to combine the proxying instruction with other modules, like
perl filters or handlers.

In effect, from Apache's point of view, mod_proxy must be the equivalent
of a "content-generating handler" (like a PerlResponseHandler), because
for Apache, passing a request to mod_proxy for processing is not much
different than passing it to any other internal response-generating
handler.
Apache in fact knows nothing of Tomcat. It passes a request to
mod_proxy, and expects the response (or an error status) back from
mod_proxy. It has no idea that behind mod_proxy is another server.
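
The Location-based layout described in (3) might look roughly like the
sketch below. The host and port are borrowed from the rewrite example later
in this message, the /webapp front-end path is an assumption, and
My::CacheHeaders is a hypothetical mod_perl filter module, not something
from this thread:

```apache
# Sketch only: My::CacheHeaders is a hypothetical mod_perl output filter.
<Location /webapp>
    # Inside <Location>, ProxyPass and ProxyPassReverse take only the
    # target URL; the local path comes from the enclosing section.
    ProxyPass        http://tomcat.server:8180/webapp
    ProxyPassReverse http://tomcat.server:8180/webapp

    # An output filter attached here sees the response after the
    # content generator (mod_proxy in this case) has produced it.
    PerlOutputFilterHandler My::CacheHeaders
</Location>
```

Keeping the proxy directive and the filter in the same section makes it
easier to see which URLs get which treatment.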


It is an interesting possibility that is also worth playing with,

Most of our servers are: redirect all to the proxy *except* a couple of
URLs which are either locally handled or sent to a different proxy.


This is quite typical:

RewriteEngine on
# Handled locally: leave the URL alone and stop rewriting
RewriteRule "^/media"  - [L]
RewriteRule "^/django" - [L]
# Otherwise proxy
RewriteRule "^/(.*)$" "http://tomcat.server:8180/webapp/$1" [P,L]
ProxyPassReverse   / http://tomcat.server:8180/webapp
ProxyPassReverseCookiePath /webapp /

Previously, this had been done with ProxyPass directives, including
negative ones. This did not work well with some Rewrite rules that were
also needed in some cases. So I tend to handle the whole thing with an
ordered list of rewrite rules like above, using the proxy flag to those
where required. It makes the ordering more obvious.

I have not yet tried a system of building the website with sets of
Location directives, which might be interesting - though I do use
Location sections to enforce redirects to SSL and requiring
authentication. Apache is like perl, more than one way to do it.


4) strictly according to the HTTP protocol, a "GET" request should be
"idempotent", which means (roughly) that running it twice or more should
always give the same answer.
Which in theory means that even if the GET request goes to a database,
the response should be cacheable under most circumstances.
Unfortunately, the practice is such that the GET request is much
overused, and it is not always that way.
But if caching the response creates problems, you can always tell your
application developers that it is their fault because they are misusing
the protocol..

(In really strict terms, a GET /could/ provide a different response; but
it should not modify the state of the server).


I do recall that.

5) despite what I am saying in (4), a GET response can very validly be
different from a previous GET response with the same URL (for example,
if in-between the data has been modified by a POST). So if you are
forcing headers on the responses, you should at least be a bit careful
not to do this indiscriminately.

That is also why I personally have a doubt about the effectiveness of
another caching proxy front-end, such as the couple mentioned earlier. If
the Tomcat web applications themselves do not provide headers to
indicate whether their response can be cached or not, how is the
front-end going to determine that this response /is/ the same as a
previous one ?
It seems to me that such a determination would require elements that
such a proxy does not have, no ?

I agree - the tomcat apps *should* be declaring what is the correct
caching scenario. But they don't. So this is very much a work around.
However, for any given case, the dev folk usually remember enough about
a project to say "the content of the database does not change, and GETs
will be invariant as a result" (or not). It's on that basis I'm happy to
proceed with a kludge, just to save my poor servers from melting(!).
Well the servers are all VMs, so it's more to stop old projects stealing
resources that could be better used on new projects.
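
As a rough illustration of that kind of workaround, mod_cache can be told
to cache responses that carry no freshness information of their own. This
is a sketch assuming Apache 2.2 with mod_disk_cache; the URL path, cache
directory and lifetime are made-up values, and it should only cover URLs
the developers have confirmed are invariant:

```apache
# Assumed values throughout: adjust path, directory and lifetime.
CacheRoot   /var/cache/apache2/mod_disk_cache
CacheEnable disk /webapp/reports

# Lifetime in seconds used when the response provides neither an
# expiry date nor a Last-Modified date.
CacheDefaultExpire 3600

# Allow caching of responses that lack a Last-Modified header.
CacheIgnoreNoLastMod On
```

As far as I recall from the mod_cache documentation, URLs with a query
string are still only cached when the response carries an explicit expiry
(Expires, max-age or s-maxage), so this alone may not be enough for
form-style GETs.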

I feel I understand Cache-Control (vs Expires) a lot better since I
optimised my own website with mod_cache on top of HTML::Mason/mod_perl
(which do play nice) - and my Mason bits do send sensible Cache-Control
lines. So I plan to give a small lunchtime seminar on that topic with
some demos of using Google's pagespeed firebug plugin (very useful for
this stuff).

The stupid thing is, it is probably trivial at design time to wedge
extra HTTP headers in (maybe JSP has a framework level TTL/expires
control - I don't know) but one has to know one *should* be doing it...


Now if you are still there, one more question :
Are we talking here of a configuration where one front-end Apache
front-ends for several Tomcats possibly on different machines ?
or does each Tomcat have its own personal Apache front-end on the same
machine ?
or something in-between ?

Mix. Older projects sent 3 different VHOSTS to 3 different remote tomcat
servers, each of which was handling a dozen+ webapps for a dozen+
different apache servers.

This was a disaster as one bad webapp could take out the tomcat farm and
the bloody logs are so useless it was impossible to find out which one.

These days, we have 3 different tomcat instances on the front machine
(dev, staging, live/production) and one apache with 3 VHOSTs mapping to
each tomcat. We may also blend in some django on the same machine.
Apache may mix in static content itself for efficiency (CSS/JS).

At least then, the development tomcat can be killed and restarted
without breaking the live one (and no, "touching" the web.xml file to
trigger a single webapp reload is about as reliable as asking a robber to
drop your cash off at the bank!).

They used to use a lot of perl - but I think perl lost it a bit with
forms handling and Ajax (until recently perhaps) which is why everyone
went off playing with jsp and now django.

I must admit django does seem well designed and I object to python a lot
less than java. Disadvantage - django likes to write your SQL for you
leading to a lack of thinking there - e.g., one I caught the other day:

5 JOINs with a SELECT DISTINCT over all. Bloke wondered why the MySQL
server took 40 seconds to compute the result!


(*) considering the name of "filter" however, I would think that
- an "input filter" should always run /before/ any module which
generates content (of which mod_proxy is one)
- an "output filter" should always run /after/ any modules which
generate content.
So, it is probably difficult to have a filter which runs /in-between/
other Apache modules.

I'm still going to have a look at mod_perl filters - I have a feeling
they could be useful here and there.


Thanks :)

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/
