mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

Hi,

Is it in theory possible to insert a perl output filter between 
mod_proxy and mod_cache?


Or at least between mod_proxy and the client?



The problem I'm trying to solve is this:

We have 100+ web servers where apache fronts a separate tomcat server 
using mod_proxy.


Sadly, the tomcat devs forgot to set any caching headers in the HTTP 
response (either Expires, Last-Modified or Cache-Control) so the sites 
are largely uncacheable by browsers and the various tomcats are becoming 
overloaded.


1/3 of our sites are typically invariant (the production sites have 
stable and unchanging data and most queries are via GET requests).


Therefore, the idea of forcing in some cache control headers en-route 
and also enabling some apache caching has a good chance of working well 
without affecting anything.


mod_headers and mod_proxy don't seem to play well together and mod_cache 
doesn't either (probably due to lack of cache control headers in the 
tomcat response, though I haven't proved this is actually the case).


So the thought of doing a perl based filter to insert cache-control 
headers occurred.


Is it likely I can insert such a filter on Apache 2.2 *between* 
mod_proxy and mod_cache?


Or am I going to have to implement a filter that includes proxying 
and/or caching?


Many thanks for any advice,

Cheers,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/


Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier

Tim Watts wrote:

Or am I going to have to implement a filter that includes proxying 
and/or caching?
 

(That would probably be difficult, inefficient or both)

Assuming that what you say about Tomcat is true (I don't know, and it may be worth asking 
this on the Tomcat list), I can think of another way to achieve what you seem to want :
if you can distinguish, from the request URL (or any other request property), the requests 
that are for invariant things, then you could arrange to /not/ proxy these requests to 
Tomcat, and serve them directly from Apache httpd.
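
A minimal sketch of that approach with mod_proxy on Apache 2.2, assuming a hypothetical /static/ path that holds the invariant files and a hypothetical backend host name (neither appears in this thread):

```apache
# Hypothetical layout: invariant files live under /static/ on the front end.
# The "!" exclusion has to come before the general ProxyPass rule.
ProxyPass        /static/ !
ProxyPass        /  http://tomcat.example.internal:8080/
ProxyPassReverse /  http://tomcat.example.internal:8080/

# Serve the excluded path from the local filesystem instead of Tomcat.
Alias /static/ /var/www/static/
<Directory /var/www/static/>
    Order allow,deny
    Allow from all
</Directory>
```

This only helps where the invariant content can be identified by URL and copied to the front end; dynamically generated pages would still go to Tomcat.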


Which proxying method exactly are you using between Apache and Tomcat ? (if you are using 
mod_proxy, then you are either using mod_proxy_http or mod_proxy_ajp; you could also 
consider using mod_jk).


Also, what are the versions of Apache and Tomcat that you are using ?



Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

On 14/07/11 11:16, André Warnier wrote:

Hi Andre,

Thanks for the quick reply :)


(That would probably be difficult, inefficient or both)

Assuming that what you say about Tomcat is true (I don't know, and it
may be worth asking this on the Tomcat list), I can think of another way
to achieve what you seem to want :
if you can distinguish, from the request URL (or any other request
property), the requests that are for invariant things, then you could
arrange to /not/ proxy these requests to Tomcat, and serve them directly
from Apache httpd.


Indeed that is a good idea. We are doing that for new projects for css 
and js files (apache does not proxy certain paths and picks these up 
from the local filesystem).


We can't do that for the 100-odd legacy servers as no-one has time to 
delve into the java/JSP code. I need to do something outside of tomcat 
where possible. Just to explain, each web server is a paid-for project - 
and when it's done, it sits there for 5+ years.


Only I have the time/inclination to fix this as it's killing my VMWare 
infrastructure. Because the sites are all fronted by apache in a similar 
way, one solution is likely to apply to most of the sites.


I would also add that most of the sites are dynamically driven pages, 
even involving MySQL querying, but once launched, the data remains 
fairly static - e.g. GET X will always resolve to response Y.


I'm planning a small seminar on the value of Cache-Control for my dev 
colleagues so they can stop making this mistake ;-) But that still 
leaves a lot of done projects to fix.



Which proxying method exactly are you using between Apache and Tomcat ?
(if you are using mod_proxy, then you are either using mod_proxy_http or
mod_proxy_ajp; you could also consider using mod_jk).


mod_proxy_http specifically.

mod_jk looks interesting for new projects (we have local tomcats for 
those now) - I think it may be a non-starter for old stuff as trying to 
retro fit it may not be so simple (our older tomcat servers are in a 
remote farm on their own machines hence the use of mod_proxy_http).



Also, what are the versions of Apache and Tomcat that you are using ?



Apache 2.2 (various sub versions) and both tomcat 5.5 and tomcat 6 (but 
all on remote machines listening on TCP sockets).


I think for this problem, I have to treat tomcat as a little, rather 
inefficient, black box and try to fixup on the apache front ends, hence 
the direction of my original idea...


Cheers,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/


Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Alex J. G. Burzyński
Hi Tim,

If you are after caching the responses, maybe an easier solution would
be to use a reverse proxy - like Varnish?

You would be then in complete control over the incoming and outgoing
headers and could cache responses based on the url / inject Expires
headers so browsers could cache them too etc.

Cheers,
Alex






Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

On 14/07/11 11:52, Alex J. G. Burzyński wrote:

Hi Tim,

If you are after caching the responses, maybe an easier solution would
be to use a reverse proxy - like Varnish?

You would be then in complete control over the incoming and outgoing
headers and could cache responses based on the url / inject Expires
headers so browsers could cache them too etc.

Cheers,
Alex



[Sorry Alex, hit reply instead of reply-list]

Hi Alex,

I was initially also thinking Squid - but it's rather heavy.

I have not come across Varnish but having a quick look (and noting it is 
available on Debian - good) it looks like a damn good option.


I think you are right - apache is great, but the order of execution of 
modules is not well documented and prone to changing (hence my original 
question here) and trying to splice effectively 3 filters together 
(proxy, header-fiddling and cache) is probably doomed to grief.


Thanks for the tip - I'm off to try that today!

All the best,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/


Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier

Hi.

I have to apologise.
I misunderstood your first post, and I wanted to verify on the Tomcat list, so I quoted 
the following passage of your first post in my message there :


Sadly, the tomcat devs forgot to set any caching headers in the HTTP response (either 
Expires, Last-Modified or Cache-Control) so the sites are largely uncacheable by browsers 
and the various tomcats are becoming overloaded.


Unfortunately, the Tomcat devs there took it rather seriously, and as a consequence now 
your name is shit on the Tomcat list.



.. just kidding, I did not quote your name.

Anyway, apart from a few huffed responses to my misquote (since then rectified), someone 
provided a suggestion that may not be the simplest, but might be helpful anyway in some 
cases :


Have a look at : http://www.tuckey.org/urlrewrite/

This is a Java Servlet Filter, which can be added transparently around any Tomcat web 
application (by adding the required section in the web.xml config file of that web 
application).
Java Servlet Filters are such that the Tomcat web application is not even aware that it is 
there, and continues to work as before.  Much like Apache input and output filters in 
fact, except that a Java Servlet Filter is both at the same time (it wraps the webapp on 
both sides).


Anyway, this filter can do such things as adding response headers, conditionally or not, to 
anything the webapp produces.  And it can do much more, as with time it has evolved into 
some kind of mish-mash of mod_rewrite, mod_headers and mod_proxy.


It is more one-by-one work than doing something at the Apache front-end level or via a 
proxy, but it also provides better fine-tuning possibilities.

So, if you can for instance easily identify the worst offenders, it might be an 
option.

And it is certainly a good tool to have in one's toolcase.





Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

On 14/07/11 12:43, André Warnier wrote:

Hi.

I have to apologise.
I misunderstood your first post, and I wanted to verify on the Tomcat
list, so I quoted the following passage of your first post in my message
there :

Sadly, the tomcat devs forgot to set any caching headers in the HTTP
response (either Expires, Last-Modified or Cache-Control) so the sites
are largely uncacheable by browsers and the various tomcats are becoming
overloaded.

Unfortunately, the Tomcat devs there took it rather seriously, and as a
consequence now your name is shit on the Tomcat list.


.. just kidding, I did not quote your name.


LoL - I hate tomcat anyway (for its fatness) so I don't mind if they 
hate me ;-)


I should have clarified that I meant my Department's dev team (ie the ones 
who use tomcat here) rather than the Tomcat Developers themselves...


I have no doubts that jsp can be told to emit certain headers but for 
some reason a lot of web developers IME often miss the finer points of 
HTTP. This of course would be the correct place to do it as they can 
choose different max-age times to suit the content.


I plan to run a 20 minute seminar on this specific point for my lot (and 
more such seminars for other issues like security and SQL efficiency) 
but that still leaves loads of old black-boxes to manage for a few years.



Anyway, apart from a few huffed responses to my misquote (since then
rectified), someone provided a suggestion that may not be the simplest,
but might be helpful anyway in some cases :

Have a look at : http://www.tuckey.org/urlrewrite/

This is a Java Servlet Filter, which can be added transparently
around any Tomcat web application (by adding the required section in
the web.xml config file of that web application).
Java Servlet Filters are such that the Tomcat web application is not
even aware that it is there, and continues to work as before. Much like
Apache input and output filters in fact, except that a Java Servlet
Filter is both at the same time (it wraps the webapp on both sides).


That could be interesting too - as long as it's something I can bolt in 
without having to recompile the webapp code, I'm game. As a Linux 
sysadmin, I draw a clear line between the systems (my problem) and the 
apps (dev team) - and not knowing Java (much) I'm not qualified to mess 
with their stuff... I'm happy to go as far as messing with server.xml 
and web.xml though :)



Anyway, this filter can do such things as adding response headers,
conditionally or not, to anything the webapp produces. And it can do much
more, as with time it has evolved into some kind of mish-mash of
mod_rewrite, mod_headers and mod_proxy.

It is more one-by-one work than doing something at the Apache front-end
level or via a proxy, but it also provides better fine-tuning
possibilities.
So, if you can for instance easily identify the worst offenders, it
might be an option.

And it is certainly a good tool to have in one's toolcase.


I agree - I'll have a look at that after I play with Alex's suggestion 
of Varnish :)


Thanks very much for your time :)

all the best,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/


Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread James Smith

On 14/07/2011 11:39, Tim Watts wrote:

mod_proxy_http specifically.

mod_jk looks interesting for new projects (we have local tomcats for 
those now) - I think it may be a non-starter for old stuff as trying 
to retro fit it may not be so simple (our older tomcat servers are in 
a remote farm on their own machines hence the use of mod_proxy_http).


Shouldn't be an issue: you can point mod_jk at a remote machine. I 
do it a lot so that we can push the Tomcat application out through our 
templating output filter. Tomcat produces a plain HTML page with 
none of the styling, and this is wrapped using our custom output filter. 
I'm guessing at this stage you can do what you want with the script...


James




RE: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread James B. Muir
I had to bolt an input servlet filter onto tomcat once. To do this I had to 
write the servlet filter code and then add <filter> and <filter-mapping> 
elements to the application's WEB-INF/web.xml file.
-James




Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier

Tim Watts wrote:
...



LoL - I hate tomcat anyway (for its fatness) so I don't mind if they 
hate me ;-)


I should have clarified as my Department's dev team (ie the ones who 
use tomcat here) rather than the Tomcat Developers themselves...


Well, I said that too, and said I had misquoted you, but there was little I could do about 
 that next phrase of yours :


I think for this problem, I have to treat tomcat as a little, rather inefficient, black 
box ..




Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

On 14/07/11 14:38, André Warnier wrote:

Tim Watts wrote:
...



I think for this problem, I have to treat tomcat as a little, rather
inefficient, black box ..



They liked that quote then? ;-)

OT Rant

I'm sure it's a lovely development environment (there must be some 
reason people use it) - all I know is it's a resource hungry bitch 
that's never happy unless it has GBs of RAM and at least 2, preferably 4 
fast cores. And if you p*ss it off, it will eat your swap and burn all 
your cores at 100%. Bane of my sysadmin life...


Don't get me started on the readability of its log files!!

That's across a wide range of applications including commercial stuff 
like Confluence.


Bah - give me mod_perl (or even mod_wsgi+python) any day...

I've got a lot done with HTML::Mason+mod_perl and very efficiently (for 
such a simple templating system) and I'm considering Mojolicious for 
fun. Learning django right now too, for the cool forms+DB stuff.


Thankfully, our guys are making a switch to django away from tomcat and 
it is so much nicer to manage.


Cheers,

Tim

--
Tim Watts
Personal Blog: http://www.dionic.net/tim/


Re [OT]: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier
I'll have to watch my language here, as I might otherwise get ostracised on that other 
list of mine.


Tim Watts wrote:

On 14/07/11 14:38, André Warnier wrote:

Tim Watts wrote:
...



I think for this problem, I have to treat tomcat as a little, rather
inefficient, black box ..



They liked that quote then? ;-)

OT Rant

I'm sure it's a lovely development environment (there must be some 
reason people use it) - all I know is it's a resource hungry bitch 
that's never happy unless it has GBs of RAM and at least 2, preferably 4 
fast cores. And if you p*ss it off, it will eat your swap and burn all 
your cores at 100%. Bane of my sysadmin life...


We should start a club.



Don't get me started on the readability of its log files!!


Or worse, the logging configuration.



That's across a wide range of applications including commercial stuff 
like Confluence.


Bah - give me mod_perl (or even mod_wsgi+python) any day...


+1



I've got a lot done with HTML::Mason+mod_perl and very efficiently (for 
such a simple templating system) and I'm considering Mojolicious for 
fun. Learning django right now too, for the cool forms+DB stuff.




We have been re-developing stuff that is based on [...], using mod_perl and TT2 
for now.
It works faster, uses umpteen MB less memory, and may soon deliver us from the management 
of that [...]-based stuff too.



Thankfully, our guys are making a switch to django away from [...] and 
it is so much nicer to manage.



Don't know it, but will have a look.

[OT, ADVOCACY]

I am partial to perl and CPAN, because there are just so many things I have been able to 
do with them over the years at little expense to solve real-world problems.
And despite the fact that I also use a lot of OO modules in perl, I just cannot get in 
sympathy with a language like *, where it seems that you have to mobilise a couple of 
dozen classes (and x MB of RAM) just to print a date or so.

Never mind the time spent trying to find their documentation.

As a matter of fact, when I am confronted with a new kind of problem, in an area where I 
know a priori nothing, my first stop is usually not Google nor Wikipedia but CPAN, just to 
read the documentation of the modules related to that area.  Whether you need to parse 
text, to process some weird data format, to talk to Amazon, to make credit-card payments, 
to dig out and generate system statistics, to understand how SOAP works, to drive an 
MS-Office program through OLE (and know nothing of OLE to start with), create a TCP 
server, convert images, read or create and send emails, or whatever, you always find an 
answer there. Even if in the end it turns out that the answer is not something in perl, 
there is so much knowledge stored in CPAN that it is a pity that it is only consulted by 
perl-centric types.


[IDEA]
Maybe creating a website named WikiPerl, containing just the CPAN documentation with a 
decent search engine (KinoSearch/Lucy ?), would help restore perl's popularity ?


Or do we just keep that for ourselves, as the best job-preservation scheme ever 
designed ?


Ooops. I was just about to send this to the wrong list...


Re: Re [OT]: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Niels Larsen
Yes, CPAN has very, very useful things. I consider its biggest problems
1) too difficult to find things when not knowing what one wants, 2) a
huge undergrowth of modules that are either bad quality or unmaintained
or duplicated with a later module. The number of lingering bugs is an
obstacle, yet at the same time super-useful things are hiding in plain
view. 

Apropos, Perl Dancer was hidden from me because I didn't see it here,
http://search.cpan.org/modlist/World_Wide_Web .. and there have been many 
more such discoveries in the past. A simple global ranking by popularity (the 
number of times downloaded) and/or by size and maturity (time located
on CPAN) would expose many new things to many, I think. If other 
modules depend on them, then that may speak to quality somewhat, and 
much better rating could be done. MongoDB would probably make managing
the collection easier. But, I am grateful for what exists of course.

While watching the language certainly, I'm moving from Apache/mod_perl
to Dancer/Nginx for speed and memory reasons.

Ok, back to lurk-mode,

Niels Larsen






Re: Re [OT]: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Clinton Gormley
Hi Niels

On Thu, 2011-07-14 at 20:09 +0200, Niels Larsen wrote:
 Yes, CPAN has very, very useful things. I consider its biggest problems
 1) too difficult to find things when not knowing what one wants, 2) a
 huge undergrowth of modules that are either bad quality or unmaintained
 or duplicated with a later module. The number of lingering bugs is an
 obstacle, yet at the same time super-useful things are hiding in plain
 view. 

Check out http://metacpan.org - it's a GSoC 2011 project that aims to
improve CPAN search.  Tagging and user ranking (plus integration of
those into the search results) are next on the feature list.

clint




Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier

Tim Watts wrote:

Hi,

Is it in theory possible to insert a perl output filter between 
mod_proxy and mod_cache?


Or at least between mod_proxy and the client?


...



mod_headers and mod_proxy don't seem to play well together and mod_cache 
doesn't either (probably due to lack of cache control headers in the 
tomcat response, though I haven't proved this is actually the case).



...

Back to the main issue.

Take this as some more general information on how you could approach solving 
your problem, apart from the other suggestions already submitted.


1) I am not sure about mod_perl I/O filters, because I have never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify request/response 
headers, you can also write your own perl handler, and by choosing the appropriate type of 
PerlHandler, you can have it run at just about any point in the request/response cycle.


The real power of mod_perl (if you haven't yet discovered that aspect), is that it allows 
you to insert your own code at just about any point of the Apache request processing 
cycle, and to do just about anything you want with any aspect of the request/response.

That includes interfering with anything that other, non-perl, Apache modules 
do.

See the following page for a good overview of the Apache request processing cycle, and 
what you can do with such PerlHandlers :

http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories
You are probably more interested in the HTTP Protocol section.  By clicking on each item 
in that list, you get an explanation of /when/ that type of handler runs.

(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to play with HTTP 
headers is also quite simple, if you know what to put in the header(s).


2) about mod_headers and mod_proxy playing together :
The trouble is that (unlike the mod_perl documentation above) the documentation of the 
Apache modules usually does not make it clear during which exact phase of 
the Apache request processing each module runs.


But I seem to remember something in mod_headers about an "early" attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to mod_proxy.
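
For illustration only, a minimal sketch of how mod_headers could be scoped to some of the proxied URLs. The path /webapp/static/ and the 600-second lifetime are assumptions, not taken from this thread, and whether the header really ends up on proxied responses in a given setup would need testing:

```apache
# Sketch only: assumes mod_headers is loaded and that /webapp/static/
# (a hypothetical path) is among the URLs proxied to Tomcat.
<Location /webapp/static/>
    # Default (late) mode: Header directives are processed just before
    # the response is sent to the network.
    Header set Cache-Control "max-age=600"
</Location>

# With the "early" keyword the directive is applied at the very start of
# request processing instead. It is documented as a test/debugging aid and
# is only valid in server or virtual host context, not inside <Location>:
# Header set X-Debug-Early "1" early
```

Note that "Header set" replaces any Cache-Control value the back-end already sent, so it is only appropriate for URLs where the back-end is known to send none.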

3) In the documentation of mod_proxy, there should be a possibility to configure it inside 
of a Location(Match) section, instead of globally (outside of any section).
That forces you to decide more finely which URLs should or should not be proxied/forwarded 
to Tomcat, but it also (in my view) makes it more straightforward to combine the proxying 
instruction with other modules, like perl filters or handlers.


In effect, from Apache's point of view, mod_proxy must be the equivalent of a 
content-generating handler (like a PerlResponseHandler), because for Apache, passing a 
request to mod_proxy for processing is not much different than passing it to any other 
internal response-generating handler.
Apache in fact knows nothing of Tomcat.  It passes a request to mod_proxy, and expects the 
response (or an error status) back from mod_proxy.  It has no idea that behind mod_proxy 
is another server.
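
A sketch of what such a section could look like, assuming mod_proxy and mod_proxy_http are loaded. The host name and port are borrowed from elsewhere in this thread, the /webapp/ prefix is an assumption, and the Perl module name in the commented-out line is purely hypothetical:

```apache
# Sketch only: proxy just this URL prefix and keep related directives together.
<Location /webapp/>
    # Inside <Location>, ProxyPass and ProxyPassReverse omit the local path;
    # it is taken from the enclosing section.
    ProxyPass        http://tomcat.server:8180/webapp/
    ProxyPassReverse http://tomcat.server:8180/webapp/

    # Other modules can then be configured for exactly the same URLs, e.g. a
    # mod_perl output filter (My::CacheHeaders is a made-up module name):
    # PerlOutputFilterHandler My::CacheHeaders
</Location>
```

Everything outside /webapp/ would then be served locally by Apache unless another section proxies it.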



4) strictly according to the HTTP protocol, a GET request should be safe and idempotent, which 
means (roughly) that repeating it should not change anything on the server, so running it twice 
or more should normally give the same answer.
Which in theory means that even if the GET request goes to a database, the response should 
be cacheable under most circumstances.
Unfortunately, in practice the GET request is much overused, and that is not 
always the case.
But if caching the response creates problems, you can always tell your application 
developers that it is their fault because they are misusing the protocol.


(In really strict terms, a GET /could/ provide a different response; but it should not 
modify the state of the server).


5) despite what I am saying in (4), a GET response can very validly be different from a 
previous GET response with the same URL (for example, if in-between the data has been 
modified by a POST).  So if you are forcing headers on the responses, you should at least 
be a bit careful not to do this indiscriminately.


That is also why I personally have a doubt about the effectiveness of another caching 
proxy front-end, like the couple that were mentioned earlier.  If the Tomcat web applications 
themselves do not provide headers to indicate whether their response can be cached or not, 
how is the front-end going to determine that this response /is/ the same as a previous one ?
It seems to me that such a determination would require elements that such a proxy does not 
have, no ?



Now if you are still there, one more question :
Are we talking here about a configuration where one Apache acts as front-end for several 
Tomcats, possibly on different machines ?

or does each Tomcat have its own personal Apache front-end on the same machine ?
or something 

Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread André Warnier

And here is another link which might be interesting.
It is a message on the Tomcat list (where I re-posted your original request, hem), from
Rainer Jung, who is one of the Apache/Tomcat mod_jk connector developers :


Yes, go for TC 7:

http://tomcat.apache.org/tomcat-7.0-doc/config/filter.html#Expires_Filter

Regards,

Rainer


Now that Tomcat page, apart from being interesting in itself, also points to the Apache mod_expires 
module (which I had never heard of before), which in your case may be exactly what you're looking for.


It seems that it can add headers to a response proxied from Tomcat, without 
overwriting such headers if they already exist.



Here is what I would do :

1) identify some usual suspects among the URLs proxied to Tomcat
   They would have to match the following criteria :
   - they happen on an overloaded Tomcat
   - they happen often
   - I am reasonably sure that the information delivered by that URL
     is stable over a period of time
   - I am reasonably sure that if the browser did, once in a while,
     get stale information, it would not be dramatic

2) carefully configure the front-end Apache to, for these particular URLs,
add an Expires header specifying now + N, where N is initially not too large.
This way, a browser would not get a result that is more than N out of date, but any duplicate 
request within a period N would get the cached version.


3) look at the impact and repeat as needed, increasing or decreasing N

YMMV.
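
As a rough sketch of step 2, assuming mod_expires is loaded: the URL pattern and the 10-minute value of N below are placeholders to be replaced by whatever step 1 turns up, not values from this thread.

```apache
# Sketch only: add an Expires header of "now + N" for selected proxied URLs.
<LocationMatch "^/webapp/(catalogue|reference)/">
    ExpiresActive On
    # N = 10 minutes, counted from the time of the client's access.
    # mod_expires also emits a matching Cache-Control: max-age header.
    ExpiresDefault "access plus 10 minutes"
</LocationMatch>
```

According to the mod_expires documentation, a response that already carries an Expires header (for example one proxied from an origin server) is left untouched, so Tomcat applications that later start sending their own headers would take precedence.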




Re: mod_perl output filter and mod_proxy, mod_cache

2011-07-14 Thread Tim Watts

Hi Andre,

Thanks for such a detailed reply:

On 14/07/11 21:07, André Warnier wrote:



Back to the main issue.

See this as just a bit more generic information, as to what/how you
could think of solving your problem, apart from the other suggestions
already submitted.

1) I am not sure about mod_perl I/O filters, because I never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify
request/response headers, you can also write your own perl handler, and
by choosing the appropriate type of PerlHandler, you can have it run at
just about any point in the request/response cycle.

The real power of mod_perl (if you haven't yet discovered that aspect),
is that it allows you to insert your own code at just about any point of
the Apache request processing cycle, and to do just about anything you
want with any aspect of the request/response.
That includes interfering with anything that other, non-perl, Apache
modules do.


I've written auth handlers in mod_perl before - I did get the impression 
then that the possibilities for doing other things were extensive.



See the following page for a good overview of the Apache request
processing cycle, and what you can do with such PerlHandlers :
http://perl.apache.org/docs/2.0/user/handlers/intro.html#mod_perl_Handlers_Categories

You are probably more interested in the HTTP Protocol section. By
clicking on each item in that list, you get an explanation of /when/
that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to
play with HTTP headers is also quite simple, if you know what to put in
the header(s).


ah - that is very useful - I shall read that.


2) about mod_headers and mod_proxy playing together :
The trouble is that (unlike the mod_perl documentation above) the
documentation of the Apache modules usually does not make it clear
during which exact phase of the Apache request processing each
module runs.

But I seem to remember something in mod_headers about an early
attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to
mod_proxy.


Hmm - I did read the web page several times, must have missed that - I 
was nearly at the point of reading the source.



3) In the documentation of mod_proxy, there should be a possibility to
configure it inside of a Location(Match) section, instead of
globally (outside of any section).
That forces you to decide more finely which URLs should or should not be
proxied/forwarded to Tomcat, but it also (in my view) makes it more
straightforward to combine the proxying instruction with other modules, like
perl filters or handlers.

In effect, from Apache's point of view, mod_proxy must be the equivalent
of a content-generating handler (like a PerlResponseHandler), because
for Apache, passing a request to mod_proxy for processing is not much
different than passing it to any other internal response-generating
handler.
Apache in fact knows nothing of Tomcat. It passes a request to
mod_proxy, and expects the response (or an error status) back from
mod_proxy. It has no idea that behind mod_proxy is another server.


It is an interesting possibility that is also worth playing with.

Most of our servers are set up to send everything to the proxy *except* a couple of 
URLs which are either handled locally or sent to a different proxy.


This is quite typical:

RewriteEngine on
# Handled locally
RewriteRule ^/media  - [L]
RewriteRule ^/django - [L]
# Otherwise proxy
RewriteRule ^/(.*)$ http://tomcat.server:8180/webapp/$1 [P,L]
ProxyPassReverse   / http://tomcat.server:8180/webapp
ProxyPassReverseCookiePath /webapp /


Previously, this had been done with ProxyPass directives, including 
negative ones. This did not work well with some Rewrite rules that were 
also needed in some cases. So I tend to handle the whole thing with an 
ordered list of rewrite rules like the one above, using the proxy flag on those 
where required. It makes the ordering more obvious.
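
For comparison, the ProxyPass form with negative entries would look roughly like this. It is a sketch that reuses the host name and paths from the rules above, not the configuration that was originally in use:

```apache
# Sketch only: "!" excludes a path from proxying. The first matching
# ProxyPass wins, so the exclusions must come before the catch-all.
ProxyPass /media  !
ProxyPass /django !
ProxyPass        / http://tomcat.server:8180/webapp/
ProxyPassReverse / http://tomcat.server:8180/webapp/
ProxyPassReverseCookiePath /webapp /
```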


I have not yet tried a system of building the website with sets of 
Location directives, which might be interesting - though I do use 
Location sections to enforce redirects to SSL and requiring 
authentication. Apache is like perl, more than one way to do it.




4) strictly according to the HTTP protocol, a GET request should be safe
and idempotent, which means (roughly) that repeating it should not change
anything on the server, so running it twice or more should normally give
the same answer.
Which in theory means that even if the GET request goes to a database,
the response should be cacheable under most circumstances.
Unfortunately, in practice the GET request is much overused, and that is
not always the case.
But if caching the response creates problems, you can always tell your
application developers that it is their fault because they are misusing
the protocol.

(In really strict terms, a GET /could/ provide a different response; but
it should not modify the state of the