"Randal L. Schwartz" wrote:
>
> OK, I went one better. I now have CPU-percentage-based throttling.
> The big problem on my site was not bandwidth, but how quickly the
> loadav would go up when I got hammered by things like "Teleport Pro".
> Hey, if you haven't seen that, go see it. Be afraid. Be Very Afraid.
> See <URL:http://www.tenmax.com/teleport/pro/home.htm> -- and think
> about what happens when something like that slams every one of your
> Mason or Embperl pages for your shopping cart catalog.
>
> So, I modified my throttler to look at the recent CPU usage over a
> window for a given IP. If the percentage exceeds a threshold, BOOM
> they get a 503 error and a correct "Retry-After:" to tell them how
> long they're banned.
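The module quoted below always sends `Retry-After: $WINDOW`. That value is an upper bound. A block actually expires `$WINDOW` seconds after the block file was last written. Here is a minimal sketch that computes the time actually remaining, under that same assumption. Both `block_seconds_left` and `retry_after_for` are hypothetical helpers and are not part of Stonehenge::Throttle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Stonehenge::Throttle): seconds left on a
# block, ASSUMING a block lasts $window seconds from the block file's mtime.
# Returns 0 when the block has already expired.
sub block_seconds_left {
    my ($mtime, $window, $now) = @_;
    my $left = $mtime + $window - $now;
    return $left > 0 ? $left : 0;
}

# Hypothetical wrapper: read the mtime from the block file itself.
# A missing block file means "not blocked", so 0 is returned.
sub retry_after_for {
    my ($blockfile, $window) = @_;
    my @stat = stat $blockfile or return 0;
    return block_seconds_left($stat[9], $window, time);
}
```

The access handler could send `retry_after_for($blockfile, $WINDOW)` in place of the constant. Clients would then learn how long the ban really has left.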

That's a nifty module. I suggest that you alter your threshold
slightly: instead of relying only on a fixed percentage of CPU time,
also consider the overall load on the machine. I wouldn't care if one
IP was taking 5% of the CPU if the machine as a whole was less than
50% busy or so.
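Here is a rough sketch of that suggestion. It treats the first field of `/proc/loadavg` as a stand-in for "overall load", because the module already reads that file (the format is Linux-specific). A load average is not a percentage, so a figure like "50%" would have to be translated, for example relative to the number of CPUs. `$MIN_LOADAVG`, `one_minute_load` and `should_block` are hypothetical names with an arbitrary threshold.

```perl
use strict;
use warnings;

# Hypothetical tuning knob (not in the original module): only enforce the
# per-host CPU budget once the 1-minute load average reaches this value.
my $MIN_LOADAVG = 2.0;

# Parse the first field of a /proc/loadavg-style line, e.g.
# "0.42 0.37 0.31 1/123 4567" -> 0.42.
sub one_minute_load {
    my ($line) = @_;
    my ($one) = split ' ', $line;
    return defined $one ? $one : 0;
}

# Block only if the host is over its CPU budget AND the machine is busy.
sub should_block {
    my ($totalcpu, $budget, $loadavg_line) = @_;
    return 0 if $totalcpu < $budget;
    return one_minute_load($loadavg_line) >= $MIN_LOADAVG ? 1 : 0;
}
```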

Also, how does this IP-based tracking work in practice? People behind
corporate firewalls present two problems: one person can map to
several IP addresses if the firewall uses a cluster of proxies, and
one IP address can map to many people. A serious throttling effort
would need to take both cases into account.

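One possible way to handle the first case is to throttle per /24 network instead of per address. A proxy cluster in a single netblock would then share one budget. The module already does something similar when it collapses every `*.googlebot.com` host into one key. This sketch assumes `get_remote_host` returns a dotted-quad IPv4 address, and it passes anything else (such as a resolved hostname) through unchanged. `throttle_key` is a hypothetical helper. It uses `-24` rather than `/24` because the key ends up in a filename.

```perl
use strict;
use warnings;

# Hypothetical helper: map a dotted-quad IPv4 address to its /24 network,
# so several proxies in one netblock share a single throttle budget.
# Anything that isn't a dotted quad (e.g. a hostname) is returned as-is.
# "-24" instead of "/24" because the key becomes part of a filename.
sub throttle_key {
    my ($host) = @_;
    if ($host =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.\d{1,3}$/) {
        return "$1.$2.$3.0-24";
    }
    return $host;
}
```

This approach trades one problem for the other. Grouping helps with "one person, many IPs", but it makes "one IP, many people" worse, because even more users share a single budget.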
Regards,
Jeffrey
> I do this in two parts:
>
> an accesshandler that starts a CPU timer ticking, sets up a
> cleanup handler to do most of the dirty work, then checks for a
> "currently blocked" condition, returning 503 if needed.
>
> a cleanup handler that notes the elapsed CPU, storing it into a
> file. Then, if not blocked already, counts the recent CPU usage,
> and starts or stops blocking based on that.
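The two computations at the core of the cleanup handler can be pulled out into plain functions for testing. The first converts two `times` snapshots into hundredths of a CPU second, using the module's own `+ 0.01` formula. The second sums the records that fall inside the window, mirroring its `next if $time < $startwindow` loop. This is a sketch only. The function names are hypothetical, and the records are passed as array refs rather than read from the history file.

```perl
use strict;
use warnings;

# CPU used between two `times` snapshots, in hundredths of a second.
# Only this process's user + system time (fields 0 and 1) count, and the
# + 0.01 matches the module's formula (it also means every request counts
# as at least 1).
sub cpu_hundredths {
    my ($before, $after) = @_;    # array refs, e.g. [times]
    my $user = $after->[0] - $before->[0];
    my $sys  = $after->[1] - $before->[1];
    return int(100 * ($user + $sys + 0.01));
}

# Sum the CPU column of [time, cpu] records with time >= $now - $window.
sub window_total {
    my ($records, $now, $window) = @_;
    my $start = $now - $window;
    my $total = 0;
    for my $rec (@$records) {
        next if $rec->[0] < $start;
        $total += $rec->[1];
    }
    return $total;
}
```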
>
> Nice thing is: no file locking (everything works no matter how many
> people are doing things in parallel). Also, by pushing most of the
> logic down to the post-content phase, we keep the response time zippy!
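The "no file locking" claim depends on each history record being a single small `syswrite` to a file opened for appending. Perl's `open $fh, ">>$file"` already opens with `O_APPEND`. The sketch below spells the flags out with `sysopen` and reads the records back. With `O_APPEND`, the kernel seeks to end-of-file as part of each write, so concurrent writers on a local filesystem should not overwrite one another's 8-byte records. This is not guaranteed over NFS. The block file, by contrast, is only ever created or unlinked and its mtime checked; it is never read-modify-written. The helper names are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_APPEND O_CREAT O_RDONLY);

# Append one fixed-size (time, cpu) record with a single unbuffered write.
sub append_record {
    my ($file, $time, $cpu) = @_;
    sysopen my $fh, $file, O_WRONLY | O_APPEND | O_CREAT, 0644
        or die "can't open $file: $!";
    my $rec = pack "LL", $time, $cpu;
    my $n = syswrite $fh, $rec;
    die "short write to $file" unless defined $n && $n == length $rec;
    close $fh;
}

# Read back all complete 8-byte records as [time, cpu] pairs;
# a truncated trailing record, if any, is ignored.
sub read_records {
    my ($file) = @_;
    sysopen my $fh, $file, O_RDONLY or die "can't open $file: $!";
    binmode $fh;
    my @recs;
    while ((read $fh, my $buf, 8) == 8) {
        push @recs, [unpack "LL", $buf];
    }
    close $fh;
    return @recs;
}
```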
>
> So, here's source. Peer review requested - I'm probably turning
> this in for my next WebTechniques column...
>
> package Stonehenge::Throttle;
> use strict;
>
> ## usage: PerlAccessHandler Stonehenge::Throttle
>
> my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";
>
> my $WINDOW = 15; # seconds of interest
> my $DECLINE_CPU_PERCENT = 5; # CPU percent in window before we 503 error
>
> use vars qw($VERSION);
> $VERSION = (qw$Revision: 2.0 $ )[-1];
>
> use Apache::Constants qw(OK DECLINED);
> use Apache::File;
> use Apache::Log;
>
> use Stonehenge::Reload;
>
> sub handler {
> goto &handler if Stonehenge::Reload->reload_me;
>
> my $r = shift; # closure var
> return DECLINED unless $r->is_initial_req;
> my $log = $r->server->log; # closure var
>
> my $host = $r->get_remote_host; # closure var
> return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;
> $host = "googlebot.com" if $host =~ /\.googlebot\.com$/;
>
> my $historyfile = "$HISTORYDIR/$host-times"; # closure var
> my $blockfile = "$HISTORYDIR/$host-blocked"; # closure var
> my @delta_times = times; # closure var
> my $fh = Apache::File->new; # closure var
>
> $r->register_cleanup
> (sub {
>
> ## record this CPU usage
> @delta_times = map { $_ - shift @delta_times } times;
> my $cpu_hundred = int 100*($delta_times[0] + $delta_times[1] + 0.01);
> ## $log->notice("throttle: $host got $cpu_hundred/100 in this slot"); # DEBUG
> open $fh, ">>$historyfile" or return DECLINED;
> my $time = time;
> syswrite $fh, pack "LL", $time, $cpu_hundred;
> close $fh;
>
> my $startwindow = $time - $WINDOW;
>
> if (my @stat = stat($blockfile)) {
> if ($stat[9] > $startwindow) {
> ## $log->notice("throttle: $blockfile is already blocking"); # DEBUG
> return OK; # nothing further to see... move along
> } else {
> ## $log->notice("throttle: $blockfile is old, ignoring"); # DEBUG
> }
> }
>
> # figure out if we should be blocking
> my $totalcpu = 0; # scaled by 100
>
> open $fh, $historyfile or return DECLINED;
> while ((read $fh, my $buf, 8) > 0) {
> my ($time, $cpu) = unpack "LL", $buf;
> next if $time < $startwindow;
> $totalcpu += $cpu;
> }
> close $fh;
>
> if ($totalcpu < $WINDOW * $DECLINE_CPU_PERCENT) {
> ## $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs"); # DEBUG
> unlink $blockfile;
> return OK;
> }
>
> ## about to be nasty... let's see how bad it is:
> my $loadavg = "unknown";
> if (open $fh, "/proc/loadavg") { # Linux-specific; may be absent elsewhere
> chomp($loadavg = <$fh>);
> close $fh;
> }
>
> my $useragent = $r->header_in('User-Agent') || "unknown";
>
> $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs, enabling block [loadavg $loadavg, agent $useragent]");
> open $fh, ">$blockfile";
> close $fh;
>
> return OK;
> });
>
> ## back in the access handler:
>
> if (my @stat = stat($blockfile)) {
> if ($stat[9] > time - $WINDOW) {
> $log->warn("throttle access: $blockfile is blocking");
> $r->err_header_out("Retry-After", $WINDOW); # err_headers_out survive an error response
> return 503; # Service Unavailable
> } else {
> ## $log->notice("throttle access: $blockfile is old, ignoring"); # DEBUG
> return DECLINED;
> }
> }
>
> return DECLINED;
> }
> 1;
>
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
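The cleanup handler's test `$totalcpu < $WINDOW * $DECLINE_CPU_PERCENT` compares hundredths of a CPU second against seconds times percent. Here is why the units agree. Let $c_i$ be the value stored for the request at time $t_i$, with $W$ = `$WINDOW` and $P$ = `$DECLINE_CPU_PERCENT`:

```latex
\begin{aligned}
T(t) &= \sum_{i \,:\, t_i \ge t - W} c_i
  && \text{(hundredths of a CPU second in the window)}\\
\tfrac{P}{100}\cdot W\ \text{s} &= W P\ \text{hundredths}
  && \Longrightarrow\ \text{block} \iff T(t) \ge W P\\
W = 15,\ P = 5 &:\ W P = 75\ \text{hundredths} = 0.75\ \text{s of CPU in } 15\ \text{s}
\end{aligned}
```

Each record is computed as `int 100*(user + system + 0.01)`, so a request that used no measurable CPU is still stored as 1. If I read that correctly, about 75 cheap requests inside the window would also trip the block with these settings. That may be intended as a request-rate floor, but it is worth keeping in mind when tuning.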