"Randal L. Schwartz" wrote:
> 
> OK, I went one better.  I now have CPU-percentage-based throttling.
> The big problem on my site was not bandwidth, but how quickly the
> loadav would go up when I got hammered by things like "Teleport Pro".
> Hey, if you haven't seen that, go see it.  Be afraid.  Be Very Afraid.
> See <URL:http://www.tenmax.com/teleport/pro/home.htm> -- and think
> about what happens when something like that slams every one of your
> Mason or Embperl pages for your shopping cart catalog.
>
> So, I modified my throttler to look at the recent CPU usage over a
> window for a given IP.  If the percentage exceeds a threshold, BOOM
> they get a 503 error and a correct "Retry-After:" to tell them how
> long they're banned.

That's a nifty module.  I suggest that you alter your threshold
slightly.  Rather than relying only on a fixed percentage of CPU time,
you should also consider the overall load.  I know I wouldn't care if
one IP was taking up 5% CPU time if the overall load on the machine was
less than 50% or so.
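
To make that concrete, here is a rough sketch of a load-aware limit.
It is not part of the posted module: the three tunables and both sub
names are made up for illustration, and the 1-minute load average is
only a stand-in for "overall load".  Like the original, it assumes a
Linux-style /proc/loadavg.

```perl
use strict;
use warnings;

# Hypothetical tunables, not part of the posted module.
my $BASE_CPU_PERCENT = 5;    # per-host limit once the machine is busy
my $MAX_CPU_PERCENT  = 50;   # most lenient per-host limit on an idle box
my $BUSY_LOADAVG     = 2.0;  # 1-minute loadavg at which leniency ends

# Read the 1-minute load average (Linux-specific, like the original).
# Returns undef if /proc/loadavg cannot be read.
sub current_loadavg {
    open my $fh, '<', '/proc/loadavg' or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    return (split ' ', $line)[0];
}

# Scale the per-host limit linearly from $MAX_CPU_PERCENT on an idle
# machine down to $BASE_CPU_PERCENT once load reaches $BUSY_LOADAVG.
sub cpu_percent_limit {
    my ($loadavg) = @_;
    return $BASE_CPU_PERCENT unless defined $loadavg;   # unknown load: be strict
    return $BASE_CPU_PERCENT if $loadavg >= $BUSY_LOADAVG;
    my $slack = 1 - $loadavg / $BUSY_LOADAVG;   # 1 when idle, 0 when busy
    return $BASE_CPU_PERCENT
         + $slack * ($MAX_CPU_PERCENT - $BASE_CPU_PERCENT);
}
```

In the cleanup handler, the comparison against
$WINDOW * $DECLINE_CPU_PERCENT would then use
$WINDOW * cpu_percent_limit(current_loadavg()) instead, so a greedy
client is only blocked once the box is actually under pressure.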

Also, how does this IP-based tracking work in practice?  People who are
behind corporate firewalls present dual problems: one person can map to
several IP addresses if the firewall uses a cluster of proxies, and one
IP can map to many people.  A serious throttling effort would need to
take this into account.
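
One partial mitigation, sketched below under my own assumptions, is to
key the history files on something other than the bare address:
collapse an IPv4 address to its /24 so a small proxy cluster shares
one bucket, and append a short digest of the User-Agent so different
browsers behind the same proxy get separate buckets.  The sub name
client_key and the /24 choice are hypothetical, and this only handles
IPv4 addresses, not resolved hostnames.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a filename-safe throttle key from an IPv4 address and a
# User-Agent string.  Hypothetical helper, not in the posted module.
sub client_key {
    my ($ip, $user_agent) = @_;
    $user_agent = 'unknown'
        unless defined $user_agent && length $user_agent;

    # Collapse a.b.c.d to a.b.c.0 so neighbouring proxies share a key.
    (my $net = $ip) =~ s/^(\d+\.\d+\.\d+)\.\d+$/$1.0/;

    # A short digest keeps the key compact and free of odd characters.
    return $net . '-' . substr(md5_hex($user_agent), 0, 8);
}
```

The key would replace $host when building the history and block
filenames.  A User-Agent is trivial to forge, so this separates honest
users from each other more than it stops a determined abuser.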

Regards,
Jeffrey

> I do this in two parts:
> 
>     an access handler that starts a CPU timer ticking, sets up a
>     cleanup handler to do most of the dirty work, then checks for a
>     "currently blocked" condition, returning 503 if needed.
> 
>     a cleanup handler that notes the elapsed CPU, storing it into a
>     file.  Then, if not blocked already, counts the recent CPU usage,
>     and starts or stops blocking based on that.
> 
> Nice thing is: no file locking (everything works no matter how many
> people are doing things in parallel).  Also, by pushing most of the
> logic down to the post-content phase, we keep the response time zippy!
> 
> So, here's source.  Peer review requested - I'm probably turning
> this in for my next WebTechniques column...
> 
>     package Stonehenge::Throttle;
>     use strict;
> 
>     ## usage: PerlAccessHandler Stonehenge::Throttle
> 
>     my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";
> 
>     my $WINDOW = 15;                # seconds of interest
>     my $DECLINE_CPU_PERCENT = 5; # CPU percent in window before we 503 error
> 
>     use vars qw($VERSION);
>     $VERSION = (qw$Revision: 2.0 $ )[-1];
> 
>     use Apache::Constants qw(OK DECLINED);
>     use Apache::File;
>     use Apache::Log;
> 
>     use Stonehenge::Reload;
> 
>     sub handler {
>       goto &handler if Stonehenge::Reload->reload_me;
> 
>       my $r = shift;                # closure var
>       return DECLINED unless $r->is_initial_req;
>       my $log = $r->server->log;    # closure var
> 
>       my $host = $r->get_remote_host; # closure var
>       return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;
>       $host = "googlebot.com" if $host =~ /\.googlebot\.com$/;
> 
>       my $historyfile = "$HISTORYDIR/$host-times"; # closure var
>       my $blockfile = "$HISTORYDIR/$host-blocked"; # closure var
>       my @delta_times = times;      # closure var
>       my $fh = Apache::File->new;   # closure var
> 
>       $r->register_cleanup
>         (sub {
> 
>            ## record this CPU usage
>            @delta_times = map { $_ - shift @delta_times } times;
>            my $cpu_hundred = int 100*($delta_times[0] + $delta_times[1] + 0.01);
>            ## $log->notice("throttle: $host got $cpu_hundred/100 in this slot"); # DEBUG
>            open $fh, ">>$historyfile" or return DECLINED;
>            my $time = time;
>            syswrite $fh, pack "LL", $time, $cpu_hundred;
>            close $fh;
> 
>            my $startwindow = $time - $WINDOW;
> 
>            if (my @stat = stat($blockfile)) {
>              if ($stat[9] > $startwindow) {
>                ## $log->notice("throttle: $blockfile is already blocking"); # DEBUG
>                return OK;           # nothing further to see... move along
>              } else {
>                ## $log->notice("throttle: $blockfile is old, ignoring"); # DEBUG
>              }
>            }
> 
>            # figure out if we should be blocking
>            my $totalcpu = 0;        # scaled by 100
> 
>            open $fh, $historyfile or return DECLINED;
>            while ((read $fh, my $buf, 8) > 0) {
>              my ($time, $cpu) = unpack "LL", $buf;
>              next if $time < $startwindow;
>              $totalcpu += $cpu;
>            }
>            close $fh;
> 
>            if ($totalcpu < $WINDOW * $DECLINE_CPU_PERCENT) {
>              ## $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs"); # DEBUG
>              unlink $blockfile;
>              return OK;
>            }
> 
>            ## about to be nasty... let's see how bad it is:
>            my $loadavg = "unknown";
>            if (open $fh, "/proc/loadavg") {   # absent on non-Linux systems
>              chomp($loadavg = <$fh>);
>              close $fh;
>            }
> 
>            my $useragent = $r->header_in('User-Agent') || "unknown";
> 
>            $log->notice("throttle: $host got $totalcpu/100 CPU in $WINDOW secs, enabling block [loadavg $loadavg, agent $useragent]");
>            open $fh, ">$blockfile";
>            close $fh;
> 
>            return OK;
>          });
> 
>       ## back in the access handler:
> 
>       if (my @stat = stat($blockfile)) {
>         if ($stat[9] > time - $WINDOW) {
>           $log->warn("throttle access: $blockfile is blocking");
>           $r->header_out("Retry-After", $WINDOW);
>           return 503;               # Service Unavailable
>         } else {
>           ## $log->notice("throttle access: $blockfile is old, ignoring"); # DEBUG
>           return DECLINED;
>         }
>       }
> 
>       return DECLINED;
>     }
>     1;
> 
> --
> Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
> <[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
> Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
> See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
