>>>>> "Jay" == Jay J <[EMAIL PROTECTED]> writes:

Jay> I just tried it using IE5 for NT4 ..

Jay> What you're seeing is when someone has used "Make available
Jay> offline" followed by:

Jay> "If this favorite links to other pages, would you like to make
Jay> those pages available offline too? [y/n] ... Download pages [x]
Jay> links deep from this page"

Jay> The useragent is this: Mozilla/4.0 (compatible; MSIE 5.0; Windows
Jay> NT; DigExt)

Jay> And proceeds to crawl the site with 0-wait time between requests....

Jay> I haven't inspected the client-header to see if there might be
Jay> something to indicate it's in "crawl" mode .. I think it's
Jay> doubtful there is. So.....


Nope, I could find nothing to distinguish "evil spider" mode from
normal browsing mode, other than the rapidity of the download
requests.

So, I wrote my own throttling routines, unsatisfied with the others
that I found...

    package Stonehenge::Throttle;
    use strict;

    ## usage (in httpd.conf): PerlAccessHandler Stonehenge::Throttle

    my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";

    my $WINDOW = 90;            # seconds of interest
    my $SLOWBYTES = $WINDOW * 2000;     # bytes before we sleep
    my $SLEEP = 1;                      # sleep time
    my $DECLINEBYTES = $WINDOW * 3000; # bytes before we 503 error

    use vars qw($VERSION);
    $VERSION = (qw$Revision: 1.4 $ )[-1];

    use Apache::Constants qw(OK DECLINED);
    use Apache::File;
    use Apache::Log;

    use Stonehenge::Reload;

    sub handler {
      goto &handler if Stonehenge::Reload->reload_me;

      my $r = shift;
      return DECLINED unless $r->is_initial_req;
      my $log = $r->server->log;

      my $host = $r->get_remote_host;
      return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;

      my $historyfile = "$HISTORYDIR/$host"; # closure var

      $r->register_cleanup
        (sub {
           my $fh = Apache::File->new;
           open $fh, ">>$historyfile" or return DECLINED;

           my $time = time;
           my $bytes = $r->bytes_sent;
           syswrite $fh, pack "LL", $time, $bytes;
           close $fh;

           return OK;
         });

      {
        my $startwindow = time - $WINDOW;
        my $totalbytes = 0;
        my $fh = Apache::File->new;
        open $fh, $historyfile or return DECLINED;
        while ((read $fh, my $buf, 8) > 0) {
          my ($time, $bytes) = unpack "LL", $buf;
          next if $time < $startwindow;
          $totalbytes += $bytes;
        }
        if ($totalbytes > $DECLINEBYTES) {
          $log->notice("$host got $totalbytes in $WINDOW secs, sending 503");
          $r->header_out("Retry-After", $WINDOW);
          return 503;           # Service Unavailable
        } elsif ($totalbytes > $SLOWBYTES) {
          $log->notice("$host got $totalbytes in $WINDOW secs, sleeping for $SLEEP");
          sleep $SLEEP;
          return DECLINED;
        } else {
          ## $log->notice("$host got $totalbytes in $WINDOW secs"); # DEBUG
          return DECLINED;
        }
      }
      return DECLINED;
    }
    1;

This has to be aided by a cron script run every 20 minutes or so
that looks like this:

    #!/usr/bin/perl -w
    use strict;

    # $Id: throttle-cleaner,v 1.1 1999/10/28 19:44:09 merlyn Exp $

    my $DIR = "/home/merlyn/lib/Apache/Throttle";
    my $SECS = 360;                     # more than Stonehenge::Throttle $WINDOW

    chdir $DIR or die "Cannot chdir $DIR: $!";
    opendir DOT, "." or die "Cannot opendir .: $!";
    my $when = time - $SECS;
    while (my $name = readdir DOT) {
      next unless -f $name;
      next if (stat($name))[8] > $when; # stat field 8 is atime: skip recently read files
      ## warn "unlinking $name\n";
      unlink $name;
    }

So now I have a bytes-served-in-window throttler on my website that
slows down anyone pulling more than 2000 bytes/sec, and refuses
anyone pulling more than 3000 bytes/sec, sustained over 90 seconds
from any specific remote host.
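
To make the two thresholds concrete, here is a minimal sketch of the
arithmetic, using the same constants as the module.  The helper name
throttle_action is hypothetical and is not part of
Stonehenge::Throttle; it only mirrors the if/elsif/else chain in the
handler.

```perl
use strict;
use warnings;

my $WINDOW       = 90;                # seconds of interest, as in the module
my $SLOWBYTES    = $WINDOW * 2000;    # 180_000 bytes: start sleeping
my $DECLINEBYTES = $WINDOW * 3000;    # 270_000 bytes: start sending 503

# Hypothetical helper: map a window total to the action the handler takes.
sub throttle_action {
    my ($totalbytes) = @_;
    return "refuse" if $totalbytes > $DECLINEBYTES;   # 503 plus Retry-After
    return "sleep"  if $totalbytes > $SLOWBYTES;      # sleep $SLEEP, then serve
    return "pass";                                    # serve normally
}

printf "%7d bytes => %s\n", $_, throttle_action($_)
    for 100_000, 200_000, 300_000;
```

Both comparisons are strict, so a host sitting exactly at 270,000
bytes in the window is still only slowed down, not refused.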

It triggered five times overnight.  But my ISP neighbors are now
happy.

I should clean up Stonehenge::Throttle and submit it.  Notice, no file
locking!  That was an interesting fallout of the design.
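
One reading of why locking can be skipped (an assumption, not
something stated above): every request appends a single fixed-size
8-byte record with one syswrite on a file opened for append, and
readers only ever consume whole 8-byte records.  The sketch below
shows that record format in isolation.  The helpers append_record and
bytes_since are hypothetical names, and a temporary file stands in for
the per-host history file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: append one fixed-size (time, bytes) record.
sub append_record {
    my ($file, $time, $bytes) = @_;
    open my $fh, ">>", $file or die "Cannot append to $file: $!";
    binmode $fh;
    # One syswrite of one 8-byte record per request, as in the handler.
    syswrite $fh, pack("LL", $time, $bytes);
    close $fh;
}

# Hypothetical helper: sum the bytes of all records at or after $since.
sub bytes_since {
    my ($file, $since) = @_;
    open my $fh, "<", $file or return 0;
    binmode $fh;
    my $total = 0;
    # Stop at the first short read, so a partial record is never unpacked.
    while ((read $fh, my $buf, 8) == 8) {
        my ($time, $bytes) = unpack "LL", $buf;
        $total += $bytes if $time >= $since;
    }
    close $fh;
    return $total;
}

my (undef, $file) = tempfile(UNLINK => 1);
my $now = time;
append_record($file, $now - 200,  50_000);   # older than a 90-second window
append_record($file, $now - 30,  120_000);   # inside the window
append_record($file, $now,        80_000);   # inside the window
print bytes_since($file, $now - 90), "\n";
```

Whether concurrent appends can interleave depends on the operating
system and filesystem, so treat the no-locking property as something
to verify on your own platform rather than a guarantee.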

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
