>>>>> "Jay" == Jay J <[EMAIL PROTECTED]> writes:
Jay> I just tried it using IE5 for NT4 ..
Jay> What you're seeing is when someone has used "Make available
Jay> offline" followed by:
Jay> "If this favorite links to other pages, would you like to make
Jay> those pages available offline too? [y/n] ... Download pages [x]
Jay> links deep from this page"
Jay> The useragent is this: Mozilla/4.0 (compatible; MSIE 5.0; Windows
Jay> NT; DigExt)
Jay> And proceeds to crawl the site with 0-wait time between requests....
Jay> I haven't inspected the client-header to see if there might be
Jay> something to indicate it's in "crawl" mode .. I think it's
Jay> doubtful there is. So.....
Nope, I could find nothing to distinguish "evil spider" mode from
normal browsing mode, other than the rapidity of the download
requests.
So, I wrote my own throttling routines, unsatisfied with the others
that I found...
package Stonehenge::Throttle;
use strict;
## usage: PerlAccessHandler Stonehenge::Throttle;
my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";
my $WINDOW = 90; # seconds of interest
my $SLOWBYTES = $WINDOW * 2000; # bytes before we sleep
my $SLEEP = 1; # sleep time
my $DECLINEBYTES = $WINDOW * 3000; # bytes before we send a 503
use vars qw($VERSION);
$VERSION = (qw$Revision: 1.4 $ )[-1];
use Apache::Constants qw(OK DECLINED);
use Apache::File;
use Apache::Log;
use Stonehenge::Reload;
sub handler {
  goto &handler if Stonehenge::Reload->reload_me;
  my $r = shift;
  return DECLINED unless $r->is_initial_req;
  my $log = $r->server->log;
  my $host = $r->get_remote_host;
  return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;
  my $historyfile = "$HISTORYDIR/$host"; # closure var
  $r->register_cleanup
    (sub {
       my $fh = Apache::File->new;
       open $fh, ">>$historyfile" or return DECLINED;
       my $time = time;
       my $bytes = $r->bytes_sent;
       syswrite $fh, pack "LL", $time, $bytes;
       close $fh;
       return OK;
     });
  {
    my $startwindow = time - $WINDOW;
    my $totalbytes = 0;
    my $fh = Apache::File->new;
    open $fh, $historyfile or return DECLINED;
    while ((read $fh, my $buf, 8) > 0) {
      my ($time, $bytes) = unpack "LL", $buf;
      next if $time < $startwindow;
      $totalbytes += $bytes;
    }
    if ($totalbytes > $DECLINEBYTES) {
      $log->notice("$host got $totalbytes in $WINDOW secs, sending 503");
      $r->header_out("Retry-After", $WINDOW);
      return 503; # Service Unavailable
    } elsif ($totalbytes > $SLOWBYTES) {
      $log->notice("$host got $totalbytes in $WINDOW secs, sleeping for $SLEEP");
      sleep $SLEEP;
      return DECLINED;
    } else {
      ## $log->notice("$host got $totalbytes in $WINDOW secs"); # DEBUG
      return DECLINED;
    }
  }
  return DECLINED;
}
1;
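For anyone who wants to poke at the window arithmetic without an Apache around, here is a minimal standalone sketch of the same idea: fixed 8-byte "LL" records of (time, bytes), summed over the last N seconds. The sub name bytes_in_window and the sample timestamps are made up for illustration; they are not part of the module.

```perl
use strict;
use warnings;

# Sum the bytes recorded within the last $window seconds.
# $records is a string of packed "LL" (time, bytes) pairs, the same
# layout the handler appends to its per-host history file.
sub bytes_in_window {
    my ($records, $now, $window) = @_;
    my $start = $now - $window;
    my $size  = length(pack("LL", 0, 0));    # two 32-bit values = 8 bytes
    my $total = 0;
    for (my $off = 0; $off + $size <= length($records); $off += $size) {
        my ($time, $bytes) = unpack("LL", substr($records, $off, $size));
        next if $time < $start;               # too old to count
        $total += $bytes;
    }
    return $total;
}

# Hypothetical sample history: three requests at t=1000, 1050, 1100.
my $history = "";
$history .= pack("LL", @$_)
    for [1000, 50_000], [1050, 120_000], [1100, 30_000];

# At t=1120 with a 90-second window, only the last two records count.
print bytes_in_window($history, 1120, 90), "\n";
```

Because every record is the same size, the reader never has to parse anything; it just walks the data 8 bytes at a time and skips whatever is older than the window.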
This has to be aided by a cron script run every 20 minutes or so
that looks like this:
#!/usr/bin/perl -w
use strict;
# $Id: throttle-cleaner,v 1.1 1999/10/28 19:44:09 merlyn Exp $
my $DIR = "/home/merlyn/lib/Apache/Throttle";
my $SECS = 360; # more than Stonehenge::Throttle $WINDOW
chdir $DIR or die "Cannot chdir $DIR: $!";
opendir DOT, "." or die "Cannot opendir .: $!";
my $when = time - $SECS;
while (my $name = readdir DOT) {
  next unless -f $name;
  next if (stat($name))[8] > $when; # [8] is atime: keep recently read files
  ## warn "unlinking $name\n";
  unlink $name;
}
So now I have a bytes-served-in-window throttler on my website that
starts slowing a remote host down once it exceeds 2000 bytes/sec
sustained over 90 seconds, and prevents any single host from sucking
down more than 3000 bytes/sec over that same window.
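To put numbers on the thresholds: with a 90-second window, the sleep kicks in above 180,000 bytes and the 503 above 270,000 bytes. A small sketch of that decision, using the constants from the module; the throttle_action helper and its return strings are hypothetical names for illustration only:

```perl
use strict;
use warnings;

my $WINDOW       = 90;               # seconds, as in the module
my $SLOWBYTES    = $WINDOW * 2000;   # 180_000 bytes
my $DECLINEBYTES = $WINDOW * 3000;   # 270_000 bytes

# Hypothetical helper: name the action the handler would take for a
# given number of bytes served inside the window.
sub throttle_action {
    my ($totalbytes) = @_;
    return "503"   if $totalbytes > $DECLINEBYTES;
    return "sleep" if $totalbytes > $SLOWBYTES;
    return "pass";
}

printf "%7d => %s\n", $_, throttle_action($_)
    for 100_000, 200_000, 300_000;
```

Both comparisons are strict, so a host sitting exactly on a threshold still gets the gentler treatment.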
It triggered five times overnight. But my ISP neighbors are now
happy.
I should clean up Stonehenge::Throttle and submit it. Notice, no file
locking! That was an interesting fallout of the design.
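A plausible reading of why locking isn't needed (an assumption, not something spelled out above): each request adds exactly one small fixed-size record with a single syswrite to a file opened for append, and on a local Unix filesystem such appends land whole at the end of the file rather than interleaving. The sketch below tries that out with several writer processes. The append_demo sub, the writer counts, and the temp directory are all invented for the demonstration, and it says nothing about NFS or non-Unix systems.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical demo: several processes append fixed-size records to
# one file with no locking, then we report how many bytes arrived.
sub append_demo {
    my ($writers, $per_writer) = @_;
    my $dir  = tempdir();
    my $file = "$dir/history";

    my @pids;
    for my $w (1 .. $writers) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                       # child: one writer
            for (1 .. $per_writer) {
                open my $fh, ">>", $file or exit 1;
                syswrite $fh, pack("LL", time, $w);   # one 8-byte write
                close $fh;
            }
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;

    my $size = -s $file;                       # total bytes on disk
    unlink $file;
    rmdir $dir;
    return $size;
}

my $size = append_demo(4, 250);
print "size $size, expected ", 4 * 250 * 8, "\n";
```

If the size comes out as a whole multiple of 8 and matches the expected total, no record was lost or torn, which is all the reader loop in the handler depends on.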
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!