Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
On 14 Dec 1999, Randal L. Schwartz wrote:
> Sounds to me like they are precisely at odds with anyone doing the
> kind of blocking that I want to do.

That seems like a weird policy, though.  nmap, for example, helps
people do dastardly things, but that doesn't mean nmap is a bad
program; it's how you use it that makes you bad.

Teleport Pro, by default, is set up to be a nice little web robot.
Just because one user configures the program to be evil doesn't mean
you should stop other people who are trying to play nice.  And since
you can change its User-Agent, it doesn't seem like that's going to be
very effective, anyway.

-- 
Michael Plump | [EMAIL PROTECTED] | email me about making $ selling snorks
The ultimate Joe Dietz repository: http://www.skylab.org/~plumpy/dietz.txt
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
>>>>> "Michael" == Michael Plump [EMAIL PROTECTED] writes:

Michael> Teleport Pro, by default, is set up to be a nice little web
Michael> robot.  Just because one user configures the program to be
Michael> evil doesn't mean you should stop other people who are trying
Michael> to play nice.  And since you can change its User-Agent, it
Michael> doesn't seem like that's going to be very effective, anyway.

Yes, it's possible to configure it so that it works correctly, but if
I recall, I also saw it fetch /cgi/whatever, even though that was in
/robots.txt.  I *must* block anything that doesn't respect
/robots.txt.  Once they fix that, I might let it loose.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
Randal> Yes, it's possible to configure it so that it works correctly,
Randal> but if I recall, I also saw it fetch /cgi/whatever, even though
Randal> that was in /robots.txt.  I *must* block anything that doesn't
Randal> respect /robots.txt.  Once they fix that, I might let it loose.

Teleport Pro obeys robots.txt by default, but unfortunately, it can be
set to ignore robot exclusion rules.

Michael> Just because one user configures the program to be
Michael> evil doesn't mean you should stop other people who
Michael> are trying to play nice.

Yes, it does.  The actions of wrongdoers affect the innocent, here and
everywhere.

ELB

-- 
Eric L. Brine | Chicken: The egg's way of making more eggs.
[EMAIL PROTECTED] | Do you always hit the nail on the thumb?
ICQ# 4629314  | An optimist thinks thorn bushes have roses.
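For anyone wanting to do the check from the server side, the logic a
well-behaved robot is supposed to apply is simple prefix matching
against the Disallow lines.  Here's a minimal standalone sketch of
that rule (it handles only "User-agent: *" groups and ignores blank-line
group separators and Allow lines; for real crawling code, the
WWW::RobotRules and LWP::RobotUA modules from libwww-perl do this
properly):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the Disallow path prefixes from the "User-agent: *" group
# of a robots.txt body.  A deliberately minimal sketch.
sub parse_disallows {
    my ($robots_txt) = @_;
    my (@disallow, $applies);
    for my $line (split /\n/, $robots_txt) {
        $line =~ s/#.*//;                       # strip comments
        if ($line =~ /^User-agent:\s*(\S+)/i) {
            $applies = ($1 eq '*');             # only track the '*' group
        }
        elsif ($applies and $line =~ /^Disallow:\s*(\S+)/i) {
            push @disallow, $1;
        }
    }
    return @disallow;
}

# A path is off-limits if it begins with any disallowed prefix.
sub blocked {
    my ($path, @disallow) = @_;
    return scalar grep { index($path, $_) == 0 } @disallow;
}

my @rules = parse_disallows(<<'EOT');
User-agent: *
Disallow: /cgi
EOT

# /cgi/... is disallowed; /books/ is fine
print blocked('/cgi/amazon?isbn=0596000278', @rules) ? "blocked\n" : "allowed\n";
```

A robot like the one described above, fetching /cgi/whatever despite
this rule, is exactly the behavior being complained about.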
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
>>>>> "Doug" == Doug MacEachern [EMAIL PROTECTED] writes:

>> My CPU-based limiter is working quite nicely.  It lets oodles of
>> static pages be served, but if someone starts doing CPU intensive
>> stuff, they get booted for hogging my server machine.  The nice
>> thing is that I return a standard "503" error including a
>> "retry-after", so if it is a legitimate mirroring program, it'll
>> know how to deal with the error.

Doug> choice!

It's also been very successful at catching a whole slew of
user-agents that believe in sucking senselessly.  Here's my current
block-list:

        or m{Offline Explorer/}         # bad robot!
        or m{www\.gozilla\.com}         # bad robot!
        or m{pavuk-}                    # bad robot!
        or m{ExtractorPro}              # bad robot!
        or m{WebCopier}                 # bad robot!
        or m{MSIECrawler}               # bad robot!
        or m{WebZIP}                    # bad robot!
        or m{Teleport Pro}              # bad robot!
        or m{NetAttache/}               # bad robot!
        or m{gazz/}                     # bad robot!
        or m{geckobot}                  # bad robot!
        or m{nttdirectory}              # bad robot!
        or m{Mister PiX}                # bad robot!

Of course, these are just the ones that hit my site hard enough to
trigger the "exceeds 10% cumulative CPU in 15 seconds" rule.  They
often get in trouble when they start invoking the 20 or 30 links in
/books/ that look like /cgi/amazon?isbn=, in SPITE of my /robots.txt
that says "don't look in /cgi".  (More on that in a second...)

>> Doug - one thing I noticed is that mod_cgi isn't charging the
>> child-process time to the server anywhere between post-read-request
>> and log phases.  Does that mean there's no "wait" or "waitpid"
>> until cleanup?

Doug> it should be, mod_cgi waits for the child, parsing its header
Doug> output, etc.

mod_cgi does no waiting. :)  The only wait appears to be in the
cleanup handling area.  Hence, in my logger, I do this:

    ## first, reap any zombies so child CPU is proper:
    {
      my $kid = waitpid(-1, 1);
      if ($kid > 0) {
        # $r->log->warn("found kid $kid"); # DEBUG
        redo;
      }
    }

And every mod_cgi request would generate a process for me in this
LogHandler, so I know that mod_cgi is not reaping them.  FYI. :)

Also, Doug, can there be only one $r->cleanup_handler?  I was getting
intermittent results until I changed my ->cleanup_handler into a
push'ed loghandler.  I also use ->cleanup_handler in other modules, so
I'm wondering if there's a conflict.

Doug> you should be able to use any number of cleanup_handlers.  do
Doug> you have a small test case to reproduce the problem?

Grr.  Not really.  I just moved everything to LogHandlers instead.  I
just no longer trust cleanup_handlers, because my tests were
consistent with "only one cleanup permitted".

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
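The reaping trick above works outside mod_perl too.  This standalone
sketch shows the same non-blocking waitpid loop (using the symbolic
WNOHANG constant instead of the literal 1): fork a child, let it exit,
then sweep up the zombie so its CPU time is folded into the parent's
"times" accounting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";        # for WNOHANG

# Reap any exited children without blocking; returns the pids reaped.
# Until a child is waited on, its CPU time is not charged to the
# parent's cumulative child times.
sub reap_zombies {
    my @reaped;
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $kid;
    }
    return @reaped;
}

# Demonstration: fork a child that exits immediately, then reap it.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 0 if $pid == 0;            # child: exit right away
sleep 1;                        # give the child time to exit
our @reaped = reap_zombies();
print "reaped @reaped\n";
```

In a long-running server, a loop like this run at log time picks up
whatever children earlier phases left behind, which is exactly why the
mod_cgi child times finally showed up in the logger.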
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
> It's also been very successful at catching a whole slew of
> user-agents that believe in sucking senselessly.  Here's my current
> block-list:
> [...]
>        or m{Teleport Pro}              # bad robot!
> [...]

Teleport Pro does have options to control how it behaves:

1. "Obey the Robot Exclusion Standard".
   Default: On
2. "Wait seconds before requesting more than two files at once from a
   single server".
   Valid: 0+  Default: 1 second
3. Number of threads.
   Valid: 1-10  Default: 10

Because of #2, Teleport Pro only has one active thread at a time, and
it is idle at least 50% of the time (when downloading image archives).
In other words, it's possible for a user to configure Teleport Pro to
hammer a server, but it behaves respectfully using the default
settings.

Their site: http://www.tenmax.com/

ELB

-- 
Eric L. Brine | Chicken: The egg's way of making more eggs.
[EMAIL PROTECTED] | Do you always hit the nail on the thumb?
ICQ# 4629314  | An optimist thinks thorn bushes have roses.
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
>>>>> "Eric" == Eric L Brine [EMAIL PROTECTED] writes:

Eric> Because of #2, Teleport Pro only has one active thread at a
Eric> time, and it is idle at least 50% of the time (when downloading
Eric> image archives).  In other words, it's possible for a user to
Eric> configure Teleport Pro to hammer a server, but it behaves
Eric> respectfully using the default settings.

The users that hit mine, hit my limits.  So "one bad apple does in
fact spoil the whole bunch, girl." :)

Eric> Their site: http://www.tenmax.com/

But fear this, at http://www.tenmax.com/teleport/pro/features.htm:

  * Ten simultaneous retrieval threads get data at the fastest speeds
    possible
  * Server-side image map exploration -- translates server-side maps
    into client-side maps for offline browsing
  * Server Overload Protection -- prevents remote servers from
    overloading and dropping connection early
  * Configurable Agent Identity allows Teleport Pro to impersonate
    popular browsers; gets data from even the stingiest servers

Sounds to me like they are precisely at odds with anyone doing the
kind of blocking that I want to do.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
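Since agent identity can be faked anyway, the User-Agent block-list is
only a first line of defense, but here is a standalone sketch of how
such a list works (the patterns are a few names from the block-list
earlier in the thread; a real mod_perl handler would respond to a
match with a 503 and a Retry-After header rather than printing):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few patterns from the block-list quoted above -- not a complete set.
my @BAD_ROBOTS = (
    qr{Offline Explorer/},
    qr{Teleport Pro},
    qr{WebZIP},
    qr{MSIECrawler},
);

# True if the User-Agent string matches any blocked pattern.
sub is_bad_robot {
    my ($ua) = @_;
    return scalar grep { $ua =~ $_ } @BAD_ROBOTS;
}

# Simulate the dispatch decision for two visitors.
print is_bad_robot('Teleport Pro/1.29') ? "503\n" : "200\n";
print is_bad_robot('Mozilla/4.5 (Macintosh; I; PPC)') ? "503\n" : "200\n";
```

Of course, a client that impersonates a popular browser sails straight
past this check, which is why the CPU-based limiter is the one that
actually catches them.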
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
> My CPU-based limiter is working quite nicely.  It lets oodles of
> static pages be served, but if someone starts doing CPU intensive
> stuff, they get booted for hogging my server machine.  The nice
> thing is that I return a standard "503" error including a
> "retry-after", so if it is a legitimate mirroring program, it'll
> know how to deal with the error.

choice!

> Doug - one thing I noticed is that mod_cgi isn't charging the
> child-process time to the server anywhere between post-read-request
> and log phases.  Does that mean there's no "wait" or "waitpid" until
> cleanup?

it should be, mod_cgi waits for the child, parsing its header output,
etc.

> Also, Doug, can there be only one $r->cleanup_handler?  I was
> getting intermittent results until I changed my ->cleanup_handler
> into a push'ed loghandler.  I also use ->cleanup_handler in other
> modules, so I'm wondering if there's a conflict.

you should be able to use any number of cleanup_handlers.  do you have
a small test case to reproduce the problem?
Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)
>>>>> "Barry" == Barry Robison [EMAIL PROTECTED] writes:

Barry> On Wed, Nov 24, 1999 at 07:31:36AM -0800, Randal L. Schwartz wrote:
>> I also added a DBILogger that logs CPU times, so I can see which
>> pages on my system are burning the most CPU, and even tell which
>> hosts suck down the most CPU in a day.  mod_perl rules!

Barry> Would you be willing to share that?  Sounds handy!

OK, here it is so far, although it's a work in progress.  I derived it
mostly from the code in the modperl book.

By the way, logging $r->uri does NOT show as much info as logging the
middle part of $r->the_request, and I couldn't see any easy way to do
it except how I've done it here.

The fields "wall, cpuuser, cpusys, cpucuser, cpucsys" have the delta
outputs from "time" and "times", so I can even see wall-clock for each
request from start to finish as well as CPU, and I also *should* be
able to see mod_cgi's child usage, but I can't (see other message...).

package Stonehenge::DBILog;
use strict;

## usage: PerlInitHandler Stonehenge::DBILog

use vars qw($VERSION);
$VERSION = (qw$Revision: 1.4 $)[-1];

use Apache::Constants qw(OK DECLINED);
use DBI ();
use Apache::Util qw(ht_time);
use Apache::Log;                        # DEBUG

my $DSN = 'dbi:mysql:merlyn_httpd';
my $DB_TABLE = 'requests';
my $DB_AUTH = 'YourUser:YourPassword';  # :-)

my @FIELDS =
  qw(when host method url user referer browser status bytes
     wall cpuuser cpusys cpucuser cpucsys);

my $INSERT =
  "INSERT INTO $DB_TABLE (" .
  (join ",", @FIELDS) .
  ") VALUES(" .
  (join ",", ("?") x @FIELDS) .
  ")";

=for SQL

create table requests (
  when datetime not null,
  host varchar(255) not null,
  method varchar(8) not null,
  url varchar(255) not null,
  user varchar(50),
  referer varchar(255),
  browser varchar(255),
  status smallint(3) default 0,
  bytes int(8),
  wall smallint(5),
  cpuuser float(8),
  cpusys float(8),
  cpucuser float(8),
  cpucsys float(8)
);

=cut

sub handler {
  use Stonehenge::Reload; goto &handler if Stonehenge::Reload->reload_me;

  my $r = shift;
  return DECLINED unless $r->is_initial_req;

  my @times = (time, times);            # closure

  $r->push_handlers
    (PerlLogHandler => sub {
       ## delta these times:
       @times = map { $_ - shift @times } time, times;

       my $orig = shift;
       my $r = $orig->last;

       my @data =
         (
          ht_time($orig->request_time, '%Y-%m-%d %H:%M:%S', 0),
          $r->get_remote_host,
          $r->method,
          # $orig->uri,
          ($r->the_request =~ /^\S+\s+(\S+)/)[0],
          $r->connection->user,
          $r->header_in('Referer'),
          $r->header_in('User-agent'),
          $orig->status,
          $r->bytes_sent,
          @times,
         );
       ## $r->log->warn(map "[$_]", @data); # DEBUG

       eval {
         my $dbh = DBI->connect($DSN, (split ':', $DB_AUTH),
                                { RaiseError => 1 });
         my $sth = $dbh->prepare($INSERT);
         $sth->execute(@data);
         $sth->finish;
         $dbh->disconnect;
       };
       if ($@) {
         $r->log->error("dbi: $@");
       }
       return DECLINED;
     });
  return DECLINED;
}
1;

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
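With that table populated, the "which pages burn the most CPU" report
mentioned above could be pulled with a query along these lines (a
sketch against the schema shown in the message, summing the four CPU
columns; adjust to taste):

```sql
-- total CPU per URL (user + system, including reaped children),
-- busiest pages first
SELECT url,
       COUNT(*) AS hits,
       SUM(cpuuser + cpusys + cpucuser + cpucsys) AS cpu
FROM requests
GROUP BY url
ORDER BY cpu DESC
LIMIT 20;
```

Grouping by host instead of url gives the "which hosts suck down the
most CPU in a day" view from the same table.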