Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-16 Thread Michael Plump

On 14 Dec 1999, Randal L. Schwartz wrote:

> Sounds to me like they are precisely at odds with anyone doing the
> kind of blocking that I want to do.

That seems like a weird policy, though.  nmap, for example, helps people
do dastardly things, but that doesn't mean nmap is a bad program; it's how
you use it that makes you bad.

Teleport Pro, by default, is set up to be a nice little web robot.  Just
because one user configures the program to be evil doesn't mean you should
stop other people who are trying to play nice.  And since you can change
its User-Agent, it doesn't seem like that's going to be very effective,
anyway.

--
Michael Plump | [EMAIL PROTECTED] | email me about making $ selling snorks
The ultimate Joe Dietz repository: http://www.skylab.org/~plumpy/dietz.txt



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-16 Thread Randal L. Schwartz

 "Michael" == Michael Plump [EMAIL PROTECTED] writes:

Michael> Teleport Pro, by default, is set up to be a nice little web
Michael> robot.  Just because one user configures the program to be
Michael> evil doesn't mean you should stop other people who are trying
Michael> to play nice.  And since you can change its User-Agent, it
Michael> doesn't seem like that's going to be very effective, anyway.

Yes, it's possible to configure it so that it works correctly, but if
I recall, I also saw it fetch /cgi/whatever, even though that was in
/robots.txt.  I *must* block anything that doesn't respect
/robots.txt.  Once they fix that, I might let it loose.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-16 Thread Eric L. Brine


Randal> Yes, it's possible to configure it so that it works correctly,
Randal> but if I recall, I also saw it fetch /cgi/whatever, even though
Randal> that was in /robots.txt.  I *must* block anything that doesn't
Randal> respect /robots.txt.  Once they fix that, I might let it loose.

Teleport Pro obeys robots.txt by default, but unfortunately, it can be
set to ignore robot exclusion rules. 

Michael> Just because one user configures the program to be
Michael> evil doesn't mean you should stop other people who
Michael> are trying to play nice.

Yes, it does. The actions of wrongdoers affect the innocent, here and
everywhere.

ELB

--
Eric L. Brine  |  Chicken: The egg's way of making more eggs.
[EMAIL PROTECTED]  |  Do you always hit the nail on the thumb?
ICQ# 4629314   |  An optimist thinks thorn bushes have roses.



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-14 Thread Randal L. Schwartz

 "Doug" == Doug MacEachern [EMAIL PROTECTED] writes:

> My CPU-based limiter is working quite nicely.  It lets oodles of
> static pages be served, but if someone starts doing CPU intensive
> stuff, they get booted for hogging my server machine.  The nice thing
> is that I return a standard "503" error including a "retry-after", so
> if it is a legitimate mirroring program, it'll know how to deal with
> the error.

Doug> choice!

It's also been very successful at catching a whole slew of user-agents
that believe in sucking senselessly.  Here's my current block-list:

or m{Offline Explorer/}     # bad robot!
or m{www\.gozilla\.com}     # bad robot!
or m{pavuk-}                # bad robot!
or m{ExtractorPro}          # bad robot!
or m{WebCopier}             # bad robot!
or m{MSIECrawler}           # bad robot!
or m{WebZIP}                # bad robot!
or m{Teleport Pro}          # bad robot!
or m{NetAttache/}           # bad robot!
or m{gazz/}                 # bad robot!
or m{geckobot}              # bad robot!
or m{nttdirectory}          # bad robot!
or m{Mister PiX}            # bad robot!

Of course, these are just the ones that hit my site hard enough to trigger
the "exceeds 10% cumulative CPU in 15 seconds" rule.  They often get
in trouble when they start invoking the 20 or 30 links in /books/
that look like /cgi/amazon?isbn=, in SPITE of my /robots.txt that
says "don't look in /cgi".  (More on that in a second...)

> Doug - one thing I noticed is that mod_cgi isn't charging the
> child-process time to the server anywhere between post-read-request
> and log phases.  Does that mean there's no "wait" or "waitpid" until
> cleanup?

Doug> it should be, mod_cgi waits for the child, parsing its header output,
Doug> etc.

mod_cgi does no waiting. :)  The only wait appears to be in the cleanup
handling area.  Hence, in my logger, I do this:

  ## first, reap any zombies so child CPU is proper:
  {
    my $kid = waitpid(-1, 1);            # 1 == WNOHANG: don't block
    if ($kid > 0) {
      # $r->log->warn("found kid $kid"); # DEBUG
      redo;                              # keep reaping until none left
    }
  }

And every mod_cgi request turns up a reaped child in this LogHandler,
so I know that mod_cgi is not reaping them.  FYI. :)
 
> Also, Doug, can there be only one $r->cleanup_handler?  I was getting
> intermittent results until I changed my ->cleanup_handler into a
> push'ed loghandler.  I also use ->cleanup_handler in other modules, so
> I'm wondering if there's a conflict.

Doug> you should be able to use any number of cleanup_handlers.  do you have a
Doug> small test case to reproduce the problem?

Grr.  Not really.  I just moved everything to LogHandlers instead.  I
just no longer trust cleanup_handlers, because my tests were
consistent with "only one cleanup permitted".
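
[For reference, the two registration styles being compared look roughly like
this.  A sketch against mod_perl 1.x with a hypothetical package name, not
the modules in question:

package My::PhaseDemo;                # hypothetical name, for illustration
use strict;
use Apache::Constants qw(DECLINED);
use Apache::Log ();

## usage: PerlFixupHandler My::PhaseDemo

sub handler {
  my $r = shift;

  ## cleanup-phase style: register a cleanup callback; this is the
  ## style that behaved, in the tests above, as if only one cleanup
  ## were permitted
  $r->register_cleanup(sub { $r->log->warn("cleanup ran") });

  ## log-phase style: push an extra PerlLogHandler; pushed handlers
  ## stack, and they run during the log phase, before any cleanups
  $r->push_handlers(PerlLogHandler => sub {
    my $r = shift;
    $r->log->warn("log handler ran");
    return DECLINED;
  });

  return DECLINED;
}

1;]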

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-14 Thread Eric L. Brine


> It's also been very successful at catching a whole slew of user-agents
> that believe in sucking senselessly.  Here's my current block-list:
> 
> [...]
> or m{Teleport Pro}  # bad robot!
> [...]

Teleport Pro does have options to control how it behaves:

1. "Obey the Robot Exclusion Standard". Default: On

2. "Wait  seconds before requesting more than two files at once from
a single server".  Valid: 0+  Default: 1 second

3. Number of threads. Valid: 1-10  Default: 10

Because of #2, Teleport Pro only has one active thread at a time, and it
is idle at least 50% of the time (when downloading image archives). In
other words, it's possible for a user to configure Teleport Pro to
hammer a server, but it behaves respectfully using the default settings.

Their site: http://www.tenmax.com/

ELB

--
Eric L. Brine  |  Chicken: The egg's way of making more eggs.
[EMAIL PROTECTED]  |  Do you always hit the nail on the thumb?
ICQ# 4629314   |  An optimist thinks thorn bushes have roses.



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-14 Thread Randal L. Schwartz

 "Eric" == Eric L Brine [EMAIL PROTECTED] writes:

Eric> Because of #2, Teleport Pro only has one active thread at a time, and it
Eric> is idle at least 50% of the time (when downloading image archives). In
Eric> other words, it's possible for a user to configure Teleport Pro to
Eric> hammer a server, but it behaves respectfully using the default settings.

The users that hit mine hit my limits.  So "one bad apple does in fact
spoil the whole bunch, girl." :)

Eric> Their site: http://www.tenmax.com/

But fear this, at http://www.tenmax.com/teleport/pro/features.htm:

<li><b>Ten simultaneous retrieval threads</b> get data at the
fastest speeds possible

<li>Server-side image map exploration -- translates server-side
maps into client-side maps for offline browsing

<li>Server Overload Protection -- prevents remote servers from
overloading and dropping connection early

<li>Configurable Agent Identity allows Teleport Pro to impersonate
popular browsers; gets data from even the stingiest servers

Sounds to me like they are precisely at odds with anyone doing the
kind of blocking that I want to do.

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-12-13 Thread Doug MacEachern

> My CPU-based limiter is working quite nicely.  It lets oodles of
> static pages be served, but if someone starts doing CPU intensive
> stuff, they get booted for hogging my server machine.  The nice thing
> is that I return a standard "503" error including a "retry-after", so
> if it is a legitimate mirroring program, it'll know how to deal with
> the error.

choice!
 
> Doug - one thing I noticed is that mod_cgi isn't charging the
> child-process time to the server anywhere between post-read-request
> and log phases.  Does that mean there's no "wait" or "waitpid" until
> cleanup?

it should be, mod_cgi waits for the child, parsing its header output,
etc.
 
> Also, Doug, can there be only one $r->cleanup_handler?  I was getting
> intermittent results until I changed my ->cleanup_handler into a
> push'ed loghandler.  I also use ->cleanup_handler in other modules, so
> I'm wondering if there's a conflict.

you should be able to use any number of cleanup_handlers.  do you have a
small test case to reproduce the problem?



Re: Limiting CPU (was Re: embperl pages and braindead sucking robots)

1999-11-24 Thread Randal L. Schwartz

 "Barry" == Barry Robison [EMAIL PROTECTED] writes:

Barry> On Wed, Nov 24, 1999 at 07:31:36AM -0800, Randal L. Schwartz wrote:
 
> I also added a DBILogger that logs CPU times, so I can see which pages
> on my system are burning the most CPU, and even tell which hosts suck
> down the most CPU in a day.  mod_perl rules!
 

Barry> Would you be willing to share that? Sounds handy!

OK, here it is so far, although it's a work in progress.  I derived it
mostly from the code in the modperl book.  By the way, logging $r->uri
does NOT show as much info as logging the middle part of
$r->the_request, and I couldn't see any easy way to do it except how
I've done it here.  The fields "wall, cpuuser, cpusys, cpucuser,
cpucsys" have the delta outputs from "time" and "times", so I can even
see wall-clock for each request from start to finish as well as CPU,
and I also *should* be able to see mod_cgi's child usage, but I can't
(see other message...).

package Stonehenge::DBILog;
use strict;

## usage: PerlInitHandler Stonehenge::DBILog

use vars qw($VERSION);
$VERSION = (qw$Revision: 1.4 $ )[-1];

use Apache::Constants qw(OK DECLINED);
use DBI ();
use Apache::Util qw(ht_time);
use Apache::Log;# DEBUG

my $DSN = 'dbi:mysql:merlyn_httpd';
my $DB_TABLE = 'requests';
my $DB_AUTH = 'YourUser:YourPassword'; # :-)

my @FIELDS =
  qw(when host method url user referer browser status bytes
 wall cpuuser cpusys cpucuser cpucsys);
my $INSERT =
  "INSERT INTO $DB_TABLE (".
  (join ",", @FIELDS).
  ") VALUES(".
  (join ",", ("?") x @FIELDS).
  ")";

=for SQL

create table requests (
  when datetime not null,
  host varchar(255) not null,
  method varchar(8) not null,
  url varchar(255) not null,
  user varchar(50),
  referer varchar(255),
  browser varchar(255),
  status smallint(3) default 0,
  bytes int(8),
  wall smallint(5),
  cpuuser float(8),
  cpusys float(8),
  cpucuser float(8),
  cpucsys float(8)
);

=cut

sub handler {
  use Stonehenge::Reload; goto &handler if Stonehenge::Reload->reload_me;

  my $r = shift;
  return DECLINED unless $r->is_initial_req;

  my @times = (time, times);            # closure

  $r->push_handlers
    (PerlLogHandler =>
     sub {
       ## delta these times:
       @times = map { $_ - shift @times } time, times;

       my $orig = shift;
       my $r = $orig->last;

       my @data =
         (
          ht_time($orig->request_time, '%Y-%m-%d %H:%M:%S', 0),
          $r->get_remote_host,
          $r->method,
          # $orig->uri,
          ($r->the_request =~ /^\S+\s+(\S+)/)[0],
          $r->connection->user,
          $r->header_in('Referer'),
          $r->header_in('User-agent'),
          $orig->status,
          $r->bytes_sent,
          @times,
         );

       ## $r->log->warn(map "[$_]", @data); # DEBUG

       eval {
         my $dbh = DBI->connect($DSN, (split ':', $DB_AUTH),
                                { RaiseError => 1 });
         my $sth = $dbh->prepare($INSERT);
         $sth->execute(@data);
         $sth->finish;
         $dbh->disconnect;
       };
       if ($@) {
         $r->log->error("dbi: $@");
       }

       return DECLINED;
     });

  return DECLINED;
}

1;
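
[Once the requests table has some data in it, the "which pages burn the most
CPU" question from the top of this message is one query away.  A sketch that
reuses the DSN and credentials above; the report format is only an
illustration (swap url for host to rank clients instead of pages):

#!/usr/bin/perl
use strict;
use DBI ();

## same DSN and auth as Stonehenge::DBILog above
my $dbh = DBI->connect('dbi:mysql:merlyn_httpd',
                       split(':', 'YourUser:YourPassword'),
                       { RaiseError => 1 });

## total CPU is the sum of the four deltas logged per request
my $sth = $dbh->prepare(<<'SQL');
  SELECT url,
         COUNT(*)                                   AS hits,
         SUM(cpuuser + cpusys + cpucuser + cpucsys) AS cpu
  FROM requests
  GROUP BY url
  ORDER BY cpu DESC
  LIMIT 20
SQL

$sth->execute;
while (my ($url, $hits, $cpu) = $sth->fetchrow_array) {
  printf "%8.2f cpu-sec %6d hits  %s\n", $cpu, $hits, $url;
}
$sth->finish;
$dbh->disconnect;]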



-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!