>>>>> "Doug" == Doug MacEachern <[EMAIL PROTECTED]> writes:

>> My CPU-based limiter is working quite nicely.  It lets oodles of
>> static pages be served, but if someone starts doing CPU intensive
>> stuff, they get booted for hogging my server machine.  The nice thing
>> is that I return a standard "503" error including a "retry-after", so
>> if it is a legitimate mirroring program, it'll know how to deal with
>> the error.

Doug> choice!

It's also been very successful at catching a whole slew of user-agents
that believe in sucking senselessly.  Here's my current block-list:

            or m{Offline Explorer/} # bad robot!
            or m{www\.gozilla\.com} # bad robot!
            or m{pavuk-}        # bad robot!
            or m{ExtractorPro}  # bad robot!
            or m{WebCopier}     # bad robot!
            or m{MSIECrawler}   # bad robot!
            or m{WebZIP}        # bad robot!
            or m{Teleport Pro}  # bad robot!
            or m{NetAttache/}   # bad robot!
            or m{gazz/}         # bad robot!
            or m{geckobot}      # bad robot!
            or m{nttdirectory}  # bad robot!
            or m{Mister PiX}    # bad robot!
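
For anyone wanting to reuse the list outside my handler, here is a minimal standalone sketch of the same match. The patterns are the ones above; the helper name is_bad_robot and the sample agent string are made up for illustration, and the real handler does this inline against the User-Agent header.

```perl
use strict;
use warnings;

# Patterns copied from the block-list above.
my @BAD_ROBOTS = (
    qr{Offline Explorer/}, qr{www\.gozilla\.com}, qr{pavuk-},
    qr{ExtractorPro}, qr{WebCopier}, qr{MSIECrawler}, qr{WebZIP},
    qr{Teleport Pro}, qr{NetAttache/}, qr{gazz/}, qr{geckobot},
    qr{nttdirectory}, qr{Mister PiX},
);

# is_bad_robot is a hypothetical helper name, not part of the original handler.
sub is_bad_robot {
    my ($agent) = @_;
    return 0 unless defined $agent;      # no User-Agent header at all
    for my $re (@BAD_ROBOTS) {
        return 1 if $agent =~ $re;
    }
    return 0;
}

# Sample agent string is an assumption, for demonstration only.
print is_bad_robot('Teleport Pro/1.29') ? "blocked\n" : "allowed\n";
```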

Of course, these are just the ones that hit my site hard enough to trigger
the "exceeds 10% cumulative CPU in 15 seconds" rule.  They often get
in trouble when they start invoking the 20 or 30 links in /books/
that look like /cgi/amazon?isbn=...., in SPITE of my /robots.txt that
says "don't look in /cgi".  (More on that in a second...)
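
To make the rule concrete, here is a rough sketch of the bookkeeping, not my actual handler. It reads "10% cumulative CPU in 15 seconds" as "more than 1.5 CPU seconds charged to one client within a 15-second window". The names charge, over_limit and decide, the in-memory hash, and the Retry-After value are all assumptions; a real multi-process server would need storage shared between the children.

```perl
use strict;
use warnings;

my $WINDOW  = 15;   # seconds, from the rule above
my $PERCENT = 10;   # allowed CPU share of the window, from the rule above
my %usage;          # hypothetical store: client => [[time, cpu_seconds], ...]

# Record CPU seconds spent serving $client at time $now.
sub charge {
    my ($client, $now, $cpu) = @_;
    push @{ $usage{$client} }, [$now, $cpu];
}

# True if $client used more than $PERCENT of the last $WINDOW seconds as CPU.
sub over_limit {
    my ($client, $now) = @_;
    my @recent = grep { $_->[0] > $now - $WINDOW } @{ $usage{$client} || [] };
    $usage{$client} = \@recent;          # drop samples that have aged out
    my $total = 0;
    $total += $_->[1] for @recent;
    return $total > $WINDOW * $PERCENT / 100;
}

# Status code plus headers: 503 with Retry-After when the client is throttled.
# Using the window length as the Retry-After value is an assumption.
sub decide {
    my ($client, $now) = @_;
    return over_limit($client, $now)
        ? (503, 'Retry-After' => $WINDOW)
        : (200);
}

charge('10.0.0.1', 100, 1.0);
charge('10.0.0.1', 105, 1.0);
my ($status, %headers) = decide('10.0.0.1', 106);
print "status $status\n";
```

Once the samples age out of the window, the same client gets served again, which is what lets a well-behaved mirroring program recover after honoring the Retry-After.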

>> Doug - one thing I noticed is that mod_cgi isn't charging the
>> child-process time to the server anywhere between post-read-request
>> and log phases.  Does that mean there's no "wait" or "waitpid" until
>> cleanup?

Doug> it should be, mod_cgi waits for the child, parsing its header output,
Doug> etc.

mod_cgi does no waiting. :)  The only wait appears to be in the cleanup
handling area.  Hence, in my logger, I do this:

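       ## first, reap any zombies so child CPU is proper:
       {
         # non-blocking wait: the literal 1 is WNOHANG on Linux and the BSDs;
         # POSIX's WNOHANG constant is the portable spelling
         my $kid = waitpid(-1, 1);
         if ($kid > 0) {
           # $r->log->warn("found kid $kid"); # DEBUG
           redo;   # keep going until no exited children remain
         }
       }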

And every mod_cgi request left a child process for me to reap in this
LogHandler, so I know that mod_cgi is not reaping them.  FYI. :)
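
Here is a standalone sketch of why the reaping matters, with no Apache involved. The child CPU figures that times() reports only include children that have been waited for, so an unreaped CGI child has not been charged yet. The helper names reap_zombies and child_cpu and the busy-loop child are made up for the demo.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Reap every exited child without blocking; returns how many were reaped.
sub reap_zombies {
    my $reaped = 0;
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    return $reaped;
}

# Child user + system CPU seconds charged to this process so far.
sub child_cpu {
    my (undef, undef, $cuser, $csys) = times;
    return $cuser + $csys;
}

# Demo child: burn a little CPU, then exit (stands in for a CGI script).
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    my $x = 0;
    $x += sqrt($_) for 1 .. 2_000_000;
    exit 0;
}

my $before = child_cpu();                 # child not waited for yet
my $reaped = 0;
until ($reaped) {
    select(undef, undef, undef, 0.05);    # brief pause between polls
    $reaped = reap_zombies();
}
my $after = child_cpu();                  # now includes the reaped child

printf "reaped %d, child CPU before %.2f, after %.2f\n",
    $reaped, $before, $after;
```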
 
>> Also, Doug, can there be only one $r->cleanup_handler?  I was getting
>> intermittent results until I changed my ->cleanup_handler into a
>> push'ed loghandler.  I also use ->cleanup_handler in other modules, so
>> I'm wondering if there's a conflict.

Doug> you should be able to use any number of cleanup_handlers.  do you have a
Doug> small test case to reproduce the problem?

Grr.  Not really.  I just moved everything to LogHandlers instead.  I
just no longer trust cleanup_handlers, because my tests were
consistent with "only one cleanup permitted".

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
