Hi,

I am writing a Perl script (on a Pentium 1 Red Hat box) to monitor some
websites on some web servers.  There are about 20 servers with 5 sites each.
I have been playing with running various parts of the script in parallel to
try to get a performance boost, but I am at a quandary.  If I run the check
on each one sequentially, the entire script takes about 4 minutes to execute
(the fastest I have been able to achieve).  However, if there is a problem
with one of the pages, the user-agent will wait (I have it set to wait a
max of 30 seconds).  If many pages have trouble, this extends the execution
time of the script dramatically.
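
Roughly, each page check boils down to something like this (a simplified
sketch; the real check validates more than the status line, and the URL
here is just a placeholder):

use strict;
use warnings;
use LWP::UserAgent;

# Simplified single-page check; the real page_check.pl does more.
my $ua = LWP::UserAgent->new;
$ua->timeout(30);    # wait at most 30 seconds for a slow page

my $response = $ua->get('http://example.com/some/page');
if ($response->is_success) {
    print "OK\n";
}
else {
    print "FAILED: ", $response->status_line, "\n";
}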

The data structure is arranged like so:
Server1 -> Store1 -> Page1 -> page-specific info
                  -> Page2
        -> Store2 -> Page4
        -> Store3 -> Page8
Server2 -> Store9 -> Page3
        -> Store5 -> Page1
etc.

Please ignore the numbers - Page1 in Store1 of Server1 is NOT the same page
as Page1 in Store5 of Server2.
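
In Perl terms, a cut-down version of the structure looks something like
this (the names and the page-specific fields are placeholders):

my %servers = (
    Server1 => {
        Store1 => {
            Page1 => { url => 'http://...', xmlfile => 'server1_store1_page1.xml' },
            Page2 => { url => 'http://...', xmlfile => 'server1_store1_page2.xml' },
        },
        Store2 => {
            Page4 => { url => 'http://...', xmlfile => 'server1_store2_page4.xml' },
        },
    },
    Server2 => {
        Store9 => {
            Page3 => { url => 'http://...', xmlfile => 'server2_store9_page3.xml' },
        },
    },
);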

The script iterates through the hash of hashes of hashes of hashes.  Once it
gets to the page level, it runs another script to validate the webpage.  It
is this validation, and the work of invoking another Perl instance, that
takes the most time.  If I skip the check, the script executes in its
entirety in about 15 seconds!
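
A stripped-down version of the traversal looks roughly like this (variable
names are placeholders, and the way the page info reaches page_check.pl is
simplified; in reality it goes through the XML file mentioned below):

for my $server (sort keys %servers) {
    for my $store (sort keys %{ $servers{$server} }) {
        for my $page (sort keys %{ $servers{$server}{$store} }) {
            # sequential version: wait for each check before moving on
            system("perl page_check.pl $servers{$server}{$store}{$page}{xmlfile}");
        }
    }
}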

I used to have the webpage-validating script as an internal subroutine, but
having to wait for the sub to finish defeats the ability to run the page
checks in parallel.  However, when I do execute the checks in parallel at
the "Page" level (i.e. one new Perl instance is created for each page to be
checked), the initial script (imaginatively called 'store_monitor.pm')
executes and completes in about 20 seconds.  But then there are about 80
Perl instances in the process list, all of them executing their own copy of
the webpage-validation script, 'page_check.pl'.  Those 80 Perl instances
take up a lot of resources!  The total execution time goes from about 4
minutes to over 40!  NOT exactly the speed increase I was looking for.
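
For reference, the parallel variant is the same traversal with each check
pushed into the background instead of waited on (same placeholders as the
sketch above):

for my $server (sort keys %servers) {
    for my $store (sort keys %{ $servers{$server} }) {
        for my $page (sort keys %{ $servers{$server}{$store} }) {
            # '&' backgrounds the child, so store_monitor.pm finishes
            # quickly while the checks pile up as separate processes
            system("perl page_check.pl $servers{$server}{$store}{$page}{xmlfile} &");
        }
    }
}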

So I have come to the conclusion that each Perl instance carries its own
overhead.  I *knew* this, of course; I just did not expect it to create such
a logjam.  The script executes fastest with just one Perl instance, and even
that one instance uses >90% CPU for the entire run.  Extra instances just
seem to split the 100% among themselves.

Is there a way to get the initial Perl instance to run its checks in
parallel?  The big thing is not having to wait for pages that don't respond
quickly.

I could take two runs at each page: a test run that only waits 10 seconds
for a valid page, and, if that does not work, a spawned process that uses
the full waiting time.  But this does run the risk of submarining the
'parent' process if too many pages aren't working...
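
Something like this is what I have in mind (a sketch only; the probe reuses
the same kind of user-agent call, and the arguments stand in for the real
per-page data):

use strict;
use warnings;
use LWP::UserAgent;

sub check_page {
    my ($url, $xmlfile) = @_;    # per-page data (placeholders)

    # quick probe: give the page only 10 seconds to answer
    my $quick = LWP::UserAgent->new( timeout => 10 );
    my $response = $quick->get($url);
    return if $response->is_success;

    # slow or broken page: hand it to a background process that gets
    # the full 30-second wait, so the parent can keep going
    system("perl page_check.pl $xmlfile &");
}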


fyi - I am currently executing page_check.pl as a Linux background process
(that is how I tested the parallel run).  I tried using fork(), but it
seemed to have the extra overhead of keeping the values of the 'parent'
process when I just wanted to run another script.  So I use
system( "perl page_check.pl xmlfile.xml &" ) to spawn a background process,
or the same call without the '&' to wait for the script to return.

fyi2 - I use an XML file for interprocess communication.



