Hi, I am writing a Perl script (on a Pentium 1 Red Hat box) to monitor some websites on some web servers. There are about 20 servers with 5 sites each. I have been experimenting with running various parts of the script in parallel to try to get a performance boost, but I am at a quandary. If I run the checks sequentially, the entire script takes about 4 minutes to execute (the fastest I have been able to achieve). However, if there is a problem with one of the pages, the user agent will wait (I have it set to wait a maximum of 30 seconds). If many pages have difficulty, this extends the execution time of the script dramatically.
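For context, each page check boils down to something roughly like this - a simplified sketch, not the actual code, and the URL is just a placeholder:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Placeholder URL for illustration only
my $url = 'http://server1.example.com/store1/page1.html';

my $ua = LWP::UserAgent->new;
$ua->timeout(30);    # maximum 30-second wait per page

my $res = $ua->get($url);
if ( $res->is_success ) {
    print "OK: $url\n";
}
else {
    print "FAILED: $url (", $res->status_line, ")\n";
}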
The data structure is arranged like so:

Server1
  -> Store1
       -> Page1 -> page-specific info
       -> Page2
  -> Store2
       -> Page4
  -> Store3
       -> Page8
Server2
  -> Store9
       -> Page3
  -> Store5
       -> Page1
etc.

Please ignore the numbers - Page1 in Store1 of Server1 is NOT the same page as Page1 in Store5 of Server2.

The script iterates through the hash of hashes of hashes of hashes. Once it gets to the page level, it runs another script to validate the web page. It is this validation, and the work of invoking another Perl instance, that takes the most time. If I skip the check, the script executes in its entirety in about 15 seconds!

I used to have the web-page-validating external script as an internal subroutine, but having to wait for the sub to finish defeats the ability to run the page checks in parallel. However, when I do execute the checks in parallel at the "Page" level (i.e. one new Perl instance is created for each page to be checked), the initial script (imaginatively called 'store_monitor.pm') executes and completes in about 20 seconds. But then there are about 80 Perl instances in the process list, all of them executing their own copy of the web-page validation script, 'page_check.pl'. These 80 Perl instances take up a lot of resources! The total execution time goes from about 4 minutes to over 40 - not exactly the speed increase I was looking for.

So I have come to the conclusion that each Perl instance carries its own overhead. I *knew* this, of course; I just did not expect it to create such a logjam. The script executes fastest with just one Perl instance - a single instance uses more than 90% of the CPU for the entire run, while extra instances just seem to split the 100% among themselves.

Is there a way to get the initial Perl instance to run the checks in parallel? The big thing is to not have to wait for pages that don't respond quickly. I could take two runs at each page: a test run that waits only 10 seconds for a valid page, and if that does not work, spawn another process that uses the full waiting time. But this runs the risk of starving the 'parent' process if too many pages aren't working.

fyi - I am currently executing page_check.pl as a Linux background process (this is how I was testing the parallel run). I tried using fork(), but it seemed to carry the extra overhead of copying the values of the 'parent' process when I just wanted to run another script. So I use system( "perl page_check.pl xmlfile.xml \&" ) to spawn a background process, or the same call without the '\&' to wait for the script to return. A rough sketch of that loop is at the end of this message.

fyi2 - I use an xml file for interprocess communication.
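To make the shape of the hash and the spawning a bit more concrete, here is a rough sketch of the loop (all the names, URLs and xml file names here are made up for illustration - the real hash is built elsewhere in store_monitor.pm):

#!/usr/bin/perl
use strict;
use warnings;

# Made-up example data, same shape as the real structure
my %servers = (
    Server1 => {
        Store1 => {
            Page1 => { url => 'http://server1/store1/page1', xml => 'page1.xml' },
            Page2 => { url => 'http://server1/store1/page2', xml => 'page2.xml' },
        },
        Store2 => {
            Page4 => { url => 'http://server1/store2/page4', xml => 'page4.xml' },
        },
    },
);

# Walk the hash of hashes of hashes of hashes down to the page level
for my $server ( keys %servers ) {
    for my $store ( keys %{ $servers{$server} } ) {
        for my $page ( keys %{ $servers{$server}{$store} } ) {
            my $xml = $servers{$server}{$store}{$page}{xml};

            # With the '&' every page check runs in the background at once;
            # without it, system() waits for each check to finish in turn.
            system("perl page_check.pl $xml &");
        }
    }
}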