On Wed, Mar 10, 2004 at 08:44:56PM +0100, allan juul wrote:
> hi
> 
> I have a problem while trying to build a spider using Perl threads.
> Consider the program below, which is just an example to get going.
> I wish to hit a certain site's front page some number of times (for
> example, 300). I imagine that since there's a lot of content on the
> page, each request will take some time to process, and therefore it
> would be nice to delegate the tasks using threads. My problem, in
> this very naive example, is that the unthreaded version is much
> faster.
> 
> Two questions:
> 1) Is there something wrong with the threaded code?

Yes. You fire off one thread, wait for it to finish, then start another
thread, i.e. only one worker thread is active at a time.

You need to do something like:

     push @threads, threads->new(\&lwp) for 1..$MAX;  # spawn $MAX threads
     $_->join for @threads;                           # reap all threads
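For instance, a self-contained version of that fix might look like the
sketch below. The URL and the thread count are placeholders, and lwp is
assumed to be a fetch routine along the lines of the one in your example:

     use strict;
     use warnings;
     use threads;
     use LWP::UserAgent;

     my $MAX = 10;                        # keep this modest (see below)
     my $url = 'http://example.com/';     # placeholder URL

     sub lwp {
         my $ua  = LWP::UserAgent->new;
         my $res = $ua->get($url);
         return $res->is_success ? length($res->content) : 0;
     }

     # spawn all the workers first, so they actually run concurrently...
     my @threads = map { threads->new(\&lwp) } 1 .. $MAX;

     # ...then reap them all
     print $_->join, "\n" for @threads;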

However, you probably don't want to do this anyway, because:

* hitting a website with 300 simultaneous queries may be considered
antisocial;

* Perl threads work by copying most of the image of the Perl interpreter,
so spawning 300 threads may well cause you to run out of memory. Starting
a thread is also a time-consuming process, so you shouldn't do it unless
the thread is going to be long-lived: if each thread will do many
requests, fine; if it's just going to do a single request and exit, the
startup time will dominate (see the worker-pool sketch after this list).

* unless you need to make use of the specific features of Perl threads
(such as shared variables), you'll be a lot better off (on UNIX systems at
least) using fork instead - this is a lot more efficient (see the second
sketch below).
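To keep the threads long-lived, one common pattern is a small pool of
workers pulling jobs from a Thread::Queue. A minimal sketch, again with a
placeholder URL, 300 total requests, and undef sentinels to tell the
workers when to stop:

     use strict;
     use warnings;
     use threads;
     use Thread::Queue;
     use LWP::UserAgent;

     my $WORKERS  = 5;                      # a few long-lived threads
     my $REQUESTS = 300;
     my $url      = 'http://example.com/';  # placeholder URL

     my $q = Thread::Queue->new;
     $q->enqueue(($url) x $REQUESTS);
     $q->enqueue((undef) x $WORKERS);       # one stop sentinel per worker

     sub worker {
         my $ua = LWP::UserAgent->new;
         while (defined(my $u = $q->dequeue)) {
             my $res = $ua->get($u);        # do something with $res here
         }
     }

     my @threads = map { threads->new(\&worker) } 1 .. $WORKERS;
     $_->join for @threads;

Each thread is started once and then works through many requests, so the
thread startup cost is amortised.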
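And for comparison, a minimal fork-based sketch (same placeholder URL);
each child is a full process, but on UNIX fork is cheap thanks to
copy-on-write:

     use strict;
     use warnings;
     use LWP::UserAgent;

     my $KIDS = 5;
     my $url  = 'http://example.com/';      # placeholder URL

     my @pids;
     for (1 .. $KIDS) {
         my $pid = fork;
         die "fork failed: $!" unless defined $pid;
         if ($pid == 0) {                   # child: fetch once, then exit
             my $ua  = LWP::UserAgent->new;
             my $res = $ua->get($url);
             exit($res->is_success ? 0 : 1);
         }
         push @pids, $pid;                  # parent: remember the child
     }
     waitpid $_, 0 for @pids;               # reap the children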

Dave.

-- 
A power surge on the Bridge is rapidly and correctly diagnosed as a faulty
capacitor by the highly-trained and competent engineering staff.
    -- Things That Never Happen in "Star Trek" #9
