hi

I have a problem while trying to build a spider using Perl threads.
Consider the program below, which is just a minimal example to get going:
I want to hit a certain site's front page some number of times (say 300).
Since there is a lot of content on the page, each request takes a while to
process, so I imagined it would be nice to delegate the requests to
threads. My problem is that in this very naive example the unthreaded
version is much faster. (I've also included two untested sketches of what
I imagine the parallel version should look like: one right after
main_loop, and a worker-pool variant after the program.)

Two questions:
1) Is there something wrong with the threaded code?
2) Does anyone have a working example of a spider using threads?

thanks
./allan

#############################################################


use strict;
use warnings;
use threads;
use threads::shared;
use LWP::RobotUA;

my $MAX = 300;        # number of requests to make
my %store :shared;    # page titles, shared across threads
my $robot;            # LWP::RobotUA instance
my $count;            # request counter
my $thr;

my $start = time();

my $url = "http://somewhere.com";
my $THREADS = 0;

init_robot();

# if we have an argument, use the unthreaded version
if ($ARGV[0]) {
    main_loop2();
} else {
    $THREADS = 1;
    main_loop();
}
print_hash();

my $end = time();
my $elapsed = $end - $start;
print "This took $elapsed seconds\n";

sub init_robot {
    $robot = LWP::RobotUA->new("myname", '[EMAIL PROTECTED]' );
    my $delay = 1/6000;   # delay() is in minutes, so this is 10 ms between requests
    $robot->delay($delay);
}

sub main_loop {
    while ($count < $MAX) {
        $count++;
        $thr = threads->new(\&lwp);   # spawn a thread for this request...
        $thr->join;                   # ...and wait for it before starting the next
    }
}
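
# An untested sketch of what I imagine the threads are actually for:
# start a whole batch of requests before joining, so several can be in
# flight at once (in main_loop above, every thread is joined as soon as
# it is created, so only one request ever runs at a time). Not called
# anywhere yet; I'd swap it in for main_loop to test. The batch size of
# 10 is a guess.
sub main_loop_batched {
    my $batch = 10;
    while ($count < $MAX) {
        my @threads;
        for (1 .. $batch) {
            last if $count >= $MAX;
            $count++;
            push @threads, threads->new(\&lwp);
        }
        $_->join for @threads;   # join only after the whole batch is running
    }
}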

# unthreaded version: the same loop, just calling lwp() directly
sub main_loop2 {
    while ($count < $MAX) {
        $count++;
        lwp();
    }
}

sub lwp {
    my $response = $robot->get( $url );
    my $content  = $response->content;
    lock(%store) if $THREADS;   # lock is held until the sub returns
    if ($content =~ m,<title>([^<>]+)</title>,i) {
        $store{$count} = $1;    # record the title under the request number
    }
    return $count;
}

sub print_hash {
    foreach my $key (keys %store) {
        print "$key --> $store{$key}\n";
    }
}
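
By the way, the shape I've been trying to work towards is a fixed pool of
worker threads pulling jobs off a shared Thread::Queue, so several
requests overlap instead of running back to back. Below is my untested
sketch of that; the pool size ($WORKERS), %titles and the job numbers are
just placeholders:

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;
use LWP::RobotUA;

my $WORKERS = 10;                      # pool size: a guess, not tuned
my $queue   = Thread::Queue->new;
my %titles :shared;

$queue->enqueue(1 .. 300);             # job numbers; could be URLs instead
$queue->enqueue((undef) x $WORKERS);   # one "stop" marker per worker

my @pool = map {
    threads->new(sub {
        # each worker creates its own UA inside the thread
        my $ua = LWP::RobotUA->new("myname", '[EMAIL PROTECTED]');
        $ua->delay(1/6000);            # delay() is in minutes
        while (defined(my $job = $queue->dequeue)) {
            my $content = $ua->get("http://somewhere.com")->content;
            if ($content =~ m,<title>([^<>]+)</title>,i) {
                lock(%titles);         # released at the end of this block
                $titles{$job} = $1;
            }
        }
    });
} 1 .. $WORKERS;

$_->join for @pool;
print "$_ --> $titles{$_}\n" for sort { $a <=> $b } keys %titles;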



