>>>>> "ricarDo" == ricarDo oliveiRa <[EMAIL PROTECTED]> writes:

ricarDo> Thanks in advance for any further help.  I have a web crawler
ricarDo> running in a solaris+apache+mod_perl web server, and for some
ricarDo> reason, when I try to get the contents of a certain page, it
ricarDo> hangs and gives no timeout whatsoever.

First, are you *really* sure you want to write Yet Another WebCrawler?

Are you following the Robot Exclusion Protocol?

Did you read my five or six columns that talk about spiders and LWP
in <http://www.stonehenge.com/merlyn/WebTechniques/>?

ricarDo> use LWP::Simple;

ricarDo> ...

ricarDo> $page_text = get($thisURL);

ricarDo> ...
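
Note that get() takes no timeout argument, and the underlying user
agent's timeout counts seconds of *inactivity*, so a server that
dribbles out a byte now and then can hold you up indefinitely.  If you
need explicit control, drop down to LWP::UserAgent.  A bare-bones
sketch (the 30-second value, robot name, and URL are placeholders):

    use strict;
    use LWP::UserAgent;
    use HTTP::Request;

    my $thisURL = "http://www.example.com/";   # whatever you were fetching

    my $ua = LWP::UserAgent->new;
    $ua->timeout(30);                # give up after 30 idle seconds
    $ua->agent("MyCrawler/0.1");     # identify your robot (placeholder name)

    my $response = $ua->request(HTTP::Request->new(GET => $thisURL));
    if ($response->is_success) {
        my $page_text = $response->content;
        # ... process $page_text as before ...
    } else {
        warn "couldn't fetch $thisURL: ", $response->status_line, "\n";
    }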

What happens when you visit that URL with a browser?  With "GET" from
the command line?
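
(GET is the lwp-request alias that ships with LWP; if memory serves,
-t sets the timeout and -e prints the response headers, but check its
manpage.  The host below is a placeholder:)

    GET -t 30 -e http://the.slow.host/page.html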

ricarDo> Besides this problem, the crawler is monolithic.  Can I do a
ricarDo> fork to speed things up?  Any suggestions?

If you're asking questions like this, you need to first study the
existing art before innovating.
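
The existing art includes the LWP::Parallel module on CPAN.  The
quick-and-dirty alternative is to fork a bounded pool of children, one
fetch per child.  A bare sketch (the URL list, pool size, and timeout
are placeholders, and a real robot still needs politeness delays and
robots.txt checks):

    use strict;
    use LWP::UserAgent;
    use HTTP::Request;

    my @urls = qw(http://www.example.com/a http://www.example.com/b);
    my $max_kids = 5;            # how many fetches in flight at once
    my $kids = 0;

    for my $url (@urls) {
        if ($kids >= $max_kids) {      # pool full: wait for one child
            wait;
            $kids--;
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid) {                    # parent: count the child, move on
            $kids++;
            next;
        }
        # child: fetch one URL, report, exit
        my $ua = LWP::UserAgent->new;
        $ua->timeout(30);
        my $res = $ua->request(HTTP::Request->new(GET => $url));
        print "$url: ", $res->status_line, "\n";
        exit 0;
    }
    1 while wait > 0;                  # reap the stragglers

Note that each child is a separate process: results have to come back
through pipes, files, or a database, which is exactly the sort of
plumbing the existing frameworks already handle for you.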

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[EMAIL PROTECTED]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
