Find out how many sites your system and network will let you crawl at
once. Limit the number of parallel jobs to that. Read about tuning
POE::Component::Client::HTTP, either in the documentation or in this
mailing list's archives.
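
A minimal sketch of that tuning, assuming POE, POE::Component::Client::HTTP, and POE::Component::Client::Keepalive from CPAN; the connection limits (64 total, 2 per host) are illustrative numbers, not recommendations:

```perl
use strict;
use warnings;
use POE;
use POE::Component::Client::HTTP;
use POE::Component::Client::Keepalive;
use HTTP::Request::Common qw(GET);

my @sites = ('http://example.com/');    # your 1000 predefined URLs go here

# The connection manager is the throttle: it caps how many sockets
# are open at once, overall and per host.
my $cm = POE::Component::Client::Keepalive->new(
  max_open     => 64,    # total concurrent connections
  max_per_host => 2,     # be polite to each individual site
);

POE::Component::Client::HTTP->spawn(
  Alias             => 'ua',
  Timeout           => 30,
  ConnectionManager => $cm,
);

POE::Session->create(
  inline_states => {
    _start => sub {
      # Queue every request up front; the connection manager
      # meters out how many actually run in parallel.
      $_[KERNEL]->post(ua => request => got_response => GET($_)) for @sites;
    },
    got_response => sub {
      my ($request_packet, $response_packet) = @_[ARG0, ARG1];
      my $response = $response_packet->[0];
      print $request_packet->[0]->uri, ' => ', $response->code, "\n";
    },
  },
);

POE::Kernel->run();
```

Posting all 1000 requests at once is fine here because the keepalive manager, not the session count, bounds actual concurrency and therefore memory.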
Stay under your system's limits. Consider how performance plummets
when a machine overcommits its memory and begins swapping. Don't let
that happen to you.
Use fork() with POE to take advantage of both cores.
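
One hedged way to do that is POE::Wheel::Run, which wraps fork() and pipes for you; this sketch assumes a hypothetical crawl_my_share() doing the per-process crawling:

```perl
use strict;
use warnings;
use POE qw(Wheel::Run);

# Placeholder: each child would crawl its share of the site list.
sub crawl_my_share { print "crawling...\n" }

POE::Session->create(
  inline_states => {
    _start => sub {
      my ($kernel, $heap) = @_[KERNEL, HEAP];
      for my $core (1 .. 2) {    # one worker per core
        my $wheel = POE::Wheel::Run->new(
          Program     => \&crawl_my_share,
          StdoutEvent => 'worker_output',
        );
        # Keep the wheel alive, and arrange to reap the child.
        $heap->{workers}{ $wheel->ID } = $wheel;
        $kernel->sig_child($wheel->PID, 'worker_done');
      }
    },
    worker_output => sub { print "child says: $_[ARG0]\n" },
    worker_done   => sub {
      # Child exited; POE has reaped it. Drop its wheel here
      # so the parent session can eventually stop.
    },
  },
);

POE::Kernel->run();
```

Split the 1000 sites between the two children (for example, odd and even list indices) so each process runs its own POE kernel and HTTP client on its own core.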
Are you looking for a design consultant?
--
Rocco Caputo - rcap...@pobox.com
On Nov 20, 2009, at 23:15, Ryan Chan wrote:
Hello,
Assume I only have a dual-core server with just 1 GB of memory. I
want to build a web robot to crawl 1000 pre-defined web sites.
Can anyone provide a basic strategy for this task?
Should I create 1000 sessions at the same time, to achieve the max
network throughput?
Thanks.