I can easily saturate my 10 Mbit connection, but it takes a powerful computer. If your computer is not so powerful, try fetching with the -noParsing flag, which defers the parsing work until later. Even a quad Pentium III Xeon 700 MHz with 4 GB of RAM can only saturate about 5 Mbit. I've used a 3 GHz Xeon with Hyper-Threading and it can do 10 Mbit (barely) with parsing on. My new dual-core Opteron runs at about 10% CPU load with parsing on, and my Athlon 64 3500+ can also do it just fine.

-J

PS: If you have a slow(er) computer, fetch without parsing; you can then use a faster computer to parse the data after the fetch is completed.
BTW: For those who do not know, fetching web pages with 100 threads takes upstream bandwidth equal to about 10% of the download rate, so if you have a 10 Mbit connection but only 512 kbit upload, your max download is around 5-6 Mbit. I found this out with Road Runner's gamer connection (10 Mbit in, 512 kbit out).

----- Original Message -----
From: "Christophe Noel" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, August 02, 2005 6:09 AM
Subject: Fetcher delays - benchmarks

> Hello,
>
> Following some discussions and developers' mails, I tried to get the
> best performance (pages/second) for the following case:
>
> - 120 web servers to crawl
> - 10 Mbit/s connection
>
> I reached about 3 Mbit/s average fetching speed with the following
> parameters (impolite mode):
>
> - fetcher.server.delay = 1.0
> - fetcher.per.host = 20
> - threads = 800
> - http.timeout = 5000
>
> I see that Nutch is very slow for the first minutes; performance
> increases with time: it is now at 2500 kb/s and was at 2000 kb/s 5
> minutes ago.
>
> segment 20050802115311, 7200 pages, 446 errors, 231654440 bytes, 706020 ms
> 050802 120623 148 status: 10.198011 pages/s, 2563.3838 kb/s, 32174.227
> bytes/page
>
> I read Doug Cutting's mail about fetcher.max.delay, but I still don't
> understand why I cannot reach 10 Mbit/s with 120 different servers.
>
> Any tips to increase my performance, please?
>
> Thank you very much.
>
> Christophe Noël
> Cetic Grid Data Mining
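
For anyone who wants to plug in their own numbers, here is a rough sketch of the two bits of arithmetic in this thread: the upstream rule of thumb from the reply and the figures in the quoted status line. The function names are made up for illustration and are not part of Nutch. The 10% ratio is only the rule of thumb given above, and reading "kb/s" as kilobits with a 1024 divisor is an inference from the logged numbers, not something confirmed from the Nutch source.

```python
# Sketch only: hypothetical helper names, not Nutch code.

def max_download_kbit(link_kbit, upload_kbit, upstream_percent=10):
    """Estimate the download ceiling when fetching needs upstream
    bandwidth equal to upstream_percent of the download rate
    (assumption: the ~10% rule of thumb from this thread)."""
    upload_limited = upload_kbit * 100 / upstream_percent
    return min(link_kbit, upload_limited)


def fetch_rates(pages, total_bytes, elapsed_ms):
    """Recompute the rates shown in a fetcher status line.
    Assumption: kb/s means kilobits per second with 1 kb = 1024 bits."""
    seconds = elapsed_ms / 1000
    return {
        "pages_per_s": pages / seconds,
        "kbit_per_s": total_bytes * 8 / 1024 / seconds,
        "bytes_per_page": total_bytes / pages,
    }


if __name__ == "__main__":
    # 10 Mbit down, 512 kbit up, as in the cable example above.
    print(max_download_kbit(10_000, 512))
    # Segment totals from the quoted status line.
    print(fetch_rates(7200, 231654440, 706020))
```

With a 512 kbit upload the first function gives 5120 kbit, in line with the "around 5-6 Mbit" figure above. The second reproduces the logged pages/s, kb/s and bytes/page values to within rounding, which suggests the 3 Mbit/s in the quoted message is measured in the same units as the link speed.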
