Fuad, I think you are comparing apples and oranges here. It looks like your new code simply hammers the server, sending multiple requests to a single server in parallel. That's a big no-no in the web crawling/spidering/fetching world, as bad as not obeying robots.txt.
The fact that the speed difference is SO large is a clear hint that the comparison may not be right, and that the 3 plugins you are comparing are configured very differently.

Otis

--- Fuad Efendi <[EMAIL PROTECTED]> wrote:

> Try the new Protocol-HTTPClient-Innovation:
> http://issues.apache.org/jira/browse/NUTCH-109
>
> -----Original Message-----
> From: Daniele Menozzi [mailto:[EMAIL PROTECTED]]
> Sent: Monday, October 10, 2005 5:42 PM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Re[2]: what contibute to fetch slowing down
>
> On 03:36:45 03/Oct, Michael wrote:
> > 3mbit, 100 threads = 15 pages/sec
> > CPU is low during the fetch, so it's a bandwidth limit.
>
> Yes, CPU is low, and even memory is quite free. But with a 10 MB
> in/out link I cannot obtain good results (and I do not parse the
> results, I simply fetch them). If I use 100 threads, I can download
> pages at 500 KB/s for about 5 seconds, but after that the download
> rate falls to 0. If I set 20 threads, I can download at 200 KB/s for
> 4-5 minutes, and the rate initially seems very stable. But after
> these few minutes the rate starts to get lower and lower, and tends
> to reach zero pages/s.
>
> I cannot understand what the problem could be. Whatever thread count
> I choose, the rate _always_ decreases until it reaches 1-2 pages/s.
> I've tried 2 different machines, but the problem is always the same.
>
> Can you please give me some advice?
> Thank you
>   Daniele
>
> --
> Free Software Enthusiast
> Debian Powered Linux User #332564
> http://menoz.homelinux.org
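[For readers of the archive: the politeness rule Otis alludes to — at most one in-flight request per host, with a delay between consecutive requests to the same host — can be sketched roughly as below. This is a minimal illustration, not Nutch's actual fetcher; the class and parameter names are hypothetical.]

```python
import threading
import time
from urllib.parse import urlparse

class PoliteFetcher:
    """Sketch of per-host politeness (hypothetical, not a Nutch API):
    serialize requests to each host and enforce a minimum delay
    between consecutive requests to the same host."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self.host_locks = {}      # host -> Lock serializing its requests
        self.last_request = {}    # host -> timestamp of previous request
        self.registry_lock = threading.Lock()

    def _lock_for(self, host):
        # Lazily create one lock per host, guarding the dict itself.
        with self.registry_lock:
            return self.host_locks.setdefault(host, threading.Lock())

    def fetch(self, url, do_request):
        host = urlparse(url).netloc
        with self._lock_for(host):           # one request per host at a time
            elapsed = time.time() - self.last_request.get(host, 0.0)
            if elapsed < self.delay:
                time.sleep(self.delay - elapsed)  # honor the crawl delay
            try:
                return do_request(url)       # caller supplies the HTTP call
            finally:
                self.last_request[host] = time.time()
```

With this shape, 100 fetcher threads can still run in parallel across many hosts, but two threads holding URLs on the same host queue up behind its lock instead of hitting the server simultaneously — which is why a polite fetcher is slower on a benchmark against a single server.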