Fuad,

I think you are comparing apples and oranges here.  It looks like your
new code simply hammers the server, sending multiple requests to a
single host in parallel.  That's a big no-no in the web
crawling/spidering/fetching world, as bad as not obeying robots.txt.
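
For illustration only, here is a minimal sketch (not Nutch's actual
fetcher code) of what per-host politeness means: requests to the same
host are serialized and spaced out by a delay, no matter how many
worker threads you run:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Minimal sketch of a polite fetcher: at most one in-flight request
    // per host, with a fixed delay between consecutive hits.
    public class PoliteFetcher {
        private final long delayMs;
        private final ConcurrentMap<String, Object> hostLocks =
            new ConcurrentHashMap<String, Object>();
        private final ConcurrentMap<String, Long> lastHit =
            new ConcurrentHashMap<String, Long>();

        public PoliteFetcher(long delayMs) { this.delayMs = delayMs; }

        public void fetch(String host, String path) throws InterruptedException {
            hostLocks.putIfAbsent(host, new Object());
            synchronized (hostLocks.get(host)) {      // serialize per host
                Long last = lastHit.get(host);
                if (last != null) {
                    long wait = last.longValue() + delayMs - System.currentTimeMillis();
                    if (wait > 0) Thread.sleep(wait); // politeness delay
                }
                // ... issue the actual HTTP GET for host + path here ...
                lastHit.put(host, Long.valueOf(System.currentTimeMillis()));
            }
        }
    }

A fetcher without that per-host gate will always look much faster in a
benchmark, precisely because it is being impolite.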

The fact that the speed difference is SO large is a clear hint that the
comparison may not be right, and that the 3 plugins you are comparing
are configured very differently.
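
If you want an honest comparison, pin down the politeness settings in
each configuration first.  These are the usual suspects in
nutch-default.xml / nutch-site.xml (the values below are only
examples; check the defaults that ship with your version):

    <property>
      <name>fetcher.threads.fetch</name>
      <value>20</value>   <!-- total fetcher threads -->
    </property>
    <property>
      <name>fetcher.threads.per.host</name>
      <value>1</value>    <!-- more than 1 means parallel hits on one host -->
    </property>
    <property>
      <name>fetcher.server.delay</name>
      <value>5.0</value>  <!-- seconds between requests to the same host -->
    </property>

All three plugins should run with identical values here before their
speeds can be compared.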

Otis


--- Fuad Efendi <[EMAIL PROTECTED]> wrote:

> Try new Protocol-HTTPClient-Innovation:
> http://issues.apache.org/jira/browse/NUTCH-109
> 
> 
> -----Original Message-----
> From: Daniele Menozzi [mailto:[EMAIL PROTECTED] 
> Sent: Monday, October 10, 2005 5:42 PM
> To: nutch-dev@lucene.apache.org
> Subject: Re: Re[2]: what contribute to fetch slowing down
> 
> 
> On 03:36:45 03/Oct, Michael wrote:
> > 3mbit, 100 threads = 15 pages/sec
> > CPU is low during fetch, so it's a bandwidth limit.
> Yes, CPU is low, and even memory is quite free. But with a 10MB
> in/out link I cannot obtain good results (and I do not parse the
> results, I simply fetch them).
> If I use 100 threads, I can download pages at 500KB/s for about 5
> seconds, but after that the download rate falls to 0. If I set 20
> threads, I can download at 200KB/s for 4-5 minutes, and the rate
> initially seems very stable. But after these few minutes the rate
> gets lower and lower, and tends toward zero pages/s.
> 
> I cannot understand what the problem could be. Whatever thread count
> I choose, the rate _always_ decreases, until it reaches 1/2 pages/s.
> I've tried two different machines, but the problem is always the
> same.
> 
> Can you please give me some advice?
> Thank you
>       Daniele
> 
> 
> 
> -- 
>                     Free Software Enthusiast
>                Debian Powered Linux User #332564 
>                    http://menoz.homelinux.org
> 
> 
> 
