-----Original Message-----
From: Daniele Menozzi [mailto:[EMAIL PROTECTED]
Sent: Monday, October 10, 2005 5:42 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Re[2]: what contibute to fetch slowing down
On 03:36:45 03/Oct , Michael wrote:
3mbit, 100 threads = 15 pages/sec
cpu is low during fetch, so it's a bandwidth limit.
yes, cpu is low, and even memory is quite free. But, with a 10MB in/out
I cannot obtain good results (and I do not parse results, simply fetch
them).
If I use 100 threads, I
On 09:59:45 03/Oct , Doug Cutting wrote:
I suspect threads are hanging, probably in the parser,
I tried to not parse, but without good results.
If I use 100 threads, I can download pages at 500KB/s for about 5 seconds,
but after that, the download rate falls to 0. If I set 20 threads, I can
Fuad Efendi wrote:
I found this in J2SE API for setReuseAddress(default: false):
=
When a TCP connection is closed the connection may remain in a timeout
state for a period of time after the connection is closed (typically
known as the TIME_WAIT state or 2MSL wait state). For applications
Socket.close()... But we need to perform real tests anyway.
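A minimal sketch of the option under discussion, for anyone who wants to try it before running real tests. It only shows the J2SE calls involved: java.net.Socket.setReuseAddress(true) has to be called on an unbound socket, before bind(). The class and method names are made up for illustration, and nothing here is taken from the Nutch fetcher; whether the option changes fetch speed at all is still untested.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical demo class, not part of Nutch.
final class ReuseAddressSketch {

    /**
     * Creates an unbound socket, enables SO_REUSEADDR, then binds it to an
     * ephemeral port on the loopback interface. Returns true if the socket
     * is bound and reports the option as enabled.
     */
    static boolean bindWithReuseAddress() {
        try (Socket socket = new Socket()) {           // unconnected, unbound
            socket.setReuseAddress(true);              // must precede bind()
            socket.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            return socket.isBound() && socket.getReuseAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bound with SO_REUSEADDR: " + bindWithReuseAddress());
    }
}
```

The option matters only when a local address is still held by an earlier connection in TIME_WAIT, so it is one candidate explanation to test, not a confirmed fix.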
-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, October 03, 2005 1:05 PM
To: nutch-dev@lucene.apache.org
Subject: Re: what contibute to fetch slowing down
Fuad Efendi wrote:
If I am right, we are simply
not only for us but
also for Production Web Sites.
Thanks,
Fuad
-----Original Message-----
From: Fuad Efendi [mailto:[EMAIL PROTECTED]
Sent: Friday, September 30, 2005 10:58 PM
To: nutch-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: what contibute to fetch slowing down
Dear Nutchers,
I
Update on fetch performance of my current run: download speed has been
stable at 3.8 pages/sec, about 640 kbps. This is probably limited by my
bandwidth - regular DSL service, promising up to 1.5 Mbps inbound but
realistically only 640 kbps.
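As a quick sanity check on figures like these, the average page size implied by a link rate and a fetch rate can be worked out directly. The sketch below assumes the link is fully used by page content and that 1 kbps means 1000 bits per second; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of Nutch.
final class PageSizeEstimate {

    /** Average bytes per page implied by a link rate (kilobits/s) and a fetch rate (pages/s). */
    static double bytesPerPage(double kilobitsPerSecond, double pagesPerSecond) {
        double bytesPerSecond = kilobitsPerSecond * 1000.0 / 8.0; // bits to bytes
        return bytesPerSecond / pagesPerSecond;
    }

    public static void main(String[] args) {
        // 640 kbps at 3.8 pages/sec, the figures quoted above
        System.out.printf("%.0f bytes/page%n", bytesPerPage(640, 3.8));
        // 3 mbit at 15 pages/sec, the figures from Michael's message
        System.out.printf("%.0f bytes/page%n", bytesPerPage(3000, 15));
    }
}
```

Both cases come out at roughly 21 to 25 KB per page, which is consistent with the fetch being limited by bandwidth in those runs.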
More than 1 million pages were fetched, but it took
Correction to my previous post. I'd said:
When you use the FetchListTool to emit multiple lists, it
intentionally divides up the list using the MD5 value for the link,
so that you get hosts scattered between the lists. But for a single
list, this doesn't happen, and thus the max threads/host
, Nutch needs few days. 8mbps/800kbps, download/upload.
-----Original Message-----
From: Michael Ji [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 02, 2005 5:37 PM
To: nutch-dev@lucene.apache.org
Subject: RE: what contibute to fetch slowing down
Kelvin's OC implementation is queuing fetching
://grinder.sourceforge.net - very simple Java based proxy)
? Should I
create a new bug report at JIRA?
SUN's Socket, Apache's HttpClient, UNIX's networking...
-----Original Message-----
From: Daniele Menozzi [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 28, 2005 4:42 PM
To: nutch-dev@lucene.apache.org
Subject: Re: what contibute to fetch slowing down
I started the crawler with about 2000 sites. The fetcher could achieve
7 pages/sec initially, but the performance gradually dropped to about 2
pages/sec, sometimes even 0.5 pages/sec. The fetch list had 300k pages
and I used 500 threads. What are the main causes of this slowing down?
Below
Hi AJ
I guess it is the growing number of threads.
You can show the thread ID in the log. I think it makes sense
Regards
/Jack
On 9/29/05, AJ Chen [EMAIL PROTECTED] wrote:
I started the crawler with about 2000 sites. The fetcher could achieve
7 pages/sec initially, but the performance gradually dropped to
On 10:27:55 28/Sep , AJ Chen wrote:
I started the crawler with about 2000 sites. The fetcher could achieve
7 pages/sec initially, but the performance gradually dropped to about 2
pages/sec, sometimes even 0.5 pages/sec. The fetch list had 300k pages
and I used 500 threads. What are the