RE: what contributes to fetch slowing down

2005-10-02 Thread Fuad Efendi
Some suggestions to improve performance: 1. Decrease randomization of FetchList. Here is the comment from FetchListTool: /** * The TableSet class will allocate a given FetchListEntry * into one of several ArrayFiles. It chooses which * ArrayFile based on a hash of the URL's

How can I unsubscribe from the mailing list?

2005-10-02 Thread nimakh
Does anybody know how I can unsubscribe from this mailing list? Thanks, Nima

Re: what contributes to fetch slowing down

2005-10-02 Thread AJ Chen
Update on fetch performance of my current run: download speed has been stable at 3.8 pages/sec, about 640 kbps. This is probably limited by my bandwidth: regular DSL service, promising up to 1.5 Mbps inbound but realistically only 640 kbps. More than 1 million pages were fetched, but it took
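The figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, it assumes 1 kbps = 1000 bits per second, and it assumes the page rate stays constant. The actual run time reported in the post is cut off in this archive, so nothing here should be read as that figure.

```java
// Back-of-the-envelope check of the quoted fetch figures (hypothetical helper).
final class FetchRateEstimate {
    /** Average bytes per page implied by a link rate in kbps and a page rate. */
    static double bytesPerPage(double kilobitsPerSecond, double pagesPerSecond) {
        // Assumption: 1 kbps = 1000 bits per second, 8 bits per byte.
        double bytesPerSecond = kilobitsPerSecond * 1000.0 / 8.0;
        return bytesPerSecond / pagesPerSecond;
    }

    /** Hours needed to fetch pageCount pages at a steady page rate. */
    static double hoursToFetch(long pageCount, double pagesPerSecond) {
        return pageCount / pagesPerSecond / 3600.0;
    }
}
```

With the numbers from the post, `bytesPerPage(640, 3.8)` comes to roughly 21 KB per page, and `hoursToFetch(1000000L, 3.8)` comes to roughly 73 hours for one million pages at that steady rate.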

Re: How can I unsubscribe from the mailing list?

2005-10-02 Thread Michael Ji
http://lucene.apache.org/nutch/mailing_lists.html --- [EMAIL PROTECTED] wrote: Does anybody know how I can unsubscribe from this mailing list? Thanks, Nima

RE: what contributes to fetch slowing down

2005-10-02 Thread Michael Ji
Kelvin's OC implementation queues fetch requests by host and uses the HTTP 1.1 protocol. It is currently a Nutch patch. Michael Ji, --- Fuad Efendi [EMAIL PROTECTED] wrote: Some suggestions to improve performance: 1. Decrease randomization of FetchList. Here is the comment
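The idea of queuing fetch requests by host, mentioned above, can be sketched as follows. This is not Kelvin's OC code: the class `HostQueues` and its methods are hypothetical names chosen for illustration, and the sketch uses modern Java collections rather than what was available on the JVMs of the time.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch: one FIFO queue of pending URLs per host.
final class HostQueues {
    private final Map<String, Queue<String>> queues = new LinkedHashMap<>();

    /** Adds a URL to the queue of its host (host names compared case-insensitively). */
    void add(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in " + url);
        }
        String key = host.toLowerCase(Locale.ROOT);
        queues.computeIfAbsent(key, h -> new ArrayDeque<>()).add(url);
    }

    /** Returns the next queued URL for the host, or null if none is pending. */
    String poll(String host) {
        Queue<String> q = queues.get(host.toLowerCase(Locale.ROOT));
        return q == null ? null : q.poll();
    }
}
```

Grouping URLs this way lets a fetcher hand all requests for one host to a single worker, which is the precondition for reusing a connection to that host.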

Re: what contributes to fetch slowing down

2005-10-02 Thread Ken Krugler
Update on fetch performance of my current run: download speed has been stable at 3.8 pages/sec, about 640 kbps. This is probably limited by my bandwidth: regular DSL service, promising up to 1.5 Mbps inbound but realistically only 640 kbps. More than 1 million pages were fetched, but it took

Re: what contributes to fetch slowing down

2005-10-02 Thread Ken Krugler
Correction to my previous post. I'd said: When you use the FetchListTool to emit multiple lists, it intentionally divides up the list using the MD5 value for the link, so that you get hosts scattered between the lists. But for a single list, this doesn't happen, and thus the max threads/host
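The statement quoted above describes splitting a fetch list by the MD5 value of each link. The text of the correction itself is cut off in this archive, so it is unclear which part of that statement was revised. The sketch below only illustrates the general technique of hash-based splitting; it is not FetchListTool's code, and `Md5Partitioner`, `bucketFor`, and `partition` are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assign each URL to one of numLists lists by its MD5 hash.
final class Md5Partitioner {
    /** Returns an index in [0, numLists) derived from the MD5 digest of the URL. */
    static int bucketFor(String url, int numLists) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(url.getBytes(StandardCharsets.UTF_8));
            // Treat the digest as a non-negative integer, then reduce it.
            return new BigInteger(1, digest)
                    .mod(BigInteger.valueOf(numLists))
                    .intValue();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Splits the URLs into numLists lists; URLs of one host may land in different lists. */
    static List<List<String>> partition(List<String> urls, int numLists) {
        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < numLists; i++) {
            lists.add(new ArrayList<>());
        }
        for (String url : urls) {
            lists.get(bucketFor(url, numLists)).add(url);
        }
        return lists;
    }
}
```

Because the hash is taken over the whole URL rather than the host name, pages from the same host tend to be spread across the lists, which matches the "hosts scattered between the lists" behavior described in the quoted statement.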

RE: what contributes to fetch slowing down

2005-10-02 Thread Fuad Efendi
Unfortunately this is commented out in Kelvin's code: // reqStr.append("Connection: Keep-Alive\r\n"); I found only reqStr.append(" HTTP/1.1\r\n"); - but it does not mean implementation of HTTP/1.1 features. Teleport Ultra v.1.29 needs just a few hours to download all plain HTML from SUN,
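For reference, here is a minimal sketch of what a request with the keep-alive header enabled could look like. It is not taken from Kelvin's code: `KeepAliveRequest` and `build` are hypothetical names, and only the `reqStr.append(...)` style is borrowed from the snippet above. Two protocol facts it relies on: HTTP/1.1 requires a `Host` header, and HTTP/1.1 connections are persistent by default unless `Connection: close` is sent, so the explicit `Keep-Alive` line mainly matters for older servers and proxies.

```java
// Hypothetical sketch: build a minimal HTTP/1.1 GET request as a string.
final class KeepAliveRequest {
    static String build(String host, String path) {
        StringBuilder reqStr = new StringBuilder();
        reqStr.append("GET ").append(path).append(" HTTP/1.1\r\n");
        // The Host header is mandatory in HTTP/1.1.
        reqStr.append("Host: ").append(host).append("\r\n");
        // Explicitly ask for a persistent connection.
        reqStr.append("Connection: Keep-Alive\r\n");
        // An empty line terminates the header section.
        reqStr.append("\r\n");
        return reqStr.toString();
    }
}
```

Sending the header is only half of the feature: the client also has to keep the socket open and read each response to its declared length before issuing the next request on the same connection.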

RE: what contributes to fetch slowing down

2005-10-02 Thread Fuad Efendi
I never tried Kelvin's OC; I only browsed the source code a little. We need to test with JVM 1.4 and JVM 1.5 (Kelvin's OC). If I am right, we are simply _killing_ many, many sites with a default Apache HTTPD installation (Microsoft IIS, etc.) (150 keep-alive client threads; I configured 6000

java.net.MalformedURLException: no protocol for parse-plugins.xml

2005-10-02 Thread Earl Cahill
I did a clean, full svn update, and ant on trunk, then tried bin/nutch crawl urls -dir crawl.test and got 051002 224950 SEVERE Unable to load parse plugins file from URL [parse-plugins.xml] java.net.MalformedURLException: no protocol: ... Likely missing file:/. If I get rid of lines 617-622