Jason Camp wrote:
Hi,
I'm trying to gauge whether one crawl server is performing well, and I'm
having a tough time trying to determine if I could increase settings to
gain faster crawls, or if I'm approaching the max the server can handle.
The server is a dual AMD Athlon 2200 with 2GB of RAM hanging off of a
dedicated 10Mb connection. When processing a 1 million URL segment, I see
these speeds in the log:
281147 pages, 142413 errors, 11.4 pages/s, 1918 kb/s,
What do you have the "fetcher.threads.fetch" value set to (in
nutch-site.xml)?
You may also want to ensure you are using good values for
http.max.delays and http.timeout.
With a similar machine I am able to pull ~30 pages/sec using the
following settings:
- http.max.delays 5
- http.timeout 5000
- fetcher.threads.fetch 256
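
For reference, here is a rough sketch of how those three settings might look
in nutch-site.xml. The property names and values are the ones listed above.
The root element is left out on purpose, since it depends on the Nutch
version: put these inside whatever root element your file already has. The
notes in the comments are my reading of the settings, so check them against
the descriptions in nutch-default.xml for your version.

```xml
<!-- Overrides for nutch-site.xml; place inside the file's existing root element. -->

<property>
  <name>fetcher.threads.fetch</name>
  <!-- Number of fetcher threads; 256 is the value from the list above. -->
  <value>256</value>
</property>

<property>
  <name>http.max.delays</name>
  <!-- Assumed meaning: how many times a thread waits on a busy host before giving up on the page. -->
  <value>5</value>
</property>

<property>
  <name>http.timeout</name>
  <!-- Assumed unit: milliseconds, so 5000 would be 5 seconds. -->
  <value>5000</value>
</property>
```

These are starting points rather than recommended values; the right numbers
depend on your bandwidth and on how many distinct hosts are in the segment.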
HTH,
-Shawn
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general