Hi Paul,
Yes, Nutch deliberately slows down the crawl so that it's "polite" to the
sites it fetches: by default it waits between successive requests to the
same server. wget doesn't.
If the sites you are crawling are under your control, or you have an
understanding with the site ops people, then you can alter Nutch's
default settings to make it run at near full speed.
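
As a rough illustration of why the politeness delay dominates on a small
crawl, here's a back-of-the-envelope sketch. It assumes the only cost is
Nutch's per-server delay (the `fetcher.server.delay` property, which I
believe defaults to 5 seconds), that each server is fetched one request at
a time, and it ignores download time and the per-round generate/fetch/update
overhead. The page counts are made up.

```python
def politeness_floor_seconds(pages_per_host, server_delay=5.0):
    """Rough lower bound on fetch time under a per-server politeness delay.

    Assumes each host is fetched one request at a time with `server_delay`
    seconds between requests, hosts are fetched in parallel, and download
    time is negligible. The busiest host therefore sets the floor.
    """
    if not pages_per_host:
        return 0.0
    busiest = max(pages_per_host)
    # n requests to one host need n - 1 gaps of server_delay between them.
    return max(busiest - 1, 0) * server_delay


# Hypothetical page counts for three small sites.
sites = [180, 120, 60]
print(politeness_floor_seconds(sites))                    # 895.0 seconds, ~15 minutes
print(politeness_floor_seconds(sites, server_delay=0.0))  # 0.0, only download time remains
```

Dropping the delay is what removes that floor, which is why it's only
appropriate for sites you control or have cleared it with.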
-- Ken
On Aug 26, 2009, at 6:55am, Paul Tomblin wrote:
> I'm trying to crawl three tiny little sites with Nutch, and it takes
> 45 minutes. To copy the same files to my local hard drive using wget
> takes between 35 seconds and a minute. What is Nutch doing that
> causes it to take 45 times as long?
> --
> http://www.linkedin.com/in/paultomblin
--------------------------
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-210-6378