Hi,

I've been running the latest trunk version on 3 machines with 3 tasktrackers
and 80,000 starting URLs, trying to do a depth 2 crawl. The first loop
always goes well, but the second loop (which has about 800,000 URLs in the
fetch list) always fails in the middle or at the end of the Fetcher reduce
phase with this error:

060124 073930  reduce 48%
060124 074000  reduce 49%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:347)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:111)

Sometimes it happens when reduce is at 100%.

And these are my settings:

mapred.map.tasks=20
mapred.reduce.tasks=10

It seems this exception happens when the fetchlist grows and the mapred
folder gets large. Could it be because the number of reduce tasks is greater
than the number of tasktrackers? It also happens on a single machine with one
tasktracker.

Thanks,
Mike
