Hello,

I am new to Nutch and have set up Nutch 0.9 on Easy Eclipse for Mac OS X. When I try to start crawling, I get the following exception:

Dedup: starting
Dedup: adding indexes in: crawl/indexes
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)


Does anyone know how to solve this problem?

Hadoop can report a generic IOException like this ("Job failed!") when the root cause is that the JVM has run out of memory. In that case the hadoop.log file would normally contain the OutOfMemoryError.
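
One quick way to check is to scan the log for the error text. The following is only a rough sketch: the class name HadoopLogOomScan is made up for illustration, and the default path logs/hadoop.log is an assumption about where your setup writes the log, so pass the real path as the first argument if it differs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch or Hadoop.
public class HadoopLogOomScan {

    // Returns every line that mentions OutOfMemoryError,
    // prefixed with its 1-based line number.
    static List<String> findOomLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains("OutOfMemoryError")) {
                hits.add((i + 1) + ": " + lines.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/hadoop.log");
        // ISO-8859-1 maps every byte to a character, so stray bytes
        // in the log cannot cause a decoding failure.
        List<String> lines = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        List<String> hits = findOomLines(lines);
        if (hits.isEmpty()) {
            System.out.println("No OutOfMemoryError found in " + log);
        } else {
            for (String hit : hits) {
                System.out.println(hit);
            }
        }
    }
}
```

If this prints nothing useful, the failure probably has a different cause, and the lines just before the end of hadoop.log are the next place to look.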

If you're running from inside of Eclipse, see http://wiki.apache.org/nutch/RunNutchInEclipse0.9 for more details.
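
When launching from Eclipse, the heap size comes from the VM arguments in the run configuration (the standard JVM flag is -Xmx, e.g. -Xmx512m; that value is only an example, not a recommendation from the wiki page). To confirm which limit is actually in effect, a small check like the sketch below can be run with the same run configuration. The class name HeapCheck is made up for illustration.

```java
// Hypothetical helper, not part of Nutch or Hadoop.
public class HeapCheck {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value is small, raising -Xmx in the VM arguments and re-running the crawl is a reasonable next step.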

-- Ken
--
Ken Krugler
+1 530-210-6378
