I often get the error message below while crawling our intranet.
Is it a network problem? What can I do about it?

$ bin/nutch crawl urls -dir crawl -depth 3 -topN 4

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
topN = 4
Injector: starting
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawl/segments/20090705212324
Generator: filtering: true
Generator: topN: 4
Generator: Partitioning selected urls by host, for politeness.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:524)
    at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:116)
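In case it helps to see what the command expects, this is roughly the seed layout an intranet crawl uses (the hostname and file name below are only placeholders, not my real intranet URL):

    # seed directory that "rootUrlDir = urls" points to
    mkdir -p urls
    echo "http://intranet.example.com/" > urls/seed.txt

    # allow the intranet domain in conf/crawl-urlfilter.txt, e.g.
    # +^http://([a-z0-9]*\.)*intranet.example.com/

    # then run the crawl as above
    bin/nutch crawl urls -dir crawl -depth 3 -topN 4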
