I am trying to crawl a seed list of 5000 URLs. It was working fine, but at the end of fetching at depth 1 the process failed with the message below. Can anyone suggest what the problem may be?

attempt_201003311259_0003_m_000003_2: fetching http://www.law.louisville.edu/news-events/admissions/feed
attempt_201003311259_0003_m_000003_2: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4993
attempt_201003311259_0003_m_000003_2: fetch of http://dualdegree.seas.wustl.edu/ failed with: java.net.UnknownHostException: dualdegree.seas.wustl.edu
attempt_201003311259_0003_m_000003_2: fetching http://www.couchsurfing.org/ambassador.html
attempt_201003311259_0003_m_000003_2: fetching http://www.niu.edu/northerntoday/contact.shtml
attempt_201003311259_0003_m_000003_2: fetching http://www.spertusshop.org/gifts-for-weddings-c-126.html
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
runbot: fetch 20100331130047 at depth 1 failed.
runbot: Deleting segment 20100331130047.
--- Beginning crawl at depth 2 of 3 ---
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20100331165040
Generator: filtering: true
Generator: topN: 800000
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/20100331165040

--
View this message in context: http://n3.nabble.com/Problem-at-the-end-of-fetching-tp688124p688124.html
Sent from the Nutch - User mailing list archive at Nabble.com.