Problem at the end of fetching

hareesh Wed, 31 Mar 2010 03:55:27 -0700

I am trying to crawl a seed list of 5,000 URLs. It was working fine, but at the
end of fetching at depth 1 the process failed with the message below. Can
anyone suggest what the problem might be?


attempt_201003311259_0003_m_000003_2: fetching http://www.law.louisville.edu/news-events/admissions/feed
attempt_201003311259_0003_m_000003_2: -activeThreads=100, spinWaiting=0, fetchQueues.totalSize=4993
attempt_201003311259_0003_m_000003_2: fetch of http://dualdegree.seas.wustl.edu/ failed with: java.net.UnknownHostException: dualdegree.seas.wustl.edu
attempt_201003311259_0003_m_000003_2: fetching http://www.couchsurfing.org/ambassador.html
attempt_201003311259_0003_m_000003_2: fetching http://www.niu.edu/northerntoday/contact.shtml
attempt_201003311259_0003_m_000003_2: fetching http://www.spertusshop.org/gifts-for-weddings-c-126.html
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)
runbot: fetch 20100331130047 at depth 1 failed.
runbot: Deleting segment 20100331130047.
--- Beginning crawl at depth 2 of 3 ---
Generator: Selecting best-scoring urls due for fetch.
Generator: starting
Generator: segment: crawled/segments/20100331165040
Generator: filtering: true
Generator: topN: 800000
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/20100331165040
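
One line in the log above is a per-URL java.net.UnknownHostException for dualdegree.seas.wustl.edu, and the fetcher keeps fetching other URLs right after it, so that error alone does not appear to be what ended the job. Still, to rule out unresolvable seed hosts, a minimal sketch like the following can list seed URLs whose host does not resolve via DNS. It is Java because Nutch is; the class name SeedHostCheck and the sample URLs are hypothetical, and this is not part of Nutch itself.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Nutch class: reports seed URLs whose host
// name is missing or cannot be resolved through DNS.
public class SeedHostCheck {

    static List<String> unresolvable(List<String> seeds) {
        List<String> bad = new ArrayList<>();
        for (String seed : seeds) {
            String host = URI.create(seed).getHost();
            if (host == null) {          // e.g. file: URLs or malformed input
                bad.add(seed);
                continue;
            }
            try {
                InetAddress.getByName(host);  // DNS lookup (IP literals skip DNS)
            } catch (UnknownHostException e) {
                bad.add(seed);           // same exception the fetcher logged
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical sample seeds; replace with the real seed list.
        List<String> seeds = List.of(
                "http://dualdegree.seas.wustl.edu/",
                "http://www.niu.edu/northerntoday/contact.shtml");
        for (String s : unresolvable(seeds)) {
            System.out.println("unresolvable: " + s);
        }
    }
}
```

Results depend on the DNS resolver of the machine running the check, which may differ from the Hadoop task nodes. The "Job failed!" exception itself is raised by JobClient.runJob when the job fails, so the task attempt logs for the failed map tasks are the place to look for the underlying error.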

-- 
Sent from the Nutch - User mailing list archive at Nabble.com.
