Hi,
I've been running sequential map-reduce jobs with sufficient storage for
mapred.local.dir and HDFS (at least 50 GB of storage for each of these on
every one of the 30 nodes). When the expected output from one of the
map-reduce jobs was close to 20 GB, the jobs failed with the message: "All
datanodes are bad." (Unfortunately, I can't find the detailed logs for that.)
I restarted the jobs after that happened, and after a while the tasks now fail
with this message: "Too many fetch-failures."
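In case it helps, this is roughly how I've been checking local disk space and
datanode status on the cluster (the path below is a placeholder, not my actual
mapred.local.dir value, and the hadoop commands assume the standard CLI):

```shell
#!/bin/sh
# Check free space under a local directory on a node.
# /data/mapred/local is a hypothetical path -- substitute the real
# mapred.local.dir value from your hadoop-site.xml.
LOCAL_DIR=${1:-/}
df -h "$LOCAL_DIR"

# Datanode health summary (run against the namenode):
#   hadoop dfsadmin -report
# HDFS block and replication health:
#   hadoop fsck /
```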
I know this is pretty limited in terms of logs, but if anyone can point out
whether they've seen something similar and how they rectified it, that would
be great.
thanks
H
Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.