All,

I am getting the following errors during my MapReduce jobs (see below). The jobs do eventually finish, but the errors slow things down. From what I've read, "Too many fetch-failures" is caused by network failures in the cluster. Is there a way to determine which node(s) in my cluster are causing the problem?
Thanks

11/07/18 14:53:06 INFO mapreduce.Job: map 99% reduce 28%
11/07/18 14:53:10 INFO mapreduce.Job: map 100% reduce 28%
11/07/18 14:53:15 INFO mapreduce.Job: Task Id : attempt_201107180916_0030_m_000003_0, Status : FAILED
Too many fetch-failures
11/07/18 14:53:15 WARN mapreduce.Job: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&attemptid=attempt_201107180916_0030_m_000003_0&filter=stdout
11/07/18 14:53:15 WARN mapreduce.Job: Error reading task outputhttp://localhost:50060/tasklog?plaintext=true&attemptid=attempt_201107180916_0030_m_000003_0&filter=stderr
11/07/18 14:53:17 INFO mapreduce.Job: map 100% reduce 29%
11/07/18 14:53:19 INFO mapreduce.Job: map 96% reduce 29%
11/07/18 14:53:25 INFO mapreduce.Job: map 98% reduce 29%

--
Geoffry Roberts