What kind of storage is attached to the data nodes? This kind of error can happen when the CPU is really busy with I/O or interrupts.
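
If it helps to put a number on that, here is a rough sketch (not a replacement for top or dstat) that diffs two readings of the aggregate "cpu" line in /proc/stat and reports what share of CPU time went to I/O wait and to interrupts. It assumes a Linux data node with the usual /proc/stat column order; the function names (parse_cpu_line, busy_shares, sample) are made up for illustration.

```python
# Sketch: estimate the share of CPU time spent in I/O wait and interrupts
# by diffing two samples of the aggregate "cpu" line in /proc/stat (Linux only).
import time

# Column order of the aggregate "cpu" line; guest time is already counted in user.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from /proc/stat contents as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels expose fewer columns; pad the missing ones with 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def busy_shares(before, after):
    """Percentage of elapsed CPU time spent in iowait and in irq + softirq."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return {"iowait_pct": 0.0, "interrupt_pct": 0.0}
    return {
        "iowait_pct": 100.0 * delta["iowait"] / total,
        "interrupt_pct": 100.0 * (delta["irq"] + delta["softirq"]) / total,
    }


def sample(interval_seconds=5.0, path="/proc/stat"):
    """Take two readings interval_seconds apart and return the shares."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return busy_shares(before, after)
```

Calling sample() on a data node while the job is running returns something like {"iowait_pct": ..., "interrupt_pct": ...}. A consistently high iowait share would point at the disks, which is why the storage question matters.
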
Can you run top or dstat on some of the data nodes to see how the system is performing?

Raj

>________________________________
> From: Sandeep Reddy P <sandeepreddy.3...@gmail.com>
>To: common-user@hadoop.apache.org
>Sent: Tuesday, May 22, 2012 7:23 AM
>Subject: Re: Map/Reduce Tasks Fails
>
>*Task Trackers*
>
>Name | Host | # running tasks | Max Map Tasks | Max Reduce Tasks | Task Failures | Directory Failures | Node Health Status | Seconds Since Node Last Healthy | Total Tasks Since Start | Succeeded Tasks Since Start | Total Tasks Last Day | Succeeded Tasks Last Day | Total Tasks Last Hour | Succeeded Tasks Last Hour | Seconds since heartbeat
>
>tracker_hadoop2.liaisondevqa.local:localhost/127.0.0.1:56225 <http://hadoop2.liaisondevqa.local:50060/> | hadoop2.liaisondevqa.local | 0 | 6 | 2 | 22 | 0 | N/A | 0 | 93 | 60 | 59 | 28 | 64 | 38 | 0
>tracker_hadoop4.liaisondevqa.local:localhost/127.0.0.1:40363 <http://hadoop4.liaisondevqa.local:50060/> | hadoop4.liaisondevqa.local | 0 | 6 | 2 | 19 | 0 | N/A | 0 | 91 | 59 | 65 | 33 | 36 | 33 | 0
>tracker_hadoop5.liaisondevqa.local:localhost/127.0.0.1:46605 <http://hadoop5.liaisondevqa.local:50060/> | hadoop5.liaisondevqa.local | 1 | 6 | 2 | 21 | 0 | N/A | 0 | 83 | 47 | 69 | 35 | 45 | 19 | 0
>tracker_hadoop3.liaisondevqa.local:localhost/127.0.0.1:37305 <http://hadoop3.liaisondevqa.local:50060/> | hadoop3.liaisondevqa.local | 0 | 6 | 2 | 18 | 0 | N/A | 0 | 87 | 55 | 55 | 28 | 57 | 34 | 0
>
>Highest Failures: tracker_hadoop2.liaisondevqa.local:localhost/127.0.0.1:56225 with 22 failures