[ http://issues.apache.org/jira/browse/HADOOP-654?page=comments#action_12445697 ]

eric baldeschwieler commented on HADOOP-654:
--------------------------------------------

# failures should be visible by node on the job tracker UI.
Nodes that are not getting jobs should be highlighted on the UI.

> jobs fail with some hardware/system failures on a small number of nodes
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-654
>                 URL: http://issues.apache.org/jira/browse/HADOOP-654
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Yoram Arnon
>         Assigned To: Owen O'Malley
>            Priority: Minor
>
> occasionally, such as when the OS is out of some resource, a node fails only
> partly. The node is up and running, the task tracker is running and sending
> heartbeats, but every task fails because the tasktracker can't fork tasks or
> something.
> In these cases, that task tracker keeps getting assigned tasks to execute,
> and they all fail.
> A couple of nodes like that and jobs start failing badly.
> The job tracker should avoid assigning tasks to tasktrackers that are
> misbehaving.
> simple approach: avoid tasktrackers that report many more failures than
> average (say 3X). Simply use the info sent by the TT.
> better but harder: track TT failures over time and:
> 1. avoid those that exhibit a high failure *rate*
> 2. tell them to shut down
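
For illustration only, the "simple approach" from the issue description (avoid tasktrackers that report many more failures than average, say 3X) could be sketched roughly as follows. This is not Hadoop code: the class name FlakyTrackerFilter, the method trackersToAvoid, and the map of per-tracker failure counts are all hypothetical, and it assumes the job tracker already holds one failure count per tasktracker.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hadoop: flags task trackers whose reported
// failure count is far above the cluster-wide average.
class FlakyTrackerFilter {

    /**
     * Returns the trackers whose failure count exceeds factor * average.
     * failuresByTracker maps a tracker name to the failure count it reported
     * (an assumed data shape, chosen for this sketch).
     */
    static Set<String> trackersToAvoid(Map<String, Integer> failuresByTracker, double factor) {
        Set<String> avoid = new TreeSet<>();
        if (failuresByTracker.isEmpty()) {
            return avoid;
        }
        long total = 0;
        for (int failures : failuresByTracker.values()) {
            total += failures;
        }
        double average = (double) total / failuresByTracker.size();
        for (Map.Entry<String, Integer> e : failuresByTracker.entrySet()) {
            // Strictly greater than, so a cluster with zero failures flags nobody.
            if (e.getValue() > factor * average) {
                avoid.add(e.getKey());
            }
        }
        return avoid;
    }

    public static void main(String[] args) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (int i = 1; i <= 9; i++) {
            failures.put("tt" + i, 1);   // healthy trackers: one failure each
        }
        failures.put("tt10", 30);        // partly failed node
        System.out.println(trackersToAvoid(failures, 3.0));
    }
}
```

One caveat with this sketch: the bad node's own failures are part of the average, so on a cluster of three or fewer trackers nothing can ever exceed 3X the average. Comparing against the median, or tracking a failure rate over time as the description's "better but harder" option suggests, would avoid that.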