[ http://issues.apache.org/jira/browse/HADOOP-610?page=all ]
Owen O'Malley updated HADOOP-610:
---------------------------------

    Status: Patch Available  (was: Open)

> Task Tracker offerService does not adequately protect from exceptions
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-610
>                 URL: http://issues.apache.org/jira/browse/HADOOP-610
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.1
>            Reporter: Owen O'Malley
>         Assigned To: Owen O'Malley
>             Fix For: 0.8.0
>
>         Attachments: lost-tt.patch
>
>
> The TaskTracker's offerService loop doesn't handle exceptions, such as
> timeouts, well and will reset the task tracker. I believe this is the cause
> of most of the lost task trackers. The scenario looks like:
> 1. an RPC timeout in offerService
> 2. the task tracker cleans up (which takes 30 minutes with the task tracker
>    locked up)
> 3. the task tracker is declared lost for not providing its heartbeat
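The failure mode in the description (one RPC timeout escalating into a full reset) can be contrasted with a loop that treats a single failed heartbeat as recoverable. The Java sketch below is a minimal illustration under stated assumptions: `Heartbeat`, `Result`, and `offerServiceSketch` are hypothetical names, not Hadoop APIs, and this is not the contents of lost-tt.patch. It only shows the idea of catching an exception such as a timeout inside each iteration and moving on, instead of cleaning up and resetting the whole task tracker.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical sketch only; not taken from Hadoop or from lost-tt.patch.
public class HeartbeatLoopSketch {

    /** Hypothetical stand-in for the heartbeat RPC sent to the JobTracker. */
    interface Heartbeat {
        void send(int attempt) throws IOException;
    }

    /** Counts of successful and failed heartbeats during the loop. */
    static final class Result {
        int sent;
        int failed;
    }

    /**
     * Runs a fixed number of heartbeat iterations. A failure in one
     * iteration is logged and counted, and the loop continues with the
     * next iteration rather than tearing down and reinitializing the tracker.
     */
    static Result offerServiceSketch(Heartbeat heartbeat, int iterations) {
        Result result = new Result();
        for (int i = 0; i < iterations; i++) {
            try {
                heartbeat.send(i);
                result.sent++;
            } catch (IOException e) {
                // e.g. an RPC timeout: record it and try again on the next tick
                result.failed++;
                System.err.println("heartbeat " + i + " failed: " + e);
            }
            // A real loop would sleep for the heartbeat interval here.
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulate a tracker whose third heartbeat times out.
        Heartbeat flaky = attempt -> {
            if (attempt == 2) {
                throw new SocketTimeoutException("simulated rpc timeout");
            }
        };
        Result r = offerServiceSketch(flaky, 5);
        System.out.println("sent=" + r.sent + " failed=" + r.failed);
    }
}
```

Running `main` prints `sent=4 failed=1`: the simulated timeout costs one heartbeat, but the loop keeps reporting afterwards. Which exceptions should still be treated as fatal is a separate design decision that this sketch deliberately leaves out.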