Task Tracker offerService does not adequately protect from exceptions
---------------------------------------------------------------------
Key: HADOOP-610
URL: http://issues.apache.org/jira/browse/HADOOP-610
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.7.1
Reporter: Owen O'Malley
Assigned To: Owen O'Malley
Fix For: 0.8.0
The TaskTracker's offerService loop does not handle exceptions, such as
timeouts, well: an exception causes it to reset the task tracker. I believe
this is the cause of most of the lost task trackers. The scenario looks like:
1. an RPC timeout occurs in offerService
2. the task tracker resets and cleans up, which takes 30 minutes with the task
tracker locked up
3. the task tracker is declared lost for not sending its heartbeat
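One way to avoid this failure mode is to catch transient errors inside each
iteration of the loop, so a single timeout costs one missed heartbeat instead
of a full reset. The sketch below is hypothetical and is not the actual
TaskTracker code: the HeartbeatClient interface, the offerService signature,
and the bounded iteration count are assumptions made so the example is
self-contained and runnable.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class OfferServiceSketch {

    /** Hypothetical stand-in for the RPC proxy the tracker heartbeats through. */
    interface HeartbeatClient {
        void heartbeat() throws IOException;
    }

    /**
     * Hypothetical loop: runs a bounded number of heartbeat iterations and
     * returns how many failed. A timeout or other IOException in one iteration
     * is logged and the loop tries again on the next iteration; the exception
     * never escapes the loop, so it cannot trigger a reset of the tracker.
     */
    static int offerService(HeartbeatClient client, int iterations, long intervalMs) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                client.heartbeat();
            } catch (SocketTimeoutException e) {
                // RPC timeout: treated as transient, retried on the next iteration.
                failures++;
                System.err.println("heartbeat timed out, will retry: " + e.getMessage());
            } catch (IOException e) {
                // Other I/O failure: also treated as transient in this sketch.
                failures++;
                System.err.println("heartbeat failed, will retry: " + e.getMessage());
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Shutdown requested: restore the interrupt flag and stop looping.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Every third heartbeat simulates an RPC timeout.
        HeartbeatClient flaky = () -> {
            if (calls.incrementAndGet() % 3 == 0) {
                throw new SocketTimeoutException("simulated rpc timeout");
            }
        };
        int failures = offerService(flaky, 6, 0);
        System.out.println("calls=" + calls.get() + " failures=" + failures);
    }
}
```

Running main prints "calls=6 failures=2": the two simulated timeouts are
counted, but all six heartbeats are still attempted. This sketch only counts
failures; a real implementation might also back off, or reinitialize only
after repeated consecutive failures rather than on the first one.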
--
This message is automatically generated by JIRA.