Inconsistency in handling lost trackers upon jobtracker restart
---------------------------------------------------------------
Key: HADOOP-5319
URL: https://issues.apache.org/jira/browse/HADOOP-5319
Project: Hadoop Core
Issue Type: Bug
Components: mapred
Reporter: Amar Kamat
If a tasktracker is lost, the jobtracker kills all the tasks that were
successful on that tracker and re-executes them elsewhere. The in-memory
data structures for the lost tracker are all cleared. If the jobtracker then
restarts, the new jobtracker has no record of the trackers that were lost, so
if a lost tracker joins back, it will be accepted and all the tasks on that
tracker will join back with it. This causes the following issues:
- If a running task on the lost tracker is killed, its cleanup attempt will
be launched. The new jobtracker has no record of this attempt. The lost
tracker can also join back, so there are 2 attempts running with the same id:
one that can move the tip to the success state and another that moves the tip
to the killed state.
- Ideally, the lost tracker should be asked to re-init, which won't happen now.
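
The inconsistency can be illustrated with a toy model. This is only a sketch
under stated assumptions: ToyJobTracker, its Response enum and its methods are
hypothetical names invented for illustration and are not Hadoop's JobTracker
API. The sketch also assumes that the pre-restart jobtracker can recognize a
lost tracker by remembering its name, which simplifies whatever mechanism the
real code uses. What it shows is that the "lost" knowledge lives only in
memory, so a fresh instance accepts the tracker and its old attempts.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical classes, not Hadoop's JobTracker API.
class ToyJobTracker {
    enum Response { ACCEPT, REINIT }

    // In-memory state. A restart is modelled as a new instance, so both start empty.
    private final Map<String, Set<String>> attemptsByTracker = new HashMap<>();
    private final Set<String> lostTrackers = new HashSet<>();

    // Handle a heartbeat carrying the attempt ids the tracker reports as running.
    Response heartbeat(String tracker, Set<String> runningAttempts) {
        if (lostTrackers.contains(tracker)) {
            // Assumed behaviour: a tracker known to be lost is asked to re-init.
            return Response.REINIT;
        }
        attemptsByTracker
            .computeIfAbsent(tracker, t -> new HashSet<>())
            .addAll(runningAttempts);
        return Response.ACCEPT;
    }

    // Clear the in-memory data for the tracker and remember that it was lost.
    void markLost(String tracker) {
        attemptsByTracker.remove(tracker);
        lostTrackers.add(tracker);
    }

    // True if any accepted tracker has reported this attempt id.
    boolean knowsAttempt(String attemptId) {
        return attemptsByTracker.values().stream()
            .anyMatch(attempts -> attempts.contains(attemptId));
    }
}

class LostTrackerDemo {
    public static void main(String[] args) {
        Set<String> attempts = Collections.singleton("attempt_1");

        ToyJobTracker beforeRestart = new ToyJobTracker();
        beforeRestart.heartbeat("tracker1", attempts);
        beforeRestart.markLost("tracker1");
        // The old jobtracker still remembers that tracker1 was lost.
        System.out.println(beforeRestart.heartbeat("tracker1", attempts));

        // Restart: all in-memory knowledge about lost trackers is gone.
        ToyJobTracker afterRestart = new ToyJobTracker();
        System.out.println(afterRestart.heartbeat("tracker1", attempts));
        System.out.println(afterRestart.knowsAttempt("attempt_1"));
    }
}
```

In this model the old instance asks tracker1 to re-init, while the restarted
instance accepts it together with attempt_1, which is the situation in which
two attempts with the same id can coexist.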