Inconsistency in handling lost trackers upon jobtracker restart
---------------------------------------------------------------

                 Key: HADOOP-5319
                 URL: https://issues.apache.org/jira/browse/HADOOP-5319
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Amar Kamat


If a tasktracker is lost, the jobtracker kills all the tasks that were 
successful on that tracker and re-executes them elsewhere. All in-memory 
data structures for the lost tracker are cleared. If the jobtracker then 
restarts, the new jobtracker has no record of the trackers that were lost, so 
if a lost tracker joins back it will be accepted, and all the tasks on that 
tracker will join back with it. This causes the following issues:
- If a running task on the lost tracker is killed, its cleanup attempt will 
be launched. The new jobtracker has no knowledge of this attempt. The lost 
tracker can also join back, so two attempts can be running with the same id: 
one that can move the tip to success and another that moves the tip to the 
killed state.
- Ideally, the lost tracker should be asked to re-init, which won't happen now.
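
The root of both issues is that the set of lost trackers lives only in memory. 
A minimal sketch of that follows. It is a toy model, not Hadoop's JobTracker 
code: the class ToyJobTracker, the Response enum, and the tracker name "tt1" 
are all hypothetical, and a restart is modelled simply as constructing a 
fresh object. It also assumes that a jobtracker which still remembers a lost 
tracker would ask it to re-init.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: these names are hypothetical and are not Hadoop's API.
public class LostTrackerSketch {

    enum Response { ACCEPT, REINIT }

    static class ToyJobTracker {
        // Trackers currently considered alive.
        private final Set<String> known = new HashSet<>();
        // Trackers declared lost. Held only in memory, so a restart forgets them.
        private final Set<String> lost = new HashSet<>();

        void register(String tracker) {
            known.add(tracker);
        }

        void markLost(String tracker) {
            known.remove(tracker);
            lost.add(tracker);
        }

        // Assumed behaviour: a tracker remembered as lost is told to re-init;
        // any other tracker is accepted as-is.
        Response heartbeat(String tracker) {
            if (lost.remove(tracker)) {
                known.add(tracker);
                return Response.REINIT;
            }
            known.add(tracker);
            return Response.ACCEPT;
        }
    }

    public static void main(String[] args) {
        ToyJobTracker jt = new ToyJobTracker();
        jt.register("tt1");
        jt.markLost("tt1");
        // Same jobtracker instance: the lost tracker is recognised.
        System.out.println("before restart: " + jt.heartbeat("tt1"));

        // Restart: a new instance starts with an empty lost set.
        ToyJobTracker restarted = new ToyJobTracker();
        System.out.println("after restart: " + restarted.heartbeat("tt1"));
    }
}
```

In this sketch the first heartbeat yields REINIT and the second yields ACCEPT, 
which mirrors the inconsistency described above: after the restart the lost 
tracker is taken back together with whatever task attempts it still holds.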

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
