[ 
https://issues.apache.org/jira/browse/HADOOP-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated HADOOP-3245:
-------------------------------

    Attachment: HADOOP-3245-v4.1.patch

Attaching a patch with the following changes:
1) Devaraj's 
[comments|https://issues.apache.org/jira/browse/HADOOP-3245?focusedCommentId=12611539#action_12611539]
 are incorporated. Comment #9 seems complicated and should be dealt with in a 
separate JIRA.

2) A test case is included. It does the following:
{noformat}
  2.1) Start DFS, MR
  2.2) Submit a job
  2.3) Kill/Close the JT 
  2.4) Restart the JT
  2.5) Check that the job was detected after the restart and completed successfully.
{noformat}

3) What is the effect on the DFS of the JT getting killed, i.e. what happens to 
an incomplete DFS operation that the JT started before it was killed? Also, what 
happens after the restart when the JT tries to access the same set of files? 
This requires investigation, but as of now I don't see any noticeable effects.

4) Since the child task needs to reset its offset into the 
map-task-completion-events (TT's local copy), I have introduced a new class 
that encapsulates:
{noformat}
4.1) An array of map-task-completion-events (if any) as requested by the child 
task
4.2) A boolean which decides whether the child should reset its offset. 
{noformat}

Hemanth, Devaraj and I discussed this, and we feel it's better and 
cleaner to do it this way.

5) Some bug fixes.
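
A minimal sketch of the wrapper described in point 4, for illustration only. The class and method names (EventsUpdate, ChildEventCursor, shouldReset, apply) are hypothetical and are not taken from the patch. Events are modelled as plain strings rather than the real completion-event type, and the assumption that the child discards what it has seen and restarts from offset 0 on a reset is mine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper returned to the child task: a batch of
// map-task-completion-events plus a flag telling the child to reset its offset.
final class EventsUpdate {
    private final String[] events; // stand-in for the real event type
    private final boolean reset;

    EventsUpdate(String[] events, boolean reset) {
        this.events = events.clone();
        this.reset = reset;
    }

    String[] getEvents() { return events.clone(); }

    boolean shouldReset() { return reset; }
}

// Hypothetical child-side consumer that tracks its offset into the
// TT's local copy of the events.
final class ChildEventCursor {
    private final List<String> seen = new ArrayList<>();
    private int offset = 0;

    void apply(EventsUpdate update) {
        if (update.shouldReset()) {
            // Assumption: on reset the child forgets earlier events
            // and starts again from offset 0.
            seen.clear();
            offset = 0;
        }
        String[] batch = update.getEvents();
        for (String event : batch) {
            seen.add(event);
        }
        offset += batch.length;
    }

    int getOffset() { return offset; }

    List<String> getSeen() { return new ArrayList<>(seen); }
}
```

Returning the flag together with the events lets the child learn about a reset in the same call that fetches events, so no separate notification path is needed.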
----
Known issues (summary):
1) Job-level updates made while the JT is running (e.g. killJob(), priority 
updates) will be lost on restart.
2) The job runtime cannot be determined.
3) Point #9 of Devaraj's 
[comment|https://issues.apache.org/jira/browse/HADOOP-3245?focusedCommentId=12611539#action_12611539]


> Provide ability to persist running jobs (extend HADOOP-1876)
> ------------------------------------------------------------
>
>                 Key: HADOOP-3245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3245
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: HADOOP-3245-v2.5.patch, HADOOP-3245-v2.6.5.patch, 
> HADOOP-3245-v2.6.9.patch, HADOOP-3245-v4.1.patch
>
>
> This could probably extend the work done in HADOOP-1876. This feature can be 
> applied for things like jobs being able to survive jobtracker restarts.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
