[ https://issues.apache.org/jira/browse/MAPREDUCE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5466:
-----------------------------------------------

    Status: Open  (was: Patch Available)

The analysis looks good to me.

Essentially, we are no longer letting AMs that are erroring out anyway write 
the history file. The patch needs more changes, though:
 - In the case of rebooting AMs, we should skip writing history files except 
for the last AM attempt.
 - Jobs failing because of an ERROR event should behave the same way: all AM 
attempts except the last retry should skip writing the history file.
 - In TestJobHistoryEventHandler, make sure that processDoneFilesCalled is true 
for successful/failed/killed jobs.
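A minimal sketch of the rule in the bullets above, in plain Java. The class, enum, and method names (HistoryWritePolicy, Termination, shouldWriteHistory) are hypothetical and are not taken from the Hadoop codebase or the attached patch; it also assumes "last attempt" means attempt >= maxAttempts.

```java
// Hypothetical sketch only: not the actual JobHistoryEventHandler logic.
final class HistoryWritePolicy {

    // Hypothetical classification of how an AM attempt ends.
    enum Termination { SUCCEEDED, FAILED, KILLED, REBOOT, ERROR }

    private HistoryWritePolicy() {}

    /**
     * Returns true if this AM attempt should write the job history file
     * (and go on to process the "done" files).
     *
     * @param termination how this AM attempt is ending
     * @param attempt     1-based AM attempt number (assumed)
     * @param maxAttempts maximum number of AM attempts allowed (assumed)
     */
    static boolean shouldWriteHistory(Termination termination, int attempt, int maxAttempts) {
        switch (termination) {
            case REBOOT:
            case ERROR:
                // Another AM attempt may still run, so only the last one writes history.
                return attempt >= maxAttempts;
            default:
                // SUCCEEDED, FAILED and KILLED are final job states: always write history.
                return true;
        }
    }
}
```

For example, shouldWriteHistory(Termination.REBOOT, 1, 2) returns false, while shouldWriteHistory(Termination.ERROR, 2, 2) and shouldWriteHistory(Termination.SUCCEEDED, 1, 2) return true.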

Also, please do some manual testing and report the results. It shouldn't be too 
difficult to reproduce this with and without the patch.


Thanks!
                
> Historyserver does not refresh the result of restarted jobs after RM restart
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5466
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5466
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: yeshavora
>            Assignee: Jian He
>         Attachments: MAPREDUCE-5466.patch
>
>
> Restart the RM while a sort job is running and verify that the job completes 
> successfully after the RM restarts. 
> Once the job finishes successfully, run the job status command for the sort 
> job. It shows "Job state: FAILED". The job history server does not update the 
> result for a job that was restarted after the RM restart.
> hadoop job -status job_1375923346354_0003
> 13/08/08 01:24:13 INFO mapred.ClientServiceDelegate: Application state is 
> completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> Job: job_1375923346354_0003
> Job File: 
> hdfs://host1:port1/history/done/2013/08/08/000000/job_1375923346354_0003_conf.xml
> Job Tracking URL : 
> http://historyserver:port2/jobhistory/job/job_1375923346354_0003
> Uber job : false
> Number of maps: 80
> Number of reduces: 1
> map() completion: 0.0
> reduce() completion: 0.0
> Job state: FAILED
> retired: false
> reason for failure: There are no failed tasks for the job. Job is failed due 
> to some other reason and reason can be found in the logs.
> Counters not available. Job is retired.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
