[ https://issues.apache.org/jira/browse/MAPREDUCE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232825#comment-13232825 ]
Robert Joseph Evans commented on MAPREDUCE-4033:
------------------------------------------------

I am a bit confused about how this could be happening. The Job History server code should be able to report back for any job that has a jhist file and conf.xml in the intermediate done directory for the given user. These files should be in the proper directory before the AM exits. If the AM has not exited yet, the client should direct all status requests to the AM; if it is no longer active, the requests should go to the History Server.

How reproducible is this issue? I know that there are some serious race conditions in the History Server that I am working on fixing, and it could be related to that.

MAPREDUCE-3972

> time lag between job completion and job beng avail in JH server makes Oozie fail
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4033
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4033
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.23.3
>
>
> Oozie test cases are failing randomly because MR2 reports the job as unknown.
> This seems to happen when Oozie queries via JobClient.getJob(<JOBID>) for a
> <JOBID> that just finished.
> {code}
> org.apache.oozie.action.ActionExecutorException: JA017: Unknown hadoop job [job_1332176678205_0011] associated with action [0000000-120319101023910-oozie-tucu-W@pig-action]. Failing this action!
> {code}
> Oozie reports this error when JobClient.getJob(<JOBID>) returns NULL.
> Looking at the mini cluster logs, the job definitely ran.
> {code}
> find . -name "*1332176678205_0011*"
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000001
> {code}
> It seems there is a gap until the job is available in the JH server.
> If this gap is unavoidable, we need to ensure Oozie always waits at least the
> gap time before querying for a job.
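The workaround floated in the description (having Oozie wait out the gap before trusting an "unknown job" answer) can be sketched as a bounded retry with backoff around the status lookup. This is a minimal sketch under stated assumptions, not Oozie's or Hadoop's actual fix: `GapTolerantLookup`, `lookupWithRetry` and `simulatedHistoryServer` are hypothetical names, and the `Supplier` stands in for a call such as `JobClient.getJob(<JOBID>)`, which the description says returns null while the job is unknown. The simulated source returns null for its first few queries to mimic the window between the AM exiting and the History Server seeing the job's jhist and conf.xml files; the attempt counts and delays are illustrative, not measured gap times.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper (not part of Hadoop or Oozie): tolerates a short window
// in which a finished job is briefly reported as unknown (null).
final class GapTolerantLookup {

    private GapTolerantLookup() {}

    // Calls lookup up to maxAttempts times and returns the first non-null result.
    // Sleeps between attempts, doubling the delay each time (exponential backoff).
    // Returns Optional.empty() if every attempt returned null or the thread was interrupted.
    static <T> Optional<T> lookupWithRetry(Supplier<T> lookup, int maxAttempts, Duration initialDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return Optional.of(result);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                    return Optional.empty();
                }
                delayMillis *= 2; // 1x, 2x, 4x, ... the initial delay
            }
        }
        return Optional.empty();
    }

    // Hypothetical stand-in for the History Server: reports the job as unknown
    // (null) for the first gapCalls queries, then returns a final status.
    static Supplier<String> simulatedHistoryServer(int gapCalls) {
        int[] calls = {0};
        return () -> ++calls[0] > gapCalls ? "SUCCEEDED" : null;
    }

    public static void main(String[] args) {
        // The job becomes visible on the 3rd query; 5 attempts are enough.
        Optional<String> status = lookupWithRetry(simulatedHistoryServer(2), 5, Duration.ofMillis(10));
        System.out.println(status.orElse("unknown job")); // prints SUCCEEDED

        // With only 2 attempts the caller still sees an unknown job.
        Optional<String> tooFew = lookupWithRetry(simulatedHistoryServer(2), 2, Duration.ofMillis(10));
        System.out.println(tooFew.orElse("unknown job")); // prints unknown job
    }
}
```

Backoff keeps the extra polling load low while the attempt limit bounds how long a genuinely unknown job delays the caller. A retry only hides the window, though; if the gap comes from the History Server race conditions mentioned in the comment, fixing those on the server side is the more direct remedy.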