[ https://issues.apache.org/jira/browse/MAPREDUCE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241865#comment-13241865 ]
Alejandro Abdelnur commented on MAPREDUCE-4033:
-----------------------------------------------

Thanks Robert, it seems the problem is in the MiniMRClientClusterFactory initialization; digging. I'll take care of this.

Now the follow-up question: why do things go into a hang mode and make my testcase time out? That seems wrong.

> time lag between job completion and job being avail in JH server makes Oozie fail
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4033
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4033
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: minicluster-oozie-pig.txt
>
>
> Oozie testcases are failing randomly because MR2 reports the job as unknown.
> This seems to happen when Oozie queries via JobClient.getJob(<JOBID>) for a
> <JOBID> that just finished.
> {code}
> org.apache.oozie.action.ActionExecutorException: JA017: Unknown hadoop job
> [job_1332176678205_0011] associated with action
> [0000000-120319101023910-oozie-tucu-W@pig-action]. Failing this action!
> {code}
> Oozie reports this error when JobClient.getJob(<JOBID>) returns NULL.
> Looking at the mini cluster logs, the job definitely ran.
> {code}
> find . -name "*1332176678205_0011*"
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000001
> {code}
> It seems there is a gap until the job is available in the JH server.
> If this gap is unavoidable, we need to ensure Oozie always waits at least the
> gap time before querying for a job.
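
The issue description suggests that a caller should tolerate a NULL from JobClient.getJob(<JOBID>) for some time after the job finishes. One way that could look is a bounded retry around the lookup. The following is only a sketch under assumptions: the class name JobLookupRetry, the method pollUntilFound, and the attempt and delay parameters are hypothetical and are not part of Oozie or Hadoop. The lookup is passed in as a Supplier, so the sketch does not depend on any Hadoop classes.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Oozie or Hadoop.
final class JobLookupRetry {

    private JobLookupRetry() {
    }

    /**
     * Calls the lookup until it returns a non-null value or maxAttempts
     * calls have been made, sleeping delayMillis between attempts.
     * Returns null if every attempt returned null or the thread was
     * interrupted while waiting.
     */
    static <T> T pollUntilFound(Supplier<T> lookup, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // In the scenario from the issue, this would wrap the job lookup
            // that returns null while the job is not yet visible.
            T result = lookup.get();
            if (result != null) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up early.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }
}
```

A caller would treat a null result after the last attempt as "unknown job", as it does today, but only once the assumed gap time (maxAttempts times delayMillis, roughly) has passed. Whether a fixed delay is appropriate, and how long the gap really is, is not established in this thread.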