Péter Gergő Barna created OOZIE-2847:
----------------------------------------

             Summary: Oozie Ha timing issue
                 Key: OOZIE-2847
                 URL: https://issues.apache.org/jira/browse/OOZIE-2847
             Project: Oozie
          Issue Type: Bug
          Components: HA
    Affects Versions: 4.3.0
            Reporter: Péter Gergő Barna
            Priority: Minor


Oozie Ha timing issue

When Oozie is launching the mapper, it is writing a job id into a file on hdfs. 
Let's assume the ApplicationMaster is killed, and Oozie will make a second try, 
during recovery. On the second try, Oozie is trying to see if the previously 
written job id on hdfs matches the current job id. In most occasion, this will 
match. However, in the event when Oozie launcher is killed right in the middle 
when Oozie is in the process of writing id in the file, the Oozie file in hdfs 
is created, but the id has yet to be written to the file. During the next 
recovery, Oozie will mistakenly think the id exists in the file while the file 
is actually empty, therefore throwing this exception: 

{{monospaced}}
2015-07-10 
05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
2015-07-10 05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|Console URL    
   : http://dal-ha21:8088/proxy/application_1436507526035_0001/
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Code     
   : JA018
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Message  
   : Hadoop job Id mismatch, action file 
[hdfs://hdp2-ha2/user/hadoopqa/oozie-hado/0000003-150710041341636-oozie-hado-W/pig-node--pig/0000003-150710041341636-oozie-hado-W@pig-node@0]
 declares Id [null] current Id [job_1436507526035_0001]
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External ID    
   : job_1436507526035_0001
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External 
Status   : FAILED/KILLED
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Name           
   : pig-node
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Retries        
   : 0
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Tracker URI    
   : dal-ha21:8032
2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Type           
   : pig
2015-07-10 05:56:58,158|beaver.machine|INFO|5208|1344|MainThread|Started        
   : 2015-07-10 05:55:19 GMT
2015-07-10 05:56:58,160|beaver.machine|INFO|5208|1344|MainThread|Status         
   : ERROR
2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|Ended          
   : 2015-07-10 05:56:42 GMT
2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External Stats 
   : null
2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External 
ChildIDs : null
2015-07-10 
05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
Exception:
2015-07-10 05:56:18,658 INFO [main] 
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system 
[hdfs://hdp2-ha2:8020]
2015-07-10 05:56:18,665 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at 
hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
2015-07-10 05:56:18,693 WARN [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Unable to parse prior job 
history, aborting recovery
java.io.IOException: Incompatible event log version: null
        at 
org.apache.hadoop.mapreduce.jobhistory.EventReader.<init>(EventReader.java:71)
        at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:139)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.parsePreviousJobHistory(MRAppMaster.java:1206)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.processRecovery(MRAppMaster.java:1175)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1039)
        at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
        at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
2015-07-10 05:56:18,737 INFO [main] 
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system 
[hdfs://hdp2-ha2:8020]
2015-07-10 05:56:18,745 INFO [main] 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at 
hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
{{monospaced}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to