[ https://issues.apache.org/jira/browse/OOZIE-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353895#comment-16353895 ]

Peter Cseh commented on OOZIE-2847:
-----------------------------------

4.3.1 rc4 is already out, so this issue won't be part of that release.
If there is an rc5, I'll make sure this is part of it. If not, it can
still go into 4.3.2, if there is one.



> Oozie Ha timing issue
> ---------------------
>
>                 Key: OOZIE-2847
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2847
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: 4.3.0, 5.0.0b1, 5.0.0
>            Reporter: Péter Gergő Barna
>            Assignee: Denes Bodo
>            Priority: Minor
>             Fix For: trunk, 5.0.0
>
>         Attachments: OOZIE-2847-4.3.patch, OOZIE-2847-5.0.patch, OOZIE-2847.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Oozie HA timing issue
> When Oozie launches the mapper, it writes a job id into a file on HDFS.
> Suppose the ApplicationMaster is killed and Oozie makes a second attempt
> during recovery. On the second attempt, Oozie checks whether the job id
> previously written to HDFS matches the current job id. In most cases it
> does. However, if the Oozie launcher is killed while Oozie is in the middle
> of writing the id, the file on HDFS has been created but the id has not yet
> been written to it. During the next recovery, Oozie mistakenly assumes the
> file contains an id, although the file is actually empty, and therefore
> throws this exception:
> {noformat}
> 2015-07-10 05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
> 2015-07-10 05:56:58,137|beaver.machine|INFO|5208|1344|MainThread|Console URL       : http://dal-ha21:8088/proxy/application_1436507526035_0001/
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Code        : JA018
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Error Message     : Hadoop job Id mismatch, action file [hdfs://hdp2-ha2/user/hadoopqa/oozie-hado/0000003-150710041341636-oozie-hado-W/pig-node--pig/0000003-150710041341636-oozie-hado-W@pig-node@0] declares Id [null] current Id [job_1436507526035_0001]
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External ID       : job_1436507526035_0001
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|External Status   : FAILED/KILLED
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Name              : pig-node
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Retries           : 0
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Tracker URI       : dal-ha21:8032
> 2015-07-10 05:56:58,138|beaver.machine|INFO|5208|1344|MainThread|Type              : pig
> 2015-07-10 05:56:58,158|beaver.machine|INFO|5208|1344|MainThread|Started           : 2015-07-10 05:55:19 GMT
> 2015-07-10 05:56:58,160|beaver.machine|INFO|5208|1344|MainThread|Status            : ERROR
> 2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|Ended             : 2015-07-10 05:56:42 GMT
> 2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External Stats    : null
> 2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|External ChildIDs : null
> 2015-07-10 05:56:58,161|beaver.machine|INFO|5208|1344|MainThread|------------------------------------------------------------------------------------------------------------------------------------
> Exception:
> 2015-07-10 05:56:18,658 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hdp2-ha2:8020]
> 2015-07-10 05:56:18,665 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
> 2015-07-10 05:56:18,693 WARN [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Unable to parse prior job history, aborting recovery
> java.io.IOException: Incompatible event log version: null
>       at org.apache.hadoop.mapreduce.jobhistory.EventReader.<init>(EventReader.java:71)
>       at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.parse(JobHistoryParser.java:139)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.parsePreviousJobHistory(MRAppMaster.java:1206)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.processRecovery(MRAppMaster.java:1175)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1039)
>       at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1519)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
>       at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
> 2015-07-10 05:56:18,737 INFO [main] org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils: Default file system [hdfs://hdp2-ha2:8020]
> 2015-07-10 05:56:18,745 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Previous history file is at hdfs://hdp2-ha2:8020/user/hadoopqa/.staging/job_1436507526035_0001/job_1436507526035_0001_1.jhist
> {noformat}
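
The race in the description (file created, launcher killed before the id is written, recovery reads an empty file and reports "declares Id [null]") can be illustrated with a minimal, self-contained sketch. It uses the local filesystem as a stand-in for HDFS and shows two generic mitigations: writing to a temporary file and renaming it into place, and a reader that treats an empty file as "no previous id". All names here (`write_id_naive`, `write_id_atomic`, `read_previous_id`, `check_id`, `IdMismatchError`) are hypothetical; this is not Oozie's code and not the fix contained in the attached patches.

```python
import os
import tempfile


class IdMismatchError(Exception):
    """Hypothetical stand-in for Oozie's JA018 'Hadoop job Id mismatch' error."""


def write_id_naive(path, job_id, crash_before_write=False):
    # Create the file first, then write the id. A crash between the two steps
    # leaves an empty file behind, which is the race described in OOZIE-2847.
    with open(path, "w") as f:
        if crash_before_write:
            return  # simulate the launcher being killed mid-write
        f.write(job_id)


def write_id_atomic(path, job_id, crash_before_write=False):
    # Write to a temp file in the same directory, then rename it into place,
    # so readers see either no file or a complete one (local-FS assumption).
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            if crash_before_write:
                return  # simulated kill: the final path is never created
            f.write(job_id)
        os.replace(tmp, path)
    finally:
        # Clean up the temp file if the rename did not happen.
        if os.path.exists(tmp):
            os.remove(tmp)


def read_previous_id(path):
    # Return the stored id, or None if the file is missing or empty.
    if not os.path.exists(path):
        return None
    with open(path) as f:
        content = f.read().strip()
    return content or None


def check_id(path, current_id, tolerate_empty=False):
    # Without tolerate_empty, an existing-but-empty file is treated as
    # "an id was written", reproducing the mismatch from the report.
    previous = read_previous_id(path)
    if previous is None and (tolerate_empty or not os.path.exists(path)):
        return "no previous id; treat as first launch"
    if previous != current_id:
        raise IdMismatchError(
            f"Hadoop job Id mismatch, action file [{path}] "
            f"declares Id [{previous}] current Id [{current_id}]"
        )
    return "ids match"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "action-id")
        current = "job_1436507526035_0001"

        # Naive writer killed mid-write: file exists but is empty -> mismatch.
        write_id_naive(path, current, crash_before_write=True)
        try:
            check_id(path, current)
        except IdMismatchError as e:
            print(e)

        os.remove(path)
        # Atomic writer killed mid-write: no file at all -> clean first launch.
        write_id_atomic(path, current, crash_before_write=True)
        print(check_id(path, current))
```

Running the script prints the mismatch error for the naive writer and the first-launch message for the atomic one. The atomic variant relies on `os.replace` being atomic on a local filesystem; whether HDFS offers an equivalent guarantee, and what the actual Oozie patch changes, is outside the scope of this sketch.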



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
