[ 
https://issues.apache.org/jira/browse/OOZIE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069448#comment-14069448
 ] 

Robert Kanter commented on OOZIE-1938:
--------------------------------------

What you said about :sync: sounds about right from what I've observed. I 
also agree that we should leave the fork code alone as much as possible; 
it gets very tricky, as I found out while working on OOZIE-1879. If you can 
easily get the RecoveryService to handle this, I'd vote for that.

> Fork-join job does not execute join node sometimes during HA failover
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-1938
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1938
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk
>
>
> Reported by [~mchiang].
> Scenario: (2 Oozie HA servers)
> 21:38:56 submit job at oozie client
> 21:41:42 shut down server1
> 21:46:52 shut down server2
> 21:47:30 start server1
> 22:15:05 start server2
> the last fork path end time is 21:52:53.
> 22:36:48 the job is still RUNNING, not moving to join node.
> Digging into the logs, the locking part seems to work fine, with forked action 
> processing distributed between the two servers both when both are running and 
> when one of them is down. The open question is why even the RecoveryService 
> fails to pick up the job after all the forks have completed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
