[ https://issues.apache.org/jira/browse/OOZIE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069448#comment-14069448 ]
Robert Kanter commented on OOZIE-1938:
--------------------------------------
The stuff you said about :sync: sounds about right from what I've observed. I
also agree that we should try to leave the fork code alone as much as possible;
it gets very tricky, as I found out while working on OOZIE-1879. If you can
easily get the RecoveryService to handle this, I'd vote for that.
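A minimal sketch of the kind of check a recovery pass could run for this case. It assumes a simplified model in which each job records the status of every fork-path action, whether its join has been signaled, and when it last changed. `ForkJoinJob`, `find_stuck_joins`, `min_age_seconds`, and the set of terminal statuses are hypothetical names and assumptions for illustration, not Oozie's actual RecoveryService API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption of this sketch: action statuses treated as "finished".
TERMINAL = {"OK", "ERROR", "KILLED", "FAILED"}


@dataclass
class ForkJoinJob:
    """Hypothetical, simplified view of a workflow job with one fork/join."""
    job_id: str
    status: str                   # e.g. "RUNNING"
    fork_paths: Dict[str, str]    # fork-path action name -> action status
    join_signaled: bool = False   # True once the join node has been processed
    last_modified: float = 0.0    # epoch seconds of the job's last state change


def find_stuck_joins(jobs: List[ForkJoinJob], now: float,
                     min_age_seconds: float) -> List[str]:
    """Return ids of RUNNING jobs whose fork paths are all terminal but whose
    join was never signaled, and which have been idle for min_age_seconds."""
    stuck = []
    for job in jobs:
        if job.status != "RUNNING" or job.join_signaled or not job.fork_paths:
            continue
        all_done = all(s in TERMINAL for s in job.fork_paths.values())
        idle_long_enough = now - job.last_modified >= min_age_seconds
        if all_done and idle_long_enough:
            stuck.append(job.job_id)
    return stuck


jobs = [
    ForkJoinJob("job-1", "RUNNING", {"a": "OK", "b": "OK"}, last_modified=100.0),
    ForkJoinJob("job-2", "RUNNING", {"a": "OK", "b": "RUNNING"}, last_modified=100.0),
    ForkJoinJob("job-3", "RUNNING", {"a": "OK", "b": "OK"},
                join_signaled=True, last_modified=100.0),
]
print(find_stuck_joins(jobs, now=500.0, min_age_seconds=300.0))  # ['job-1']
```

The idle threshold keeps the recovery pass from racing with a live server that is about to process the join itself: a job is only reported once its state has been untouched for that long.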
> Fork-join job does not execute join node sometimes during HA failover
> ---------------------------------------------------------------------
>
> Key: OOZIE-1938
> URL: https://issues.apache.org/jira/browse/OOZIE-1938
> Project: Oozie
> Issue Type: Bug
> Components: HA
> Affects Versions: trunk
> Reporter: Mona Chitnis
> Fix For: trunk
>
>
> Reported by [~mchiang].
> Scenario: (2 Oozie HA servers)
> 21:38:56 submit job at oozie client
> 21:41:42 shut down server1
> 21:46:52 shut down server2
> 21:47:30 start server1
> 22:15:05 start server2
> The last fork path ended at 21:52:53.
> 22:36:48 the job is still RUNNING, not moving to join node.
> From the logs, the locking part seems to work fine, with forked action
> processing distributed between the two servers both when both are running and
> when one of them is down. The open question is why even the RecoveryService
> fails to pick up the job after all the forks have completed.
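To put the reported timeline in perspective, a small sketch that computes how long the job had been stuck at the 22:36:48 check relative to the earlier events. It assumes all timestamps are same-day wall-clock times from one clock, which the report implies but does not state; `idle_gap` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
OBSERVED = datetime.strptime("22:36:48", FMT)  # job seen still RUNNING

# Events from the scenario, in chronological order.
EVENTS = [
    ("server1 restarted", "21:47:30"),
    ("last fork path ended", "21:52:53"),
    ("server2 restarted", "22:15:05"),
]


def idle_gap(ts: str) -> timedelta:
    """Time elapsed between a same-day HH:MM:SS timestamp and the observation."""
    return OBSERVED - datetime.strptime(ts, FMT)


for label, ts in EVENTS:
    print(f"{label}: {idle_gap(ts)} before the job was seen still RUNNING")
```

Even measured from the second server's restart, both servers had been up for over 21 minutes without the join being processed.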
--
This message was sent by Atlassian JIRA
(v6.2#6252)