[
https://issues.apache.org/jira/browse/OOZIE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514069#comment-14514069
]
Shwetha G S commented on OOZIE-1993:
------------------------------------
[~puru],
I don't think this issue is caused by the OOZIE-1879 patch. It's caused by the way
executionPaths is implemented in LiteWorkflowInstance: both hdfs_1 and
hadoop_streaming_1 end up with the same execution path, /hdfs_1/, which causes the
failure during rerun.
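To make the collision concrete, here is a minimal, hypothetical model of how execution paths appear to be assigned, inferred only from the DEBUG log in this comment (node names fork1, java1, hdfs1, stream1). It is not Oozie's LiteWorkflowInstance code. It assumes that a fork gives each child the path parentPath + childName + "/", and that an ordinary transition keeps the current path. The helper names forkChildPath, transitionPath and nodesByPath are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of execution-path assignment as suggested by the log.
// This is NOT Oozie's LiteWorkflowInstance implementation.
public class ExecutionPathSketch {

    // Assumption: a fork gives each child the path parentPath + childName + "/".
    static String forkChildPath(String parentPath, String child) {
        return parentPath + child + "/";
    }

    // Assumption: an ordinary OK transition keeps the current path unchanged.
    static String transitionPath(String currentPath) {
        return currentPath;
    }

    // Builds node -> path for the demo workflow, then groups nodes by path.
    static Map<String, List<String>> nodesByPath() {
        Map<String, String> pathByNode = new LinkedHashMap<>();
        String root = "/";
        pathByNode.put("fork1", root);
        String javaPath = forkChildPath(root, "java1");
        String hdfsPath = forkChildPath(root, "hdfs1");
        pathByNode.put("java1", javaPath);
        pathByNode.put("hdfs1", hdfsPath);
        // hdfs1 -> stream1 is a plain transition, so stream1 inherits /hdfs1/
        pathByNode.put("stream1", transitionPath(hdfsPath));

        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pathByNode.entrySet()) {
            grouped.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Prints {/=[fork1], /java1/=[java1], /hdfs1/=[hdfs1, stream1]}
        System.out.println(nodesByPath());
    }
}
```

Under these assumptions, hdfs1 and stream1 share /hdfs1/, so anything keyed on the execution path alone (for example, deciding which nodes to re-execute on rerun) cannot tell the two actions apart.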
{noformat}
2015-04-27 18:04:09,852 DEBUG LiteWorkflowInstance[pool-5-thread-8] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [fork1]
execution path [/] signal value [OK]
2015-04-27 18:04:09,857 DEBUG LiteWorkflowInstance[pool-5-thread-8] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [java1]
execution path [/java1/] signal value [::synch::]
2015-04-27 18:04:09,861 DEBUG LiteWorkflowInstance[pool-5-thread-8] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [hdfs1]
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:04:36,323 DEBUG LiteWorkflowInstance[pool-5-thread-8] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [hdfs1]
execution path [/hdfs1/] signal value [OK]
2015-04-27 18:07:48,514 DEBUG LiteWorkflowInstance[pool-5-thread-8] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [stream1]
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [java1]
execution path [/java1/] signal value [OK]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [join1]
execution path [/java1/] signal value [::synch::]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node
[stream1] execution path [/hdfs1/] signal value [OK]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node [join1]
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:08:06,970 DEBUG LiteWorkflowInstance[pool-5-thread-9] -
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf]
JOB[0000000-150427180330562-oozie-sshi-W]
ACTION[0000000-150427180330562-oozie-sshi-W@join1] Signaling job node [join1]
execution path [/hdfs1/] signal value [OK]
{noformat}
Does this make sense? I am not familiar with the workflow execution code, so I
just want to make sure I am not missing anything.
> Rerun fails during join in certain condition
> --------------------------------------------
>
> Key: OOZIE-1993
> URL: https://issues.apache.org/jira/browse/OOZIE-1993
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: trunk, 4.1.0
> Reporter: Robert Kanter
> Fix For: trunk
>
> Attachments: oozie-log.txt, worflow.xml
>
>
> As [~puru] described in [this
> comment|https://issues.apache.org/jira/browse/OOZIE-1989?focusedCommentId=14123509&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123509]
> in OOZIE-1989:
> {quote}
> On the first run, the job fails at the pig action. When we rerun from the failed
> node (oozie.wf.rerun.failnodes=true), Oozie tries to join it and fails with an error
> message.
> {quote}
> We should investigate why this is happening and fix it.