[ 
https://issues.apache.org/jira/browse/OOZIE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514069#comment-14514069
 ] 

Shwetha G S commented on OOZIE-1993:
------------------------------------

[~puru], 
I don't think this issue is caused by the OOZIE-1879 patch. It's caused by the 
way executionPaths is implemented in LiteWorkflowInstance: both hdfs_1 and 
hadoop_streaming_1 end up with the same execution path, /hdfs_1/, which causes 
the failure during re-run.
{noformat}
2015-04-27 18:04:09,852 DEBUG LiteWorkflowInstance[pool-5-thread-8] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [fork1] 
execution path [/] signal value [OK]
2015-04-27 18:04:09,857 DEBUG LiteWorkflowInstance[pool-5-thread-8] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [java1] 
execution path [/java1/] signal value [::synch::]
2015-04-27 18:04:09,861 DEBUG LiteWorkflowInstance[pool-5-thread-8] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [hdfs1] 
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:04:36,323 DEBUG LiteWorkflowInstance[pool-5-thread-8] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [hdfs1] 
execution path [/hdfs1/] signal value [OK]
2015-04-27 18:07:48,514 DEBUG LiteWorkflowInstance[pool-5-thread-8] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [stream1] 
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [java1] 
execution path [/java1/] signal value [OK]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [join1] 
execution path [/java1/] signal value [::synch::]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node 
[stream1] execution path [/hdfs1/] signal value [OK]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node [join1] 
execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:08:06,970 DEBUG LiteWorkflowInstance[pool-5-thread-9] - 
SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] 
JOB[0000000-150427180330562-oozie-sshi-W] 
ACTION[0000000-150427180330562-oozie-sshi-W@join1] Signaling job node [join1] 
execution path [/hdfs1/] signal value [OK]
{noformat}
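
To make the shared path easier to see, here is a rough sketch that groups the signaled nodes by execution path. It is plain Python for reading the debug log only, not Oozie code; the names SIGNAL_RE, nodes_by_execution_path and SAMPLE_LOG are made up for illustration, and the sample keeps only the tail of each log line above.

```python
import re
from collections import defaultdict

# Matches the tail of each LiteWorkflowInstance DEBUG line shown above.
# \s+ tolerates the hard line wraps introduced by the mail formatting.
SIGNAL_RE = re.compile(
    r"Signaling job node\s+\[(?P<node>[^\]]+)\]\s+"
    r"execution path\s+\[(?P<path>[^\]]+)\]\s+"
    r"signal value\s+\[(?P<value>[^\]]*)\]"
)


def nodes_by_execution_path(log_text):
    """Return {execution_path: [node, ...]} in order of first appearance."""
    grouped = defaultdict(list)
    for match in SIGNAL_RE.finditer(log_text):
        node, path = match.group("node"), match.group("path")
        if node not in grouped[path]:
            grouped[path].append(node)
    return dict(grouped)


# Hypothetical sample: the log above with the SERVER/USER/JOB prefix trimmed.
SAMPLE_LOG = """
Signaling job node [fork1] execution path [/] signal value [OK]
Signaling job node [java1] execution path [/java1/] signal value [::synch::]
Signaling job node [hdfs1] execution path [/hdfs1/] signal value [::synch::]
Signaling job node [hdfs1] execution path [/hdfs1/] signal value [OK]
Signaling job node [stream1] execution path [/hdfs1/] signal value [::synch::]
Signaling job node [java1] execution path [/java1/] signal value [OK]
Signaling job node [join1] execution path [/java1/] signal value [::synch::]
Signaling job node
[stream1] execution path [/hdfs1/] signal value [OK]
Signaling job node [join1] execution path [/hdfs1/] signal value [::synch::]
Signaling job node [join1] execution path [/hdfs1/] signal value [OK]
"""

if __name__ == "__main__":
    for path, nodes in nodes_by_execution_path(SAMPLE_LOG).items():
        print(path, nodes)
```

In the grouped result, hdfs1 and stream1 both sit under /hdfs1/, which is the overlap I am describing. This only summarizes the log; it does not reproduce how LiteWorkflowInstance assigns paths.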

Does this make sense? I am not familiar with the workflow execution code, so I 
just want to make sure I am not missing anything.

> Rerun fails during join in certain condition
> --------------------------------------------
>
>                 Key: OOZIE-1993
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1993
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: trunk, 4.1.0
>            Reporter: Robert Kanter
>             Fix For: trunk
>
>         Attachments: oozie-log.txt, worflow.xml
>
>
> As [~puru] described in [this 
> comment|https://issues.apache.org/jira/browse/OOZIE-1989?focusedCommentId=14123509&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123509]
>  in OOZIE-1989:
> {quote}
> At first run job first fails at pig action.  When we rerun from failed node, 
> oozie.wf.rerun.failnodes=true. Oozie tries to join it and fails with error 
> message.
> {quote}
> We should investigate why this is happening and fix it.



