[ https://issues.apache.org/jira/browse/OOZIE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514069#comment-14514069 ]
Shwetha G S commented on OOZIE-1993:
------------------------------------

[~puru], I don't think this issue is caused by the OOZIE-1879 patch. It's because of the way executionPaths is implemented in LiteWorkflowInstance: both hdfs_1 and hadoop_streaming_1 end up with the same execution path /hdfs_1/, which causes the issue during re-run.

{noformat}
2015-04-27 18:04:09,852 DEBUG LiteWorkflowInstance[pool-5-thread-8] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [fork1] execution path [/] signal value [OK]
2015-04-27 18:04:09,857 DEBUG LiteWorkflowInstance[pool-5-thread-8] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [java1] execution path [/java1/] signal value [::synch::]
2015-04-27 18:04:09,861 DEBUG LiteWorkflowInstance[pool-5-thread-8] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@fork1] Signaling job node [hdfs1] execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:04:36,323 DEBUG LiteWorkflowInstance[pool-5-thread-8] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [hdfs1] execution path [/hdfs1/] signal value [OK]
2015-04-27 18:07:48,514 DEBUG LiteWorkflowInstance[pool-5-thread-8] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@hdfs1] Signaling job node [stream1] execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [java1] execution path [/java1/] signal value [OK]
2015-04-27 18:07:49,164 DEBUG LiteWorkflowInstance[pool-5-thread-5] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@java1] Signaling job node [join1] execution path [/java1/] signal value [::synch::]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node [stream1] execution path [/hdfs1/] signal value [OK]
2015-04-27 18:08:06,926 DEBUG LiteWorkflowInstance[pool-5-thread-9] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@stream1] Signaling job node [join1] execution path [/hdfs1/] signal value [::synch::]
2015-04-27 18:08:06,970 DEBUG LiteWorkflowInstance[pool-5-thread-9] - SERVER[shwethags.mac] USER[sshivalingamurthy] GROUP[-] TOKEN[] APP[demo-wf] JOB[0000000-150427180330562-oozie-sshi-W] ACTION[0000000-150427180330562-oozie-sshi-W@join1] Signaling job node [join1] execution path [/hdfs1/] signal value [OK]
{noformat}

Does this make sense? I am not familiar with the workflow execution code, so I just want to make sure I am not missing anything.
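To make the collision concrete, here is a toy model of what the log above suggests. It is only a sketch and not Oozie's code: the class `ExecutionPathSketch` and its methods are hypothetical, and the rule it encodes (a fork names each branch's path after the branch's first node, and later nodes on that branch inherit the path unchanged) is an assumption read off the log lines, not taken from LiteWorkflowInstance itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; NOT the actual LiteWorkflowInstance implementation.
public class ExecutionPathSketch {

    // Assumed rule: each fork branch gets the path parentPath + firstNode + "/",
    // and every later node on the same branch reuses that path as-is.
    static Map<String, String> assignPaths(String parentPath, List<List<String>> branches) {
        Map<String, String> paths = new LinkedHashMap<>();
        for (List<String> branch : branches) {
            String branchPath = parentPath + branch.get(0) + "/";
            for (String node : branch) {
                paths.put(node, branchPath);
            }
        }
        return paths;
    }

    // Lists every node that was assigned the given execution path.
    static List<String> nodesOnPath(Map<String, String> paths, String path) {
        List<String> nodes = new ArrayList<>();
        for (Map.Entry<String, String> e : paths.entrySet()) {
            if (e.getValue().equals(path)) {
                nodes.add(e.getKey());
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Branches of fork1 as they appear in the log: java1, and hdfs1 followed by stream1.
        Map<String, String> paths = assignPaths("/", Arrays.asList(
                Arrays.asList("java1"),
                Arrays.asList("hdfs1", "stream1")));
        System.out.println(paths);
        System.out.println("Nodes sharing /hdfs1/: " + nodesOnPath(paths, "/hdfs1/"));
    }
}
```

Under this assumed rule the execution path identifies the branch but not the node, so hdfs1 and stream1 cannot be told apart by path alone, which would be consistent with the re-run problem described above.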
> Rerun fails during join in certain condition
> --------------------------------------------
>
>                 Key: OOZIE-1993
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1993
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>   Affects Versions: trunk, 4.1.0
>            Reporter: Robert Kanter
>             Fix For: trunk
>
>         Attachments: oozie-log.txt, worflow.xml
>
>
> As [~puru] described in [this comment|https://issues.apache.org/jira/browse/OOZIE-1989?focusedCommentId=14123509&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14123509] in OOZIE-1989:
> {quote}
> At first run job first fails at pig action. When we rerun from failed node, oozie.wf.rerun.failnodes=true. Oozie tries to join it and fails with error message.
> {quote}
> We should investigate why this is happening and fix it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)