[ https://issues.apache.org/jira/browse/TEZ-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152996#comment-14152996 ]

Jeff Zhang edited comment on TEZ-1631 at 9/30/14 9:16 AM:
----------------------------------------------------------

I simulated the case on a single-node cluster using UnionExample, which contains a
vertex group, so the DAG is modified when it is converted to a DAGPlan. With the
patch, submitting the same DAG after the AM session has timed out prints the following:

{code}
org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1412047114698_0012, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://jzhangMBPr.local:8088/proxy/application_1412047114698_0012/A
        at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
        at org.apache.tez.client.TezSession.stop(TezSession.java:270)
        at org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:503)
        at org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/09/30 17:10:19 INFO client.TezSession: Could not connect to AM, killing session via YARN, sessionName=UnionExampleSession, applicationId=application_1412047114698_0012
14/09/30 17:10:19 INFO impl.YarnClientImpl: Killed application application_1412047114698_0012
org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1412047114698_0012, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://jzhangMBPr.local:8088/proxy/application_1412047114698_0012/A
        at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
        at org.apache.tez.client.TezSession.waitForProxy(TezSession.java:400)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:197)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:162)
        at org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:499)
        at org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}

Without the patch, the output is:
{code}
14/09/30 17:07:16 INFO client.TezSession: Shutting down Tez Session, sessionName=UnionExampleSession, applicationId=application_1412047114698_0011
14/09/30 17:07:16 INFO client.TezSession: Failed to shutdown Tez Session via proxy
org.apache.tez.dag.api.SessionNotRunning: Application not running, applicationId=application_1412047114698_0011, yarnApplicationState=FINISHED, finalApplicationStatus=SUCCEEDED, trackingUrl=http://jzhangMBPr.local:8088/proxy/application_1412047114698_0011/A
        at org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
        at org.apache.tez.client.TezSession.stop(TezSession.java:270)
        at org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:503)
        at org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/09/30 17:07:16 INFO client.TezSession: Could not connect to AM, killing session via YARN, sessionName=UnionExampleSession, applicationId=application_1412047114698_0011
14/09/30 17:07:16 INFO impl.YarnClientImpl: Killed application application_1412047114698_0011
java.lang.IllegalStateException: Vertex: checker already has group input with name:union
        at org.apache.tez.dag.api.Vertex.addGroupInput(Vertex.java:248)
        at org.apache.tez.dag.api.DAG.processEdgesAndGroups(DAG.java:223)
        at org.apache.tez.dag.api.DAG.verify(DAG.java:284)
        at org.apache.tez.dag.api.DAG.createDag(DAG.java:462)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:222)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:162)
        at org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:499)
        at org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}
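The IllegalStateException above comes from the DAG being modified in place while it is converted to a plan: the trace shows DAG.createDag calling DAG.verify, then DAG.processEdgesAndGroups, and finally Vertex.addGroupInput. A second conversion of the same DAG object therefore finds the group input already registered. The Java sketch below reproduces that failure mode with hypothetical, simplified classes (SketchVertex, SketchDag, createPlan). They only mimic the call chain and message format seen in the trace and are not the Tez API.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Tez Vertex; NOT the real Tez API.
class SketchVertex {
  final String name;
  // Group inputs registered on this vertex, keyed by group name.
  final Map<String, String> groupInputs = new LinkedHashMap<>();

  SketchVertex(String name) { this.name = name; }

  void addGroupInput(String groupName, String source) {
    if (groupInputs.containsKey(groupName)) {
      throw new IllegalStateException(
          "Vertex: " + name + " already has group input with name:" + groupName);
    }
    groupInputs.put(groupName, source);
  }
}

// Hypothetical stand-in for a DAG containing one vertex group; NOT the real Tez API.
class SketchDag {
  final SketchVertex consumer;
  final String groupName;
  final List<String> groupMembers = new ArrayList<>();

  SketchDag(SketchVertex consumer, String groupName, List<String> members) {
    this.consumer = consumer;
    this.groupName = groupName;
    this.groupMembers.addAll(members);
  }

  // Building the plan mutates the consumer vertex in place, which is what
  // makes a second conversion of the same object fail.
  String createPlan() {
    consumer.addGroupInput(groupName, String.join(",", groupMembers));
    return "plan(" + consumer.name + " <- " + groupName + ")";
  }
}

class DuplicateGroupInputDemo {
  public static void main(String[] args) {
    SketchDag dag = new SketchDag(
        new SketchVertex("checker"), "union", Arrays.asList("map1", "map2"));
    System.out.println(dag.createPlan()); // first submission attempt succeeds
    try {
      dag.createPlan(); // resubmitting the same DAG object
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
{code}

Running DuplicateGroupInputDemo prints "plan(checker <- union)" for the first call and "Vertex: checker already has group input with name:union" for the second.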


was (Author: zjffdu):
Simulate the case in single node cluster ( using UnionExample which has 
vertexGroup so that it would modify it when converting to DAGPlan), it will 
print the following message if submit the same DAG after AM is session timeout

{code}
org.apache.tez.dag.api.SessionNotRunning: Application not running, 
applicationId=application_1412047114698_0012, yarnApplicationState=FINISHED, 
finalApplicationStatus=SUCCEEDED, 
trackingUrl=http://jzhangMBPr.local:8088/proxy/application_1412047114698_0012/A
        at 
org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
        at org.apache.tez.client.TezSession.stop(TezSession.java:270)
        at 
org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:503)
        at 
org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at 
org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/09/30 17:10:19 INFO client.TezSession: Could not connect to AM, killing 
session via YARN, sessionName=UnionExampleSession, 
applicationId=application_1412047114698_0012
14/09/30 17:10:19 INFO impl.YarnClientImpl: Killed application 
application_1412047114698_0012
org.apache.tez.dag.api.SessionNotRunning: Application not running, 
applicationId=application_1412047114698_0012, yarnApplicationState=FINISHED, 
finalApplicationStatus=SUCCEEDED, 
trackingUrl=http://jzhangMBPr.local:8088/proxy/application_1412047114698_0012/A
        at 
org.apache.tez.client.TezClientUtils.getSessionAMProxy(TezClientUtils.java:733)
        at org.apache.tez.client.TezSession.waitForProxy(TezSession.java:400)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:197)
        at org.apache.tez.client.TezSession.submitDAG(TezSession.java:162)
        at 
org.apache.tez.mapreduce.examples.UnionExample.run(UnionExample.java:499)
        at 
org.apache.tez.mapreduce.examples.UnionExample.main(UnionExample.java:513)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
        at 
org.apache.tez.mapreduce.examples.ExampleDriver.main(ExampleDriver.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}

> Session dag submission timeout can result in duplicate DAG submissions
> ----------------------------------------------------------------------
>
>                 Key: TEZ-1631
>                 URL: https://issues.apache.org/jira/browse/TEZ-1631
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>            Reporter: Bikas Saha
>            Assignee: Jeff Zhang
>         Attachments: Tez-1631.patch
>
>
> In TezSession.submitDAG() we could first check whether the session is ready and 
> throw a SessionNotRunning exception if it is not. This check should be done 
> before processing the DAG, which prevents unnecessary modification of the DAG.
> If the session is ready, the DAG can be submitted as usual. Higher-level 
> components already handle the SessionNotRunning exception.
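The ordering proposed in the description can be sketched as follows. This is a minimal illustration under assumptions: PrecheckingSession, MutatingDag and SessionNotRunningSketch are hypothetical stand-ins for TezSession, DAG and SessionNotRunning, and a boolean flag replaces the real check of whether the AM is reachable. The point is only the ordering: readiness is verified before the DAG is touched, so a rejected submission leaves the DAG unmodified and it can be submitted again later.

{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for SessionNotRunning; NOT the real Tez class.
class SessionNotRunningSketch extends Exception {
  SessionNotRunningSketch(String msg) { super(msg); }
}

// Hypothetical DAG whose plan creation mutates it, as in the reported bug.
class MutatingDag {
  final Set<String> groupInputs = new HashSet<>();

  String createPlan() {
    // Set.add returns false if "union" was already registered by an earlier call.
    if (!groupInputs.add("union")) {
      throw new IllegalStateException("already has group input with name:union");
    }
    return "plan";
  }
}

// Hypothetical session; the flag stands in for "is the AM reachable".
class PrecheckingSession {
  private volatile boolean running;

  PrecheckingSession(boolean running) { this.running = running; }

  void setRunning(boolean running) { this.running = running; }

  // Check readiness BEFORE touching the DAG, so a failed submission leaves it unmodified.
  String submitDag(MutatingDag dag) throws SessionNotRunningSketch {
    if (!running) {
      throw new SessionNotRunningSketch("Session not running");
    }
    return dag.createPlan();
  }
}

class PrecheckDemo {
  public static void main(String[] args) throws Exception {
    MutatingDag dag = new MutatingDag();
    PrecheckingSession session = new PrecheckingSession(false);
    try {
      session.submitDag(dag);
    } catch (SessionNotRunningSketch e) {
      System.out.println("caught: " + e.getMessage()
          + ", dag untouched=" + dag.groupInputs.isEmpty());
    }
    session.setRunning(true); // simulates the caller bringing up a new session
    System.out.println(session.submitDag(dag));
  }
}
{code}

In this sketch the caller flips the flag to simulate a fresh session; a real caller would instead handle SessionNotRunning in the higher-level components mentioned above, and the unmodified DAG can then be resubmitted without hitting the duplicate group input error.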



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
