[ 
https://issues.apache.org/jira/browse/TEZ-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558699#comment-14558699
 ] 

Jeff Zhang commented on TEZ-2307:
---------------------------------

[~mitdesai] This is not easy to reproduce; it happens randomly. From reading 
the code, I can see how it can happen. The DAG's state machine may already 
have transitioned to the FINISHED state, which means the client also knows 
that the DAG is completed and may submit a new DAG. But at that point, 
DAGAppMaster may not yet know that the DAG is completed, so the new submission 
causes the above error. I have linked this ticket with TEZ-1273; I expect it 
to be resolved once TEZ-1273 is done. 
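
To make the ordering concrete, here is a minimal sketch of the race as I 
understand it. It is a simplified model, not the actual Tez code: the class 
AmStateRaceSketch, its enums, and all method names are hypothetical and exist 
only for illustration. It assumes only what is described above, namely that 
the DAG state and the app master state are updated in two separate steps, and 
that a submission is rejected while the app master still considers itself 
RUNNING.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Illustrative model only; these are NOT the real Tez classes.
 * The DAG state and the app master state are tracked separately, and the
 * app master catches up only when it processes the completion event.
 */
final class AmStateRaceSketch {
    enum DagState { RUNNING, FINISHED }
    enum AmState { IDLE, RUNNING }

    private final AtomicReference<DagState> dagState = new AtomicReference<>();
    private final AtomicReference<AmState> amState =
        new AtomicReference<>(AmState.IDLE);

    /** Accepts a DAG only when the app master believes it is idle. */
    void submitDag(String name) {
        if (!amState.compareAndSet(AmState.IDLE, AmState.RUNNING)) {
            throw new IllegalStateException("App master already running a DAG");
        }
        dagState.set(DagState.RUNNING);
    }

    /** Step 1: the DAG state machine finishes; a polling client sees FINISHED. */
    void dagStateMachineFinishes() {
        dagState.set(DagState.FINISHED);
    }

    /** Step 2, some time later: the app master processes the completion. */
    void appMasterHandlesDagFinished() {
        amState.set(AmState.IDLE);
    }

    DagState clientVisibleDagState() {
        return dagState.get();
    }

    public static void main(String[] args) {
        AmStateRaceSketch am = new AmStateRaceSketch();
        am.submitDag("dag1");
        am.dagStateMachineFinishes();
        // The client now believes it is safe to submit the next DAG.
        System.out.println("client sees: " + am.clientVisibleDagState());
        try {
            am.submitDag("dag2"); // falls into the window between step 1 and 2
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        am.appMasterHandlesDagFinished();
        am.submitDag("dag2"); // accepted once the app master has caught up
        System.out.println("dag2 accepted");
    }
}
```

In this model the second submission fails only when it lands between the two 
steps, which would be consistent with the error showing up randomly.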

> DAGAppMaster may still be in RUNNING when DAG is finished
> ---------------------------------------------------------
>
>                 Key: TEZ-2307
>                 URL: https://issues.apache.org/jira/browse/TEZ-2307
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>
> It is possible that the DAG is finished while DAGAppMaster is still in the 
> RUNNING state; in this case the next DAG submission causes the following error:
> {code}
> 2015-04-10 06:01:50,369 INFO  [IPC Server handler 0 on 46821] ipc.Server (Server.java:run(2070)) - IPC Server handler 0 on 46821, call org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPB.submitDAG from 10.0.0.223:48581 Call#411 Retry#0
> org.apache.tez.dag.api.TezException: App master already running a DAG
>       at org.apache.tez.dag.app.DAGAppMaster.submitDAGToAppMaster(DAGAppMaster.java:1131)
>       at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
>       at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
>       at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
>       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}


