[ 
https://issues.apache.org/jira/browse/TEZ-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated TEZ-4561:
------------------------------
    Description: 
https://github.com/apache/tez/blob/66a6ca64b5edde0d30bea0962cb132f3c4982469/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L1683

The AM can throw this exception during shutdown, as shown below:
{code}
TezUncheckedException: Cannot get ApplicationACLs before all services have started
   at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1733)
   at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:513)
   at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:470)
   at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
   at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:493)
   at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:64)
   at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:441)
   at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:78)
   at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:68)
   at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:40)
   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:118)
   at java.base/java.lang.Thread.run(Thread.java:829)
{code}
This is confusing: the message does not tell the log reader that, in this
case, getServiceState() != STATE.STARTED is not an initialization problem but
means the state is STATE.STOPPED (especially confusing for an AM that has
already been running for a long time).

We should check for that state and report it explicitly (maybe even with the
timestamp of when the shutdown hook was started).
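
A minimal sketch of the proposed check, not the actual Tez change. The class AclStateCheckSketch, the helper describeAclFailure, and the shutdownStartedAt parameter are hypothetical names used only for illustration; the local State enum is a stand-in for Hadoop's Service.STATE so the example compiles on its own. It only shows how the message could distinguish "not started yet" from "already stopped":

```java
import java.time.Instant;

public class AclStateCheckSketch {

    // Stand-in for Hadoop's Service.STATE, so the sketch needs no Hadoop jars.
    enum State { NOTINITED, INITED, STARTED, STOPPED }

    // Hypothetical helper: builds the exception message for a failed
    // getApplicationACLs() call based on the current service state.
    // shutdownStartedAt may be null if the shutdown time was not recorded.
    static String describeAclFailure(State state, Instant shutdownStartedAt) {
        if (state == State.STOPPED) {
            String since = (shutdownStartedAt == null)
                    ? "unknown time"
                    : shutdownStartedAt.toString();
            return "Cannot get ApplicationACLs: AM is shutting down"
                    + " (state=STOPPED, shutdown started at " + since + ")";
        }
        // Any other non-STARTED state really is an initialization problem.
        return "Cannot get ApplicationACLs before all services have started"
                + " (state=" + state + ")";
    }

    public static void main(String[] args) {
        // Startup case: keeps the original wording.
        System.out.println(describeAclFailure(State.INITED, null));
        // Shutdown case: says so explicitly and includes the timestamp.
        System.out.println(describeAclFailure(
                State.STOPPED, Instant.parse("2024-01-01T00:00:00Z")));
    }
}
```

In the real code the resulting string would presumably be passed to the TezUncheckedException thrown from getApplicationACLs(); where the shutdown timestamp would be recorded is left open here.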

> Improve reported exception when DAGAppMaster is shutting down
> -------------------------------------------------------------
>
>                 Key: TEZ-4561
>                 URL: https://issues.apache.org/jira/browse/TEZ-4561
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
