[ https://issues.apache.org/jira/browse/TEZ-4561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
László Bodor updated TEZ-4561:
------------------------------
    Description:
https://github.com/apache/tez/blob/66a6ca64b5edde0d30bea0962cb132f3c4982469/tez-dag/src/main/java/org/apache/tez/dag/app/DAGAppMaster.java#L1683

The AM can throw this exception during a shutdown, as shown below:
{code}
TezUncheckedException: Cannot get ApplicationACLs before all services have started
	at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getApplicationACLs(DAGAppMaster.java:1733)
	at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:513)
	at org.apache.tez.dag.app.rm.container.AMContainerImpl$LaunchRequestTransition.transition(AMContainerImpl.java:470)
	at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362)
	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
	at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:493)
	at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:64)
	at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:441)
	at org.apache.tez.dag.app.rm.container.AMContainerImpl.handle(AMContainerImpl.java:78)
	at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:68)
	at org.apache.tez.dag.app.rm.container.AMContainerMap.handle(AMContainerMap.java:40)
	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:118)
	at java.base/java.lang.Thread.run(Thread.java:829)
{code}
This is confusing: the message does not tell the log reader that getServiceState() != STATE.STARTED is not an initialization problem here (especially confusing for an AM that has already been running for a long time); the state is actually STATE.STOPPED. We should
check for that and report it (maybe even with a timestamp of when the shutdown hook was started).

> Improve reported exception when DAGAppMaster is shutting down
> -------------------------------------------------------------
>
>                 Key: TEZ-4561
>                 URL: https://issues.apache.org/jira/browse/TEZ-4561
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
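
To illustrate the proposal in the description, here is a minimal, self-contained sketch of a state check that distinguishes "not started yet" from "already stopped". It is not the actual DAGAppMaster code: the class name, the nested State enum (a stand-in for Hadoop's Service.STATE), the shutdownStartedMillis parameter, and the use of IllegalStateException in place of TezUncheckedException are all assumptions made so the example compiles without Tez or Hadoop on the classpath.

```java
// Hypothetical sketch only; names and structure are illustrative assumptions.
final class AclStateCheckSketch {

  // Stand-in for Hadoop's Service.STATE so the sketch has no external dependencies.
  enum State { NOTINITED, INITED, STARTED, STOPPED }

  private AclStateCheckSketch() {
  }

  /**
   * Returns null when the ACLs may be served, otherwise the message the
   * exception would carry. shutdownStartedMillis is an assumed value: the
   * epoch millis recorded when the shutdown hook began, or a negative
   * number if no such timestamp is known.
   */
  static String describeFailure(State state, long shutdownStartedMillis) {
    if (state == State.STARTED) {
      return null;
    }
    if (state == State.STOPPED) {
      // Shutdown case: say so explicitly instead of implying an init problem.
      String since = shutdownStartedMillis >= 0
          ? " (shutdown started at "
              + java.time.Instant.ofEpochMilli(shutdownStartedMillis) + ")"
          : "";
      return "Cannot get ApplicationACLs: the AM is shutting down or already stopped"
          + since;
    }
    // Genuine initialization case: keep the original wording, plus the state.
    return "Cannot get ApplicationACLs before all services have started"
        + " (current state: " + state + ")";
  }

  /** Throws if the ACLs cannot be served in the given state. */
  static void checkCanGetAcls(State state, long shutdownStartedMillis) {
    String failure = describeFailure(state, shutdownStartedMillis);
    if (failure != null) {
      // The real code throws TezUncheckedException; IllegalStateException is a stand-in.
      throw new IllegalStateException(failure);
    }
  }
}
```

Keeping the message construction in a separate method from the throw makes the two cases easy to unit-test; whether the real fix should record the shutdown timestamp in the shutdown hook or elsewhere is left open here.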