[ 
https://issues.apache.org/jira/browse/FLINK-13895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918491#comment-16918491
 ] 

Yu Wang commented on FLINK-13895:
---------------------------------

[~Tison] Please review it. Thanks.

> Client does not exit when bin/yarn-session.sh fails
> ---------------------------------------------------
>
>                 Key: FLINK-13895
>                 URL: https://issues.apache.org/jira/browse/FLINK-13895
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / YARN
>    Affects Versions: 1.9.0
>            Reporter: Yu Wang
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> 2019-08-29 09:42:00,589 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Deploying 
> cluster, current state ACCEPTED
> 2019-08-29 09:42:04,718 ERROR org.apache.flink.yarn.cli.FlinkYarnSessionCli   
>               - Error while running the Flink Yarn session.
> org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't 
> deploy Yarn session cluster
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:385)
>       at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:616)
>       at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.lambda$main$3(FlinkYarnSessionCli.java:844)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
>       at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>       at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:844)
> Caused by: 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor$YarnDeploymentException: 
> The YARN application unexpectedly switched to state FAILED during deployment. 
> Diagnostics from YARN: Application application_1565802461003_0608 failed 1 
> times due to AM Container for appattempt_1565802461003_0608_000001 exited 
> with  exitCode: 1
> For more detailed output, check application tracking 
> page:https://hadoop-btnn9001.eniot.io:8090/cluster/app/application_1565802461003_0608Then,
>  click on links to logs of each attempt.
> Diagnostics: Exception from container-launch.
> Container id: container_e35_1565802461003_0608_01_000001
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>       at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>       at org.apache.hadoop.util.Shell.run(Shell.java:456)
>       at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:387)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> Shell output: main : command provided 1
> main : run as user is flinktest
> main : requested yarn user is flinktest
> Container exited with a non-zero exit code 1
> Failing this attempt. Failing the application.
> If log aggregation is enabled on your cluster, use this command to further 
> investigate the issue:
> yarn logs -applicationId application_1565802461003_0608
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.startAppMaster(AbstractYarnClusterDescriptor.java:1024)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:507)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:378)
>       ... 7 more
> 2019-08-29 09:42:04,723 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Cancelling 
> deployment from Deployment Failure Hook
> 2019-08-29 09:42:04,723 INFO  
> org.apache.flink.yarn.AbstractYarnClusterDescriptor           - Killing YARN 
> application
> 2019-08-29 09:42:04,729 INFO  
> org.apache.hadoop.io.retry.RetryInvocationHandler             - Exception 
> while invoking forceKillApplication of class 
> ApplicationClientProtocolPBClientImpl over rm1. Trying to fail over 
> immediately.
> java.io.IOException: The client is stopped
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1508)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1413)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy7.forceKillApplication(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:176)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:394)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1201)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:113)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1501)
> 2019-08-29 09:42:04,730 INFO  
> org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider  - Failing 
> over to rm2
> 2019-08-29 09:42:04,735 INFO  
> org.apache.hadoop.io.retry.RetryInvocationHandler             - Exception 
> while invoking forceKillApplication of class 
> ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. 
> Trying to fail over after sleeping for 1767ms.
> java.net.ConnectException: Call From streamsets9001/10.27.20.184 to 
> hadoop-btnn9002.eniot.io:8032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1480)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1413)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy7.forceKillApplication(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.forceKillApplication(ApplicationClientProtocolPBClientImpl.java:176)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:497)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy8.forceKillApplication(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.killApplication(YarnClientImpl.java:394)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.failSessionDuringDeployment(AbstractYarnClusterDescriptor.java:1201)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.access$200(AbstractYarnClusterDescriptor.java:113)
>       at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor$DeploymentFailureHook.run(AbstractYarnClusterDescriptor.java:1501)
> Caused by: java.net.ConnectException: Connection refused



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
