[jira] [Commented] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
[ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563275#comment-16563275 ]

Sergey Krasovskiy commented on FLINK-9891:
------------------------------------------

[~till.rohrmann] thank you for your response. Do you know whether there is any progress on this task? And may we expect the fix in a 1.5.x version of Flink?

> Flink cluster is not shutdown in YARN mode when Flink client is stopped
> -----------------------------------------------------------------------
>
> Key: FLINK-9891
> URL: https://issues.apache.org/jira/browse/FLINK-9891
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.5.0, 1.5.1
> Reporter: Sergey Krasovskiy
> Assignee: Shuyi Chen
> Priority: Major
>
> We are using neither session mode nor detached mode. The command to run the Flink job on YARN is:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
> 2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
> 2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814
> 2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814
> 2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
> 2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
> 2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> 2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT)
> ...
> {code}
> Issues:
> # The Flink job is running as a Flink session
> # Ctrl+C or 'stop' stops neither the job nor the YARN cluster
> # Cancelling the job via the Job Manager web UI doesn't stop the Flink cluster. To kill the cluster we need to run: yarn application -kill
> We also tried to run a Flink job with 'mode: legacy' and saw the same issues:
> # Add the property 'mode: legacy' to ./conf/flink-conf.yaml
> # Execute the following command:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm
[jira] [Created] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
Sergey Krasovskiy created FLINK-9891:

Summary: Flink cluster is not shutdown in YARN mode when Flink client is stopped
Key: FLINK-9891
URL: https://issues.apache.org/jira/browse/FLINK-9891
Project: Flink
Issue Type: Bug
Affects Versions: 1.5.1, 1.5.0
Reporter: Sergey Krasovskiy

We are using neither session mode nor detached mode. The command to run the Flink job on YARN is:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1}
2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814
2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814
2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
{code}
Job Manager logs:
{code:java}
2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT)
...
{code}
Issues:
# The Flink job is running as a Flink session
# Ctrl+C or 'stop' stops neither the job nor the YARN cluster
# Cancelling the job via the Job Manager web UI doesn't stop the Flink cluster. To kill the cluster we need to run: yarn application -kill
We also tried to run a Flink job with 'mode: legacy' and saw the same issues:
# Add the property 'mode: legacy' to ./conf/flink-conf.yaml
# Execute the following command:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See
[jira] [Updated] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped
[ https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Krasovskiy updated FLINK-9891:
------------------------------------

Description:
We are using neither session mode nor detached mode. The command to run the Flink job on YARN is:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 12:47:03,747 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,248 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2018-07-18 12:47:04,409 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, numberTaskManagers=1, slotsPerTaskManager=1}
2018-07-18 12:47:04,783 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2018-07-18 12:47:04,788 WARN org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration directory ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf') contains both LOG4J and Logback configuration files. Please delete or rename one of them.
2018-07-18 12:47:07,846 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application master application_1531474158783_10814
2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814
2018-07-18 12:47:08,074 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated
2018-07-18 12:47:08,076 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED
2018-07-18 12:47:12,864 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully.
{code}
Job Manager logs:
{code:java}
2018-07-18 12:47:09,913 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
2018-07-18 12:47:09,915 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 11:51:27 GMT)
...
{code}
Issues:
# The Flink job is running as a Flink session
# Ctrl+C or 'stop' stops neither the job nor the YARN cluster
# Cancelling the job via the Job Manager web UI doesn't stop the Flink cluster. To kill the cluster we need to run: yarn application -kill
We also tried to run a Flink job with 'mode: legacy' and saw the same issues:
# Add the property 'mode: legacy' to ./conf/flink-conf.yaml
# Execute the following command:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 16:07:13,820 INFO