[jira] [Commented] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped

2018-07-31 Thread Sergey Krasovskiy (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16563275#comment-16563275
 ] 

Sergey Krasovskiy commented on FLINK-9891:
--

[~till.rohrmann] thank you for your response.

Do you know whether there has been any progress on this issue? And can we expect 
the fix in a Flink 1.5.x release?

> Flink cluster is not shutdown in YARN mode when Flink client is stopped
> ---
>
> Key: FLINK-9891
> URL: https://issues.apache.org/jira/browse/FLINK-9891
> Project: Flink
> Issue Type: Bug
> Affects Versions: 1.5.0, 1.5.1
> Reporter: Sergey Krasovskiy
> Assignee: Shuyi Chen
> Priority: Major
>
> We are using neither session mode nor detached mode. The command to run the 
> Flink job on YARN is:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 
> 2048 -j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
> {code}
> Flink CLI logs:
> {code:java}
> Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2018-07-18 12:47:03,747 INFO 
> org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service 
> address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - 
> No path for the flink jar passed. Using the location of class 
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - 
> No path for the flink jar passed. Using the location of class 
> org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
> 2018-07-18 12:47:04,248 WARN 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the 
> HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink 
> YARN Client needs one of these to be set to properly load the Hadoop 
> configuration for accessing YARN.
> 2018-07-18 12:47:04,409 INFO 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: 
> ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, 
> numberTaskManagers=1, slotsPerTaskManager=1}
> 2018-07-18 12:47:04,783 WARN 
> org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit 
> local reads feature cannot be used because libhadoop cannot be loaded.
> 2018-07-18 12:47:04,788 WARN 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration 
> directory 
> ('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf')
>  contains both LOG4J and Logback configuration files. Please delete or rename 
> one of them.
> 2018-07-18 12:47:07,846 INFO 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application 
> master application_1531474158783_10814
> 2018-07-18 12:47:08,073 INFO 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application 
> application_1531474158783_10814
> 2018-07-18 12:47:08,074 INFO 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster 
> to be allocated
> 2018-07-18 12:47:08,076 INFO 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, 
> current state ACCEPTED
> 2018-07-18 12:47:12,864 INFO 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has 
> been deployed successfully.
> {code}
> Job Manager logs:
> {code:java}
> 2018-07-18 12:47:09,913 INFO 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
> 
> 2018-07-18 12:47:09,915 INFO 
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting 
> YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 
> 11:51:27 GMT)
> ...
> {code}
> Issues:
>  # The Flink job is running as a Flink session
>  # Ctrl+C or 'stop' doesn't stop the job or the YARN cluster
>  # Cancelling the job via the Job Manager web UI doesn't stop the Flink 
> cluster. To kill the cluster we need to run: yarn application -kill 
> We also tried running the Flink job with 'mode: legacy' and saw the same 
> issues:
>  # Add property 'mode: legacy' to ./conf/flink-conf.yaml
>  # Execute the following command:
> {code:java}
> /bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 

[jira] [Created] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped

2018-07-18 Thread Sergey Krasovskiy (JIRA)
Sergey Krasovskiy created FLINK-9891:


 Summary: Flink cluster is not shutdown in YARN mode when Flink client is stopped
 Key: FLINK-9891
 URL: https://issues.apache.org/jira/browse/FLINK-9891
 Project: Flink
 Issue Type: Bug
 Affects Versions: 1.5.1, 1.5.0
 Reporter: Sergey Krasovskiy


We are using neither session mode nor detached mode. The command to run the 
Flink job on YARN is:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 
-j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 12:47:03,747 INFO 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service 
address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No 
path for the flink jar passed. Using the location of class 
org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No 
path for the flink jar passed. Using the location of class 
org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,248 WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the 
HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink 
YARN Client needs one of these to be set to properly load the Hadoop 
configuration for accessing YARN.
2018-07-18 12:47:04,409 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: 
ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, 
numberTaskManagers=1, slotsPerTaskManager=1}
2018-07-18 12:47:04,783 WARN 
org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
2018-07-18 12:47:04,788 WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration 
directory 
('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf')
 contains both LOG4J and Logback configuration files. Please delete or rename 
one of them.
2018-07-18 12:47:07,846 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application 
master application_1531474158783_10814
2018-07-18 12:47:08,073 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application 
application_1531474158783_10814
2018-07-18 12:47:08,074 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster 
to be allocated
2018-07-18 12:47:08,076 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, 
current state ACCEPTED
2018-07-18 12:47:12,864 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been 
deployed successfully.
{code}
Job Manager logs:
{code:java}
2018-07-18 12:47:09,913 INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 

2018-07-18 12:47:09,915 INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting 
YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 
11:51:27 GMT)
...
{code}
Issues:
 # The Flink job is running as a Flink session
 # Ctrl+C or 'stop' doesn't stop the job or the YARN cluster
 # Cancelling the job via the Job Manager web UI doesn't stop the Flink 
cluster. To kill the cluster we need to run: yarn application -kill 
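
A minimal sketch of that manual workaround, assuming the application id is read from the "Submitted application" line of the Flink CLI log above. The helper name extract_app_id is hypothetical, and the kill command is only printed here rather than executed:

```shell
# Hypothetical helper: read log text on stdin and print the first
# YARN application id found (application_<clusterTimestamp>_<sequence>).
extract_app_id() {
  grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Sample line copied from the Flink CLI log in this report.
log_line='2018-07-18 12:47:08,073 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1531474158783_10814'

app_id=$(printf '%s\n' "$log_line" | extract_app_id)

# Print the command instead of running it, so the sketch has no side effects.
echo "yarn application -kill $app_id"
```

In practice the log line would come from the captured output of the flink run command, and the echo would be dropped to actually kill the application.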

We also tried running the Flink job with 'mode: legacy' and saw the same 
issues:
 # Add property 'mode: legacy' to ./conf/flink-conf.yaml
 # Execute the following command:

{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 
-j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See 

[jira] [Updated] (FLINK-9891) Flink cluster is not shutdown in YARN mode when Flink client is stopped

2018-07-18 Thread Sergey Krasovskiy (JIRA)


 [ 
https://issues.apache.org/jira/browse/FLINK-9891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Krasovskiy updated FLINK-9891:
-
Description: 
We are using neither session mode nor detached mode. The command to run the 
Flink job on YARN is:
{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 
-j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 12:47:03,747 INFO 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service 
address: http://hmaster-1.ipbl.rgcloud.net:8188/ws/v1/timeline/
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No 
path for the flink jar passed. Using the location of class 
org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,222 INFO org.apache.flink.yarn.cli.FlinkYarnSessionCli - No 
path for the flink jar passed. Using the location of class 
org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2018-07-18 12:47:04,248 WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Neither the 
HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set. The Flink 
YARN Client needs one of these to be set to properly load the Hadoop 
configuration for accessing YARN.
2018-07-18 12:47:04,409 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: 
ClusterSpecification{masterMemoryMB=768, taskManagerMemoryMB=2048, 
numberTaskManagers=1, slotsPerTaskManager=1}
2018-07-18 12:47:04,783 WARN 
org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory - The short-circuit 
local reads feature cannot be used because libhadoop cannot be loaded.
2018-07-18 12:47:04,788 WARN 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - The configuration 
directory 
('/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/conf')
 contains both LOG4J and Logback configuration files. Please delete or rename 
one of them.
2018-07-18 12:47:07,846 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Submitting application 
master application_1531474158783_10814
2018-07-18 12:47:08,073 INFO 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application 
application_1531474158783_10814
2018-07-18 12:47:08,074 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster 
to be allocated
2018-07-18 12:47:08,076 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, 
current state ACCEPTED
2018-07-18 12:47:12,864 INFO 
org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been 
deployed successfully.
{code}
Job Manager logs:
{code:java}
2018-07-18 12:47:09,913 INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 

2018-07-18 12:47:09,915 INFO 
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting 
YarnSessionClusterEntrypoint (Version: 1.5.1, Rev:3488f8b, Date:10.07.2018 @ 
11:51:27 GMT)
...
{code}
Issues:
 # The Flink job is running as a Flink session
 # Ctrl+C or 'stop' doesn't stop the job or the YARN cluster
 # Cancelling the job via the Job Manager web UI doesn't stop the Flink 
cluster. To kill the cluster we need to run: yarn application -kill 

We also tried running the Flink job with 'mode: legacy' and saw the same 
issues:
 # Add property 'mode: legacy' to ./conf/flink-conf.yaml
 # Execute the following command:

{code:java}
/bin/flink run -m yarn-cluster -yn 1 -yqu flink -yjm 768 -ytm 2048 
-j ./flink-quickstart-java-1.0-SNAPSHOT.jar -c org.test.WordCount
{code}
Flink CLI logs:
{code:java}
Setting HADOOP_CONF_DIR=/etc/hadoop/conf because no HADOOP_CONF_DIR was set.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/opt/flink-streaming/flink-streaming-1.5.1-1.5.1-bin-hadoop27-scala_2.11-1531485329/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.4.2.10-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-07-18 16:07:13,820 INFO