[
https://issues.apache.org/jira/browse/FLINK-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948897#comment-15948897
]
ASF GitHub Bot commented on FLINK-6152:
---------------------------------------
GitHub user barcahead opened a pull request:
https://github.com/apache/flink/pull/3654
[FLINK-6152] [yarn] don't throw exception if client can't receive cluster
status
Currently, if the client can't get the JobManager status, it throws an
exception and triggers the shutdown hook. In the shutdown hook, the config
file, keytab file, and properties file are all deleted. As a result, the AM
restart fails.
I fix this issue by responding only to cluster status failures from the YARN
client, which tolerates AM restarts.
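The intended behavior can be sketched roughly as follows. This is a
hypothetical illustration, not the code in this pull request: the names
AppState, Action, ClusterProbe and SessionMonitor are invented for the
example, and the state list is a simplified stand-in for what YARN reports.
The assumption is that the client shuts down and cleans up only when YARN
reports a terminal application state, and merely logs and retries when the
JobManager itself cannot be reached.

```java
/** Simplified, hypothetical application states as reported by YARN. */
enum AppState {
    ACCEPTED, RUNNING, FINISHED, FAILED, KILLED;

    /** True for states from which the application can no longer recover. */
    boolean isTerminal() {
        return this == FINISHED || this == FAILED || this == KILLED;
    }
}

/** What the interactive client does after one polling round. */
enum Action { KEEP_POLLING, CLEAN_UP_AND_EXIT }

final class SessionMonitor {

    /** Hypothetical view of the two status sources the client can query. */
    interface ClusterProbe {
        /** Asks the JobManager for its status; throws if it cannot be reached. */
        String jobManagerStatus() throws Exception;

        /** Asks YARN for the application state. */
        AppState yarnState();
    }

    /** Runs one polling round and decides whether the client keeps running. */
    static Action pollOnce(ClusterProbe probe) {
        // Only a terminal state reported by YARN justifies cleanup.
        if (probe.yarnState().isTerminal()) {
            return Action.CLEAN_UP_AND_EXIT;
        }
        try {
            System.out.println("Cluster status: " + probe.jobManagerStatus());
        } catch (Exception e) {
            // The JobManager may be restarting: log, keep all files, retry later.
            System.err.println("Could not retrieve cluster status, retrying: "
                    + e.getMessage());
        }
        return Action.KEEP_POLLING;
    }

    public static void main(String[] args) {
        // Simulates an AM restart: YARN still tracks the application,
        // but the JobManager does not answer.
        ClusterProbe restarting = new ClusterProbe() {
            public String jobManagerStatus() throws Exception {
                throw new Exception("JobManager not reachable");
            }
            public AppState yarnState() {
                return AppState.ACCEPTED;
            }
        };
        System.out.println(pollOnce(restarting));
    }
}
```

With this structure, killing the JobManager process only produces a warning
in the client, and the files needed for the AM restart stay in place.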
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/barcahead/flink FLINK-6152
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3654.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3654
----
commit 7726dc9f4005fe7fae959b06203298c2b0094642
Author: fengyelei <[email protected]>
Date: 2017-03-30T08:40:32Z
[FLINK-6152] [yarn] don't throw exception if client can't receive cluster
status
----
> Yarn session CLI tries to shut cluster down too aggressively in interactive mode
> --------------------------------------------------------------------------------
>
> Key: FLINK-6152
> URL: https://issues.apache.org/jira/browse/FLINK-6152
> Project: Flink
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Yelei Feng
> Assignee: Yelei Feng
>
> When the YARN session CLI can't get the cluster status, it shuts the cluster
> down and cleans up the related files, even if a new JobManager will be created
> soon. As a result, the AM restart fails due to missing files on HDFS.
> Steps to reproduce:
> 1. Start a YARN session in interactive mode.
> 2. Kill the JobManager process.
> 3. The YARN session client can't get the cluster status within the lookup
> timeout and hence triggers the shutdown hook, which deletes the local
> properties files and the files on HDFS. However, it can't shut down the
> cluster, since it can't connect to the JobManager.