[
https://issues.apache.org/jira/browse/FLINK-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946842#comment-15946842
]
ASF GitHub Bot commented on FLINK-6213:
---------------------------------------
GitHub user barcahead opened a pull request:
https://github.com/apache/flink/pull/3640
[FLINK-6213] [yarn] terminate resource manager itself when shutting down
application
When the number of failed containers exceeds the maximum number of failed
containers, `YarnFlinkResourceManager` receives a `StopCluster` message and
invokes `shutdownApplication`. That method calls
`amrmclient.unregisterApplicationMaster` to finish the application. However,
the AM container is not released until 10 minutes later, when the RM ping
check times out.
This change fixes the issue by terminating the resource manager itself after
unregistering the application master, so that the process exits and the
container is released.
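The ordering described above (unregister first, then terminate) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this pull request: `ApplicationMasterClient`, `Terminator`, and `ResourceManagerSketch` are hypothetical stand-ins, the simplified `unregisterApplicationMaster(String, String)` signature is not the real YARN `AMRMClient` API, and terminating even when unregistration fails is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownSketch {

    /** Hypothetical stand-in for the YARN AM-RM client; NOT the real AMRMClient API. */
    interface ApplicationMasterClient {
        void unregisterApplicationMaster(String finalStatus, String message) throws Exception;
    }

    /** Hypothetical hook that ends the resource manager, e.g. by stopping it or exiting the process. */
    interface Terminator {
        void terminate();
    }

    /** Simplified resource manager that only models the shutdown path. */
    static final class ResourceManagerSketch {
        private final ApplicationMasterClient client;
        private final Terminator terminator;

        ResourceManagerSketch(ApplicationMasterClient client, Terminator terminator) {
            this.client = client;
            this.terminator = terminator;
        }

        void shutdownApplication(String finalStatus, String message) {
            try {
                // Step 1: tell YARN that the application is finished.
                client.unregisterApplicationMaster(finalStatus, message);
            } catch (Exception e) {
                System.err.println("Unregistering the application master failed: " + e);
            } finally {
                // Step 2: terminate ourselves so the AM container is released
                // right away instead of waiting for the RM timeout.
                // Assumption of this sketch: terminate even if step 1 failed.
                terminator.terminate();
            }
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ResourceManagerSketch rm = new ResourceManagerSketch(
                (status, msg) -> events.add("unregister:" + status),
                () -> events.add("terminate"));

        rm.shutdownApplication("FAILED", "too many failed containers");
        System.out.println(events); // prints [unregister:FAILED, terminate]
    }
}
```

In a real deployment the `Terminator` would map to whatever mechanism ends the resource manager process; the sketch only records the call so that the ordering can be checked.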
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/barcahead/flink FLINK-6213
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3640.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3640
----
commit 1f4c91af090189d8a797a500701689b6639c4a85
Author: fengyelei <[email protected]>
Date: 2017-03-29T03:40:24Z
[FLINK-6213] [yarn] terminate resource manager itself when shutting down
application
----
> When number of failed containers exceeds maximum failed containers and
> application is stopped, the AM container will be released 10 minutes later
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-6213
> URL: https://issues.apache.org/jira/browse/FLINK-6213
> Project: Flink
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.2.0, 1.3.0
> Reporter: Yelei Feng
>
> When the number of failed containers exceeds the maximum number of failed
> containers and the application is stopped, the AM container is released only
> 10 minutes later. I checked the YARN log and found that after invoking
> {{unregisterApplicationMaster}}, the AM container is not released. After 10
> minutes, the release is triggered by the RM ping check timeout.