GitHub user sihuazhou opened a pull request: https://github.com/apache/flink/pull/6133
[FLINK-9351][Distributed Coordination] RM stop assigning slot to Job because the TM killed before connecting to JM successfully

## What is the purpose of the change

*This PR is based on https://github.com/apache/flink/pull/6132. It fails the allocation when the TM is killed before communicating with the JM.*

## Brief change log

- *Fail the allocation when the TM is killed before communicating with the JM*

## Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
- The S3 file system connector: (no)

## Documentation

- No

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sihuazhou/flink FLINK-9351

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6133.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6133

----
commit 8fce8d518f922e114332933ab52dcc6e70f97856
Author: sihuazhou <summerleafs@...>
Date: 2018-05-10T06:36:27Z

    Let ResourceManager notify JobManager about failed/killed TaskManagers.

commit 02baebc459c7fcfb6b9e60c4c35d40cfc99cc6ed
Author: sihuazhou <summerleafs@...>
Date: 2018-06-07T06:16:32Z

    [FLINK-9351][Distributed Coordination]RM stop assigning slot to Job because the TM killed before connecting to JM successfully.

----

---