Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-170942707 Good work @mjsax. I think the testing of the REST interface is sufficient, also without having a dedicated YARN test case. I had some more comments concerning the failure case handling of stop calls. The first problem is still the handling of exceptions when calling `stop` on the `Invokable` in `Task.stopExecution`. The exception will only be logged but no further action is taken. This can lead to a situation where we have a corrupted state. I think, we should fail the task in such a situation. Additionally, the case that a task cannot be found on the `TaskManager` and that an exception occurs in `Task.stopExecution` are treated identically by sending a `TaskOperationResult` with `success == false` to the `JobManager`. On the `JobManager` side this will only be logged. I think the exception case should be handled differently. Failing the execution, for example. And it is still possible that you send a `StopJob` message to the `JobManager`, see that the job is in state `RUNNING`, then the `ExecutionGraph` switches to `RESTARTING`, and then the stop call is executed on the `ExecutionGraph` which won't have an effect. As a user you will receive a `StoppingSuccess` message but the job will simply be restarted. I think we should also allow stopping jobs when they are in the state `RESTARTING`. What do you think?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---