[ https://issues.apache.org/jira/browse/FLINK-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041478#comment-15041478 ]
ASF GitHub Bot commented on FLINK-2111:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/750#issuecomment-161949525

    When fixing the `JobManagerTest` I noticed the following. If the job was stopped while it was still in the state `SCHEDULED` or `DEPLOYING`, one received a `StoppingSuccess`. The problem was that the stop was not executed, and the job later switched to `RUNNING`.

    The same can be observed if the job is in the state `RESTARTING`. Stopping a restarting job does nothing, even though you receive a `StoppingSuccess` message. The job will later be redeployed. As a user, I would expect the job to be stopped immediately, or at least at the next possible moment (e.g. when it's deployed). Otherwise, I would expect the system to tell me that stopping is not possible at the moment.

    A similar question is what happens if only a subset of all sources is deployed and in the state `RUNNING`. This would mean that the undeployed sources won't be notified of the stopping signal and, thus, will be deployed normally.

    Furthermore, what happens if the `stop` method of the `SourceFunction` throws an unchecked exception? If I'm not mistaken, this will only get logged. But shouldn't the task be cancelled in such a situation, because the state can no longer be guaranteed to be consistent?

    The case that a `Task` is not `Stoppable` and the case that a `Task` cannot be found on the `TaskManager` are treated identically by the `Execution`. Both cases cause a `TaskOperationResult(executionID, false, message)` to be sent back to the `Execution`. There it will be logged that the stopping call "did not find the task". I think it would be good to differentiate the two cases.

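The last point, telling "task not found" apart from "task not stoppable", can be illustrated with a small Java sketch. This is only an illustration under stated assumptions, not code from the pull request: `StopOutcome`, `TaskRegistry`, and the simplified `Task` and `StoppableTask` interfaces are hypothetical names and do not correspond to Flink's actual classes.

```java
import java.util.HashMap;
import java.util.Map;

public class StopResultSketch {

    // Hypothetical result type: one value per distinguishable outcome,
    // instead of a single boolean "success" flag.
    enum StopOutcome { STOPPED, TASK_NOT_FOUND, TASK_NOT_STOPPABLE }

    // Simplified stand-ins for a task and a stoppable task (assumptions).
    interface Task { }

    interface StoppableTask extends Task {
        void stop();
    }

    // Hypothetical registry playing the role of the task lookup on a TaskManager.
    static final class TaskRegistry {
        private final Map<String, Task> tasks = new HashMap<>();

        void register(String executionId, Task task) {
            tasks.put(executionId, task);
        }

        StopOutcome stop(String executionId) {
            Task task = tasks.get(executionId);
            if (task == null) {
                // No task is registered under this id.
                return StopOutcome.TASK_NOT_FOUND;
            }
            if (!(task instanceof StoppableTask)) {
                // The task exists but does not support the stop signal.
                return StopOutcome.TASK_NOT_STOPPABLE;
            }
            ((StoppableTask) task).stop();
            return StopOutcome.STOPPED;
        }
    }

    public static void main(String[] args) {
        TaskRegistry registry = new TaskRegistry();
        StoppableTask source = () -> { /* stop emitting records */ };
        registry.register("source-1", source);
        registry.register("map-1", new Task() { });

        System.out.println(registry.stop("source-1"));
        System.out.println(registry.stop("map-1"));
        System.out.println(registry.stop("unknown"));
    }
}
```

With a result type like this, the receiving side could log "did not find the task" only for `TASK_NOT_FOUND` and report a separate message for `TASK_NOT_STOPPABLE`.
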
> Add "stop" signal to cleanly shutdown streaming jobs
> ----------------------------------------------------
>
>                 Key: FLINK-2111
>                 URL: https://issues.apache.org/jira/browse/FLINK-2111
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime, JobManager, Local Runtime, Streaming, TaskManager, Webfrontend
>            Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>            Priority: Minor
>
> Currently, streaming jobs can only be stopped using the "cancel" command, which is a "hard" stop with no clean shutdown.
> The newly introduced "stop" signal will only affect streaming source tasks, such that the sources can stop emitting data and shut down cleanly, resulting in a clean shutdown of the whole streaming job.
> This feature is a prerequisite for https://issues.apache.org/jira/browse/FLINK-1929