[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/750 [FLINK-2111] Add "terminate" signal to cleanly stop streaming jobs added TERMINATE signal - JobManager, TaskManager, ExecutionGraph - CliFrontend, JobManagerFrontend - StreamTask, StreamSource You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink flink-2111-terminate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/750.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #750 commit afbc7ce1e140871d0141fd9e7f9173f2d969d416 Author: mjsax Date: 2015-05-29T23:16:18Z added TERMINATE signal - JobManager, TaskManager, ExecutionGraph - CliFrontend, JobManagerFrontend - StreamTask, StreamSource fixes FLINK-2111 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107549742 Good idea! Given the fact that we are stabilizing at the moment (and this is introducing new functionality), I vote to postpone that to after the 0.9 release. That would be two weeks or so, if things go well. Until then, here are a few thoughts - Terminate sounds like a very non-graceful killing. What this does is the opposite, so how about calling it "stop" signal. - We will need a way to stop streaming jobs and perform a checkpoint during the stopping. Would be good to bear this in mind. Can we add this as part of this pull request? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-107926235 Hi, after thinking about the signal name again, I disagree. "terminate" does not mean kill, but is a request to gracefully shut down. "stop" from my point of view is too soft, as it does not indicates strongly enough, that the job is completely shut down and "cleared". I would interpret "stop" more as an interrupt signal (with an according "re-start" signal, that wakes up the job again). Any other opinions? Maybe we need a voting for the signal name. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-114109425 I had a look at your code. A lot of it looks good. The one thing I would really like to change is the modification of the execution graph state machine. That one is delicate enough and any change to that is super critical and intensive to review and test. I think we can add this functionality without any change to that state machine, and should do that. Stopping a streaming program is much like letting it come to a finish (telling the source to not consume the stream any further). How about simply sending the stopping message to the running sources (not to all vertices) and letting the execution states transition from RUNNING to FINISHED in the regular manner. This would also allow concurrent canceling, failure, everything would remain transparent. In an initial version, I would also recommend to not have the extra JobStatus. Just to get the thing working and stable. Then we can later add the extra state, if desired (for logging, web frontend). Some other comments: - I think it would be nice to not overload the `AbstractInvokable` with more methods. Similar as in the case of checkpointing, you can have an interface (stoppable ;-) ) that the invokable can implement to receive the message. BTW: I still prefer "stop" over "terminate" ;-) Terminating the programs sounds very forceful (like dropping a bomb in the data center). Maybe I am just spoiled by some movies... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-114123970 I agree that extending the state graph is critical. However, IMHO it is the right way to go. I would rather spend more time on getting it right in the first place, than introducing it later on. I added a couple of tests that should ensure that the transitions work correctly. Furthermore, sending "terminate" to a source relies on a cooperative source. If the source ignores the signal, it should be in state TERMINATING to indicate the user that the signal was received. If the source ignores the signal, the user can still "cancel" the job. Additionally, I can imagine that a source might take a few seconds to stop emitting new data (eg, for checkpoint/consistency reason). For this case, having a separate TERMINATING state is also nice. Same about extra JobStatus. I rather spent more time and get it right from the beginning on. I think that sending the terminate signal to sources only is a good idea (I thought about it, too), as other operators ignore the signal anyway. However, I don't know how to distinguish sources from operators/sinks and how to send a signal to dedicated operators only? Furthermore, I would like to enable "terminate" signal for streaming jobs only (but once again, I don't know how to distinguish streaming and batch). Can you give some hints how to get both of this done? About AbstractInvocable -> using an interface is fine with me. If we sent the "terminate" signal to streaming sources only, it will result in a cleaner design. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/750#issuecomment-114188551 I disagree with the execution graph changes. I took us 6 months to get it stable, and we still discover corner cases. The new state interacts with all other states. Unless there is a heavily compelling reason to change this, I am against changing it (not just here, but as a general rule). I really do not see the necessity. But I see about two weeks of nighly debugging sessions coming up when we touch the state machine and users report critical issues ;-) I am open for discussion concerning adding a new JobStatus, which would display the job as "shutting down". If it did not react then, users see it and can trigger canceling. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---