[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-05-29 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/flink/pull/750

[FLINK-2111] Add "terminate" signal to cleanly stop streaming jobs

added TERMINATE signal
  - JobManager, TaskManager, ExecutionGraph
  - CliFrontend, JobManagerFrontend
  - StreamTask, StreamSource

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/flink flink-2111-terminate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/750.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #750


commit afbc7ce1e140871d0141fd9e7f9173f2d969d416
Author: mjsax 
Date:   2015-05-29T23:16:18Z

added TERMINATE signal
  - JobManager, TaskManager, ExecutionGraph
  - CliFrontend, JobManagerFrontend
  - StreamTask, StreamSource
fixes FLINK-2111




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-06-01 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-107549742
  
Good idea!

Given the fact that we are stabilizing at the moment (and this is 
introducing new functionality), I vote to postpone that to after the 0.9 
release. That would be two weeks or so, if things go well.

Until then, here are a few thoughts

  - Terminate sounds like a very non-graceful killing. What this does is 
the opposite, so how about calling it "stop" signal.
 - We will need a way to stop streaming jobs and perform a checkpoint 
during the stopping. Would be good to bear this in mind. Can we add this as 
part of this pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-06-02 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-107926235
  
Hi,
after thinking about the signal name again, I disagree. "terminate" does 
not mean kill, but is a request to gracefully shut down. "stop" from my point 
of view is too soft, as it does not indicates strongly enough, that the job is 
completely shut down and "cleared". I would interpret "stop" more as an 
interrupt signal (with an according "re-start" signal, that wakes up the job 
again).

Any other opinions? Maybe we need a voting for the signal name.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-06-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-114109425
  
I had a look at your code. A lot of it looks good. 

The one thing I would really like to change is the modification of the 
execution graph state machine. That one is delicate enough and any change to 
that is super critical and intensive to review and test.

I think we can add this functionality without any change to that state 
machine, and should do that. Stopping a streaming program is much like letting 
it come to a finish (telling the source to not consume the stream any further).

How about simply sending the stopping message to the running sources (not 
to all vertices) and letting the execution states transition from RUNNING to 
FINISHED in the regular manner. This would also allow concurrent canceling, 
failure, everything would remain transparent.

In an initial version, I would also recommend to not have the extra 
JobStatus. Just to get the thing working and stable. Then we can later add the 
extra state, if desired (for logging, web frontend).

Some other comments:
  - I think it would be nice to not overload the `AbstractInvokable` with 
more methods. Similar as in the case of checkpointing, you can have an 
interface (stoppable ;-) ) that the invokable can implement to receive the 
message.


BTW: I still prefer "stop" over "terminate" ;-) Terminating the programs 
sounds very forceful (like dropping a bomb in the data center). Maybe I am just 
spoiled by some movies...





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-06-22 Thread mjsax
Github user mjsax commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-114123970
  
I agree that extending the state graph is critical. However, IMHO it is the 
right way to go. I would rather spend more time on getting it right in the 
first place, than introducing it later on. I added a couple of tests that 
should ensure that the transitions work correctly. Furthermore, sending 
"terminate" to a source relies on a cooperative source. If the source ignores 
the signal, it should be in state TERMINATING to indicate the user that the 
signal was received. If the source ignores the signal, the user can still 
"cancel" the job. Additionally, I can imagine that a source might take a few 
seconds to stop emitting new data (eg, for checkpoint/consistency reason). For 
this case, having a separate TERMINATING state is also nice.

Same about extra JobStatus. I rather spent more time and get it right from 
the beginning on.

I think that sending the terminate signal to sources only is a good idea (I 
thought about it, too), as other operators ignore the signal anyway. However, I 
don't know how to distinguish sources from operators/sinks and how to send a 
signal to dedicated operators only? Furthermore, I would like to enable 
"terminate" signal for streaming jobs only (but once again, I don't know how to 
distinguish streaming and batch). Can you give some hints how to get both of 
this done?

About AbstractInvocable -> using an interface is fine with me. If we sent 
the "terminate" signal to streaming sources only, it will result in a cleaner 
design.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-2111] Add "terminate" signal to cleanly...

2015-06-22 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/750#issuecomment-114188551
  
I disagree with the execution graph changes. I took us 6 months to get it 
stable, and we still discover corner cases. The new state interacts with all 
other states. Unless there is a heavily compelling reason to change this, I am 
against changing it (not just here, but as a general rule). I really do not see 
the necessity. But I see about two weeks of nighly debugging sessions coming up 
when we touch the state machine and users report critical issues ;-)

I am open for discussion concerning adding a new JobStatus, which would 
display the job as "shutting down". If it did not react then, users see it and 
can trigger canceling.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---