[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-27 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
OK thanks. I addressed one last issue with shutting down the watchdog 
thread: it lingered in its sleep and only noticed that the task had 
terminated once the sleep finished. Now the task canceler interrupts the 
watchdog when everything works as expected, so the watchdog thread 
terminates early.

I've also backported this to the `release-1.1` branch.
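
A minimal sketch of the watchdog shutdown described above, not the actual Flink code: the names `WatchdogSketch`, `Watchdog`, `fatalErrorHandler`, and `runScenario` are hypothetical and chosen only for illustration. It assumes the watchdog simply sleeps for the cancellation timeout and reports a fatal error afterwards, unless the canceler interrupts it first.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class WatchdogSketch {

    /** Hypothetical watchdog: reports a fatal error unless interrupted before the timeout. */
    static final class Watchdog implements Runnable {
        private final long timeoutMillis;
        private final Runnable fatalErrorHandler;

        Watchdog(long timeoutMillis, Runnable fatalErrorHandler) {
            this.timeoutMillis = timeoutMillis;
            this.fatalErrorHandler = fatalErrorHandler;
        }

        @Override
        public void run() {
            try {
                Thread.sleep(timeoutMillis);
            } catch (InterruptedException e) {
                // The canceler interrupted us: cancellation finished in time,
                // so terminate early instead of lingering until the sleep ends.
                return;
            }
            // The timeout elapsed without an interrupt: treat the task as stuck.
            fatalErrorHandler.run();
        }
    }

    /** Runs one scenario and returns true if the watchdog reported a fatal error. */
    static boolean runScenario(long timeoutMillis, boolean cancelSucceeds) {
        AtomicBoolean fatal = new AtomicBoolean(false);
        Thread watchdog = new Thread(
                new Watchdog(timeoutMillis, () -> fatal.set(true)), "watchdog");
        watchdog.start();

        if (cancelSucceeds) {
            // Stand-in for the task canceler noticing that the task terminated.
            watchdog.interrupt();
        }

        try {
            watchdog.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for watchdog", e);
        }
        return fatal.get();
    }

    public static void main(String[] args) {
        // Cancellation succeeds: the watchdog exits early without a fatal error.
        System.out.println(runScenario(10_000L, true));
        // Cancellation never completes: the watchdog fires after the timeout.
        System.out.println(runScenario(50L, false));
    }
}
```

In the first scenario the call returns almost immediately even though the timeout is ten seconds, which is the early-termination behavior the comment describes.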


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-20 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2652
  
+1 go ahead




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-19 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
OK, I would like to go ahead and merge this if there are no objections.




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2652
  
Concerning the Kafka test: From the logs, the test fails because a topic 
cannot be deleted. The ZooKeeper operation blocks and the test times out. I 
am pretty sure that this is unrelated, as Flink is not running at that point.




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
Travis gives the green light except for a Kafka failure. @StephanEwen 
@rmetzger Do you know whether this is a known issue? Or might it be a 
regression from this change? 
https://s3.amazonaws.com/archive.travis-ci.org/jobs/168613643/log.txt




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
Haha, that picture convinced me to actually add the test :smile: 




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2652
  
Yeah, this sort of covers it. Just afraid of such a situation here: 
https://twitter.com/thepracticaldev/status/687672086152753152




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
We have the `TaskManagerProcessReapingTest` which tests that the 
TaskManager process properly exits when the TaskManager actor dies. In 
addition, there is `TaskManagerTest#testTerminationOnFatalError`, which tests 
that the `FatalError` message terminates the actor. 

Do you think we are already covered by this? We can certainly add a process 
reaper test variant that sends a `FatalError` message instead of the 
`PoisonPill`. 




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2652
  
What do you think about a followup test, where we ensure that a fatal error 
notification on the TaskManager actually results in a process kill?




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
Renamed the `TaskOptions` class to `TaskManagerOptions` so that we can 
easily migrate the task manager options as a follow up. 




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2652
  
Should we have a single `TaskManagerOptions` class, so that the config is 
not spread over too many classes?




[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...

2016-10-18 Thread uce
Github user uce commented on the issue:

https://github.com/apache/flink/pull/2652
  
Thanks for the valuable feedback. Some of the errors were a little sloppy 
on my side. Sorry for that. I addressed all your comments.

