[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 OK thanks. I addressed one last issue with shutting down the watch dog thread (it was lingering around sleeping and only noticed that the task has terminated after that sleep... now the task canceler interrupts the watchdog when everything works as expected and the watch dog thread terminates early). I've also backported this to the `release-1.1` branch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2652 +1 go ahead --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 OK, I would like to go ahead and merge this if there are no objections. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2652 Concerning the Kafka test: From the logs, the test fails because a topic cannot be deleted. The ZooKeeper operation blocks and test times out. I am pretty sure that this is unrelated, as no Flink is running at that point. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 Travis gives the green light except for a Kafka failure. @StephanEwen @rmetzger Do you know whether this is a known issue? Or might it be a regression from this change? https://s3.amazonaws.com/archive.travis-ci.org/jobs/168613643/log.txt --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 Haha, that picture convinced me to actually add the test :smile: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2652 Yeah, this sort of covers it. Just afraid of such a situation here: https://twitter.com/thepracticaldev/status/687672086152753152 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 We have the `TaskManagerProcessReapingTest` which tests that the TaskManager process properly exits when the TaskManager actor dies. In addition, there is `TaskManagerTest#testTerminationOnFatalError`, which tests that the `FatalError` message terminates the actor. Do you think we are already covered by this? We can certainly add a process reaper test variant that sends a `FatalError` message instead of the `PoisonPill`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2652 What do you think about a followup test, where we ensure that a fatal error notification on the TaskManager actually results in a process kill? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 Renamed the `TaskOptions` class to `TaskManagerOptions` so that we can easily migrate the task manager options as a follow up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/2652 Should we have one class `TaskManagerOptions`? To not spread the config over too many classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink issue #2652: [FLINK-4715] Fail TaskManager with fatal error if task ca...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2652 Thanks for the valuable feedback. Some of the errors were a little sloppy on my side. Sorry for that. I addressed all your comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---