GitHub user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/2449
+1 for moving the `PartitionStateChecker` and
`ResultPartitionConsumableNotifier` out of the `NetworkEnvironment`.
A few questions and comments:
- Do we need an extra ExecutorService in the TaskManager? I have been
digging through a bunch of thread dumps over time, and there are already many
threads and pools. I would really like to avoid adding yet another
thread pool (creating thread pools should be an extremely careful decision).
The Akka thread pool executor is quite over-provisioned for the few
actors we actually use. I think it is perfectly feasible to use that one for
the few extra futures introduced here. In any case, if not reusing the Akka
executor pool, then the thread pool needs to be shut down in the TaskManager
runner. Otherwise it creates a leak when running successive local Flink jobs.
- I am a bit confused about the `SlotEnvironment`. Maybe it is mainly the
name, but what does it have to do with slots? Isn't it more like a
network-message-specific *JobManager connection*?
- The `ResultPartitionConsumableNotifier` could be per `Task`. That way,
future multi-JobManager associations would work seamlessly, and it could
call `fail(...)` directly on the Task without having to go through the
`TaskManager`. It could probably leave the TaskManager out of the picture
completely.
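To illustrate the thread pool point, here is a minimal sketch of the ownership rule it implies: the component issuing futures only receives an `Executor` and never creates one, while whoever creates a pool (here a hypothetical runner) shuts it down when it exits. `PartitionStateChecker` and `runTaskManager` are simplified, hypothetical stand-ins, not Flink's actual classes. The preferred option from the comment, reusing the Akka executor, would simply mean passing that existing executor into the checker instead of creating a pool at all.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class ExecutorLifecycleSketch {

    /** Hypothetical component: runs its futures on an executor it is given and never creates its own pool. */
    static final class PartitionStateChecker {
        private final Executor executor;

        PartitionStateChecker(Executor executor) {
            this.executor = executor;
        }

        CompletableFuture<String> requestPartitionState(String partitionId) {
            // Placeholder for the real asynchronous lookup against the JobManager.
            return CompletableFuture.supplyAsync(() -> "RUNNING:" + partitionId, executor);
        }
    }

    /**
     * Hypothetical runner: whoever creates a pool owns it and must shut it down,
     * otherwise every successive local job leaves idle threads behind.
     */
    static String runTaskManager(Supplier<ExecutorService> poolFactory, String partitionId) {
        ExecutorService pool = poolFactory.get();
        try {
            PartitionStateChecker checker = new PartitionStateChecker(pool);
            return checker.requestPartitionState(partitionId).join();
        } finally {
            // Always release the pool's threads, even if the work above failed.
            pool.shutdown();
            try {
                if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                    pool.shutdownNow();
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runTaskManager(() -> Executors.newFixedThreadPool(2), "partition-1"));
    }
}
```

Putting the shutdown in a `finally` block in the runner ties the pool's lifetime to the runner's, which is what prevents the leak across successive local jobs.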
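The per-`Task` notifier idea can be sketched as follows, assuming each notifier is constructed with a reference to its own task and to the JobManager connection of that task's job. `TaskHandle`, `JobManagerGateway`, and the method names are hypothetical simplifications for illustration, not Flink's real interfaces. Because the notifier holds its own JobManager connection, tasks belonging to different JobManagers would not interfere, and a failed notification fails the task directly.

```java
import java.util.concurrent.CompletableFuture;

public class PerTaskNotifierSketch {

    /** Hypothetical view of a Task: only the failure hook is needed here. */
    interface TaskHandle {
        void fail(Throwable cause);
    }

    /** Hypothetical connection to the JobManager responsible for this task's job. */
    interface JobManagerGateway {
        CompletableFuture<Void> notifyConsumable(String partitionId);
    }

    /** One notifier per Task, bound to that task and to its own JobManager. */
    static final class ResultPartitionConsumableNotifier {
        private final TaskHandle task;
        private final JobManagerGateway jobManager;

        ResultPartitionConsumableNotifier(TaskHandle task, JobManagerGateway jobManager) {
            this.task = task;
            this.jobManager = jobManager;
        }

        void notifyPartitionConsumable(String partitionId) {
            jobManager.notifyConsumable(partitionId).whenComplete((ignored, error) -> {
                if (error != null) {
                    // Fail the owning task directly; no round trip through the TaskManager.
                    task.fail(new IllegalStateException(
                            "Could not notify JobManager that partition "
                                    + partitionId + " is consumable", error));
                }
            });
        }
    }
}
```

In this design the TaskManager only wires up the notifier when the task is created and is otherwise not involved.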