rdblue commented on PR #15312: URL: https://github.com/apache/iceberg/pull/15312#issuecomment-4339444427
> I've updated the PR description - I hope it's clearer. If I understand correctly from #15031, it looks like the issue is that Flink uses new classloaders and the shutdown hook is never called because the JVM never exits. So we need use threadpools tied to Flink's lifecycle. That's reasonable, but we definitely do NOT want to give "users" control over threadpool lifecycles. I think this implementation goes a bit beyond what we need to do. Assuming that we want to have a way to shut down these pools, that doesn't mean that we should remove the shutdown hook. The contract of `newExitingWorkerPool` is that it will be shut down by a hook. Why not update the shutdown hook so that it is a noop if the pool is already shut down? Also, it's debatable whether we need to track all pools created by the factory methods here. This class is responsible for its own static pools and its methods are reused elsewhere. Having a way to close all threadpools created by this class seems very risky to me. I'm not convinced that we want to expose a method to shut down a single static threadpool directly, let alone one that will shut down pools created and managed by other classes. The reason for static pools is to ensure one is available. Allowing them to be closed allows one class to break functionality for another. Allowing all of them to be closed is even more risk. I think the solution is not to expose these methods. Instead, I propose two things: 1. Deprecate and remove `newExitingWorkerPool` and fix the 3 uses of it. `HadoopFileIO` can manage the lifecycle of its thread pool. 2. Lazily create the pools managed by `ThreadPools`. The APIs that use the static pools support passing in an executor service with a lifecycle managed by the client. This means that a Flink application can manage its own threadpools (which is why we added the API) and shut them down appropriately. If Flink isn't using the static pools, then they will never start and we don't need a shutdown method. I think if we do these two things then we will fix the problem and be better off. I'd also be okay with an option to prevent these pools from being started if we need more guarantees. I think that's safer and better than using them but shutting them down. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
