rdblue commented on PR #15312:
URL: https://github.com/apache/iceberg/pull/15312#issuecomment-4339444427

   > I've updated the PR description - I hope it's clearer.
   
   If I understand correctly from #15031, it looks like the issue is that Flink 
uses new classloaders and the shutdown hook is never called because the JVM 
never exits. So we need use threadpools tied to Flink's lifecycle. That's 
reasonable, but we definitely do NOT want to give "users" control over 
threadpool lifecycles.
   
   I think this implementation goes a bit beyond what we need to do. Assuming 
that we want to have a way to shut down these pools, that doesn't mean that we 
should remove the shutdown hook. The contract of `newExitingWorkerPool` is that 
it will be shut down by a hook. Why not update the shutdown hook so that it is 
a noop if the pool is already shut down?
   
   Also, it's debatable whether we need to track all pools created by the 
factory methods here. This class is responsible for its own static pools and 
its methods are reused elsewhere. Having a way to close all threadpools created 
by this class seems very risky to me. I'm not convinced that we want to expose 
a method to shut down a single static threadpool directly, let alone one that 
will shut down pools created and managed by other classes. The reason for 
static pools is to ensure one is available. Allowing them to be closed allows 
one class to break functionality for another. Allowing all of them to be closed 
is even more risk.
   
   I think the solution is not to expose these methods. Instead, I propose two 
things:
   1. Deprecate and remove `newExitingWorkerPool` and fix the 3 uses of it. 
`HadoopFileIO` can manage the lifecycle of its thread pool.
   2. Lazily create the pools managed by `ThreadPools`. The APIs that use the 
static pools support passing in an executor service with a lifecycle managed by 
the client. This means that a Flink application can manage its own threadpools 
(which is why we added the API) and shut them down appropriately. If Flink 
isn't using the static pools, then they will never start and we don't need a 
shutdown method.
   
   I think if we do these two things then we will fix the problem and be better 
off. I'd also be okay with an option to prevent these pools from being started 
if we need more guarantees. I think that's safer and better than using them but 
shutting them down.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to