pvary commented on PR #15312:
URL: https://github.com/apache/iceberg/pull/15312#issuecomment-4342379909

   @rdblue
   > Allowing them to be closed allows one class to break functionality for 
another.
   
   Technically, nothing prevents users from shutting down the static pools 
already, for example:
   ```
       ThreadPools.getWorkerPool().shutdown();
   ```
   Today we mostly rely on reviews to prevent this, and in some cases it is not 
necessarily obvious. If the goal is to guarantee that users cannot shut these 
pools down, that could be addressed more directly (e.g., by wrapping the 
returned executor), but that safeguard does not exist today.
   
   Changing all usages of `newExitingWorkerPool` / `newExitingScheduledPool` 
would also be fairly invasive. These thread pools are used across several 
`FileIO` implementations (e.g., `HadoopFileIO` → `newExitingWorkerPool`, 
`S3FileIO` / `GCSFileIO` → `newExitingScheduledPool`, `ADLSFileIO` → 
`getWorkerPool`).
   
   Do you think a better long‑term direction would be to change 
`FileIO.initialize(Map<String, String> properties)` to something like:
   ```
       FileIO.initialize(
           Map<String, String> properties,
           Function<Class<? extends FileIO>, ScheduledExecutorService> 
executorFactory)
   ```
   and propagate this through the `FileIO` implementations?
   
   In practice, `FileIO` instances are often created by catalogs, which suggest 
that we change the `Catalog.initialize(String name, Map<String, String> 
properties)` to something like:
   ```
       Catalog.initialize(
           String name,
           Map<String, String> properties,
           Function<Class<? extends FileIO>, ScheduledExecutorService> 
executorFactory)
   ```
   and propagate this through the `Catalog` implementations and change the 
`CatalogUtil.loadCatalog(String impl, String catalogName, Map<String, String> 
properties, Object hadoopConf)` to allow setting the factory.
   
   The FileIO implementations can't manage the lifecycle for the pools as they 
don't know if they should keep the shared resources open or they should close 
them too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to