Hi team,

PR #12868 <https://github.com/apache/iceberg/pull/12868> addresses a
critical issue regarding FileIO resource management in Spark that requires
broader community discussion and review.

Issue Summary:

When Spark cleans up broadcast variables, calling FileIO.close() can
unintentionally shut down shared resources, such as HTTP connection pools.
This is particularly problematic when using S3FileIO with the default
ApacheHttpClient, and it can cause Spark read/write queries to fail.

Technical Details:

   - Spark broadcasts FileIO instances to executors.
   - During broadcast variable cleanup on the driver, calling close() can
   terminate shared resources (e.g., the HTTP client connection pool) that
   executors still depend on. This breaks core Iceberg functionality; see
   #12858 <https://github.com/apache/iceberg/issues/12858> and #12046
   <https://github.com/apache/iceberg/issues/12046>.
   - The issue is particularly acute with S3FileIO using ApacheHttpClient,
   because all instances share the same PoolingHttpClientConnectionManager
   instance (see the referenced code
   <https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/impl/io/PoolingHttpClientConnectionManager.java#L263>).
   However, the core problem lies in the broader approach to managing shared
   resources within FileIO.
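
The failure mode above can be reduced to a minimal, self-contained sketch.
The class and method names below (SharedPool, PooledFileIO, lease) are
illustrative only, not Iceberg or AWS SDK APIs; the point is that when every
instance holds the same static pool, close() on the driver's broadcast copy
breaks reads on every executor copy:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SharedPool {
    private final AtomicBoolean open = new AtomicBoolean(true);

    void shutdown() { open.set(false); }

    boolean lease() {
        if (!open.get()) {
            throw new IllegalStateException("Connection pool is closed");
        }
        return true;
    }
}

class PooledFileIO implements AutoCloseable {
    // All instances share one pool, mirroring how ApacheHttpClient instances
    // can share a single PoolingHttpClientConnectionManager.
    private static final SharedPool POOL = new SharedPool();

    boolean read() { return POOL.lease(); }

    @Override
    public void close() { POOL.shutdown(); } // shuts down the SHARED pool
}

public class Demo {
    public static void main(String[] args) {
        PooledFileIO driverCopy = new PooledFileIO();   // held by the driver's broadcast
        PooledFileIO executorCopy = new PooledFileIO(); // deserialized on an executor

        System.out.println(executorCopy.read()); // pool is healthy, prints "true"

        driverCopy.close(); // broadcast variable cleanup on the driver

        try {
            executorCopy.read(); // executor reads now fail
        } catch (IllegalStateException e) {
            System.out.println("executor read failed: " + e.getMessage());
        }
    }
}
```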

Request for Community Input:

   - Review the proposed solution in PR #12868
   <https://github.com/apache/iceberg/pull/12868>.
   - Discuss whether this is the correct way to fix the issue.
   - Consider whether a more explicit resource ownership and lifecycle
   management model is needed.
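
As one possible shape for the last point, here is a hedged sketch of an
explicit ownership model based on reference counting. This is not the approach
taken in PR #12868, and the names (RefCountedPool, retain, release) are
hypothetical; it only illustrates the idea that a shared resource should shut
down when the last holder releases it, not when any single copy closes:

```java
import java.util.concurrent.atomic.AtomicInteger;

class RefCountedPool {
    private final AtomicInteger refs = new AtomicInteger(0);
    private volatile boolean open = true;

    // Each copy (driver broadcast, executor deserialization) takes a reference.
    RefCountedPool retain() {
        refs.incrementAndGet();
        return this;
    }

    // close() on any copy releases one reference; the pool shuts down only
    // when every holder is done with it.
    void release() {
        if (refs.decrementAndGet() == 0) {
            open = false;
        }
    }

    boolean isOpen() { return open; }
}

public class OwnershipDemo {
    public static void main(String[] args) {
        RefCountedPool pool = new RefCountedPool();
        pool.retain(); // driver copy
        pool.retain(); // executor copy

        pool.release(); // driver broadcast cleanup
        System.out.println(pool.isOpen()); // prints "true": executor still holds a ref

        pool.release(); // executor finishes
        System.out.println(pool.isOpen()); // prints "false": last reference released
    }
}
```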

This issue impacts many users running Iceberg on Spark in production, so
timely review and feedback would be appreciated.

Best,

Xiaoxuan Li
