boaz-gold opened a new issue, #15898: URL: https://github.com/apache/iceberg/issues/15898
### Apache Iceberg version
1.10.0
### Query engine
Spark
### Please describe the bug 🐞
Apache Iceberg version: 1.10.0
Component: org.apache.iceberg.CachingCatalog
Description
CachingCatalog uses a Caffeine cache to hold Table objects. When an entry
is evicted (by TTL via cache.expiration-interval-ms or by size via
cache.max-total-bytes), the RemovalListener
(MetadataTableInvalidatingRemovalListener) only invalidates related
metadata table entries. It does not call table.io().close().
This means any resources held by the FileIO implementation are never
released on eviction.
Impact
With io-impl = org.apache.iceberg.aws.s3.S3FileIO:
- Each evicted Table leaves behind a live AWS SDK v2 S3Client
- Each S3Client owns a ScheduledExecutorService (sdk-ScheduledExecutor-N)
with background threads for credential refresh (IMDSv2)
- These threads stay alive until the executor is shut down; as live threads
they are GC roots, so they are never collected
- In a long-running process (e.g. Spark Thrift Server), threads accumulate
without bound until the JVM crashes with os::commit_memory failed; error='Not
enough space' (errno=12)
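The mechanism described in the list above can be reproduced without Iceberg or the AWS SDK. The following is a minimal sketch, assuming a hypothetical `FakeClient` that owns one scheduler thread in the way the issue describes for S3Client; the class names and the thread name are illustrative only and are not taken from either library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SchedulerLeakSketch {

    /** Hypothetical stand-in for a client that owns a scheduler thread. */
    static final class FakeClient implements AutoCloseable {
        private final ScheduledThreadPoolExecutor scheduler;
        private volatile Thread worker;

        FakeClient() {
            scheduler = new ScheduledThreadPoolExecutor(1, r -> {
                Thread t = new Thread(r, "fake-sdk-ScheduledExecutor");
                t.setDaemon(true); // daemon, so this demo JVM can still exit
                worker = t;
                return t;
            });
            // Scheduling a periodic task starts the worker thread right away.
            scheduler.scheduleAtFixedRate(() -> { }, 1, 1, TimeUnit.HOURS);
        }

        boolean workerAlive() {
            Thread t = worker;
            return t != null && t.isAlive();
        }

        @Override
        public void close() {
            scheduler.shutdownNow();
            Thread t = worker;
            if (t != null) {
                try {
                    t.join(5_000); // wait for the worker to exit
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    /** Creates clients, "evicts" them with or without close(), and counts live workers. */
    static int aliveAfter(int clients, boolean closeOnEvict) {
        List<FakeClient> evicted = new ArrayList<>();
        for (int i = 0; i < clients; i++) {
            FakeClient client = new FakeClient();
            if (closeOnEvict) {
                client.close();
            }
            evicted.add(client); // kept only so the demo can inspect the threads
        }
        int alive = 0;
        for (FakeClient client : evicted) {
            if (client.workerAlive()) {
                alive++;
            }
        }
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("alive without close: " + aliveAfter(5, false));
        System.out.println("alive with close:    " + aliveAfter(5, true));
    }
}
```

Each client that is dropped without close() leaves its worker thread running, which is the pattern behind the unbounded thread growth reported here.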
Observed in production (Spark Thrift Server, ~24h uptime):
Total JVM threads: 27,877
sdk-ScheduledExecutor: 27,657
Distinct pool instances: 8,075+
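For anyone wanting to check their own process for the same symptom, counts like the ones above can be gathered from inside the JVM. This is a rough sketch only: `ThreadCensus` and its methods are hypothetical helper names, and the default prefix is simply the thread name quoted in this report.

```java
import java.util.Collection;
import java.util.stream.Collectors;

public class ThreadCensus {

    /** Counts the names that start with the given prefix. */
    static long countByPrefix(Collection<String> names, String prefix) {
        return names.stream().filter(n -> n.startsWith(prefix)).count();
    }

    /** Counts live threads in the current JVM whose name starts with the prefix. */
    static long countLiveThreads(String prefix) {
        return countByPrefix(
            Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .collect(Collectors.toList()),
            prefix);
    }

    public static void main(String[] args) {
        String prefix = args.length > 0 ? args[0] : "sdk-ScheduledExecutor";
        System.out.println("total threads: " + Thread.getAllStackTraces().size());
        System.out.println(prefix + "*: " + countLiveThreads(prefix));
    }
}
```

This only sees threads of the JVM it runs in. For a separate process, thread names extracted from a thread dump could be passed to `countByPrefix` instead.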
Proof from bytecode
CachingCatalog$MetadataTableInvalidatingRemovalListener.onRemoval()
decompiled from iceberg-spark-runtime-3.5_2.12-1.10.0:
    // logs debug
    // if EXPIRED and not a metadata table:
    cache.invalidateAll(metadataTableIdentifiers)
    // return  <- no close() call
There is no table.io().close() call anywhere in the eviction path.
Proposed fix
In CachingCatalog.java,
MetadataTableInvalidatingRemovalListener.onRemoval():
    if (value != null && value.io() instanceof Closeable) {
      try {
        ((Closeable) value.io()).close();
      } catch (IOException e) {
        LOG.warn("Failed to close FileIO for evicted table {}", key, e);
      }
    }
Note: S3FileIO implements Closeable and its close() method calls
S3Client.close(), which shuts down the ScheduledExecutorService and releases
all threads. This fix should be sufficient to resolve the leak, provided the
evicted Table's FileIO is not shared with, or still in use by, other callers.
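The close-on-eviction idea can be illustrated in isolation. The sketch below is not CachingCatalog or Caffeine code: it assumes a hypothetical size-bounded cache built on `LinkedHashMap` and a hypothetical `FakeIO` value type, and only shows the pattern of closing a Closeable value at the moment it is evicted. It does not address whether a FileIO might be shared or still in use by a caller holding the evicted table, which would need checking against the real code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseOnEvictSketch {

    /** Hypothetical resource that records how often it was closed. */
    static final class FakeIO implements Closeable {
        private final AtomicInteger closedCounter;

        FakeIO(AtomicInteger closedCounter) {
            this.closedCounter = closedCounter;
        }

        @Override
        public void close() {
            closedCounter.incrementAndGet();
        }
    }

    /** Size-bounded LRU map that closes a value when it evicts it. */
    static final class ClosingLruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        ClosingLruCache(int maxEntries) {
            super(16, 0.75f, true); // access order, so the eldest entry is the LRU one
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            if (size() <= maxEntries) {
                return false;
            }
            closeIfCloseable(eldest.getKey(), eldest.getValue());
            return true;
        }
    }

    /** Same shape as the proposed fix: close the value, log and continue on failure. */
    static void closeIfCloseable(Object key, Object value) {
        if (value instanceof Closeable) {
            try {
                ((Closeable) value).close();
            } catch (IOException e) {
                System.err.println("Failed to close value for evicted key " + key + ": " + e);
            }
        }
    }

    /** Inserts entries into a bounded cache and returns how many values were closed. */
    static int closedAfterInserting(int maxEntries, int inserts) {
        AtomicInteger closed = new AtomicInteger();
        Map<String, FakeIO> cache = new ClosingLruCache<>(maxEntries);
        for (int i = 0; i < inserts; i++) {
            cache.put("table-" + i, new FakeIO(closed));
        }
        return closed.get();
    }

    public static void main(String[] args) {
        System.out.println("closed on eviction: " + closedAfterInserting(2, 5));
    }
}
```

In this sketch every evicted value is closed exactly once and entries still in the cache are left open, which is the behaviour the proposed change aims for in onRemoval().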
### Willingness to contribute
- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from
the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
