Re: [PR] Fix IcebergIO conn pool crash by moving FileIO lifecycle to @Teardown [beam]

via GitHub Mon, 13 Apr 2026 11:43:45 -0700


dejii commented on code in PR #38149:
URL: https://github.com/apache/beam/pull/38149#discussion_r3075079835



##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java:
##########
@@ -434,20 +431,6 @@ public void close() throws IOException {
         state.dataFiles.clear();
       }
     } finally {
-      // Close unique FileIO instances now that all writers are done.
-      // table.io() may return a shared FileIO; we deduplicate by identity
-      // so we close each underlying connection pool exactly once.
-      Set<FileIO> closedIOs = new HashSet<>();
-      for (DestinationState state : destinations.values()) {
-        FileIO io = state.table.io();
-        if (io != null && closedIOs.add(io)) {
-          try {
-            io.close();

Review Comment:
   > are there any issues if io is not closed?
   
   Yes - not closing `FileIO` means the underlying HTTP connection pool (S3, 
GCS etc.) leaks for the lifetime of the DoFn instance. In practice this could 
be acceptable, but it's not ideal. This is why I think `@Teardown` is the right 
place - it gives best-effort cleanup via `catalog.close()`, which properly 
closes the `FileIO` through the catalog's own `CloseableGroup`. If teardown is 
skipped, the leak is bounded to the worker's lifetime.
   
   Catalogs like 
[RESTSessionCatalog#L1167](https://github.com/apache/iceberg/blob/88d460425ab6a0e4c938e90665b4b772d7b57c41/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L1167)
 share a single `FileIO` instance across all tables (backed by a single HTTP 
connection pool). When `RecordWriterManager.close()` called `io.close()` after 
each bundle, it permanently destroyed that shared connection pool. The catalog 
still held a reference to the now-dead `FileIO`, so every subsequent bundle on 
the same DoFn instance failed with "connection pool shut down." which is the 
bug I'm dealing with.
   
   Not closing the IO in `RecordWriterManager` means a potential connection 
pool leak at DoFn teardown. Closing it in `RecordWriterManager` (current code) 
means a guaranteed crash after the first bundle. The former is acceptable, the 
latter shouldn't be.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Fix IcebergIO conn pool crash by moving FileIO lifecycle to @Teardown [beam]

Reply via email to