dejii commented on PR #38149:
URL: https://github.com/apache/beam/pull/38149#issuecomment-4233075506

   @ahmedabu98 following up on #37782: that fix correctly moved the FileIO close 
from `RecordWriter` to `RecordWriterManager`, but there is a deeper 
issue that only manifests under high write volume to dynamic destinations (many 
bundles per worker).
   
   The root cause: the catalog is a `@MonotonicNonNull` field on the DoFn, 
initialized once and reused across all bundles processed by the same instance. 
`RecordWriterManager.close()` is called once per bundle (from `@FinishBundle`), 
so closing FileIO there, even deduplicated, shuts down the catalog's shared 
connection pool for every subsequent bundle on that DoFn instance.
   
   This PR removes FileIO close from `RecordWriterManager` entirely and adds 
`@Teardown` to all four IcebergIO write DoFns, so the catalog (and its FileIO) 
is closed exactly once when the DoFn instance is destroyed.
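   
   To make the lifecycle issue concrete, here is a minimal standalone sketch. It is not the IcebergIO code and does not use Beam or Iceberg APIs: `FakeCatalog`, `FakeWriteFn`, and `CatalogLifecycleSketch` are hypothetical stand-ins that only model "one shared resource per DoFn instance, many bundles per instance". The `closeOnFinishBundle` flag switches between the per-bundle close and the teardown-only close.

```java
import java.util.List;

// Hypothetical stand-in for a catalog whose FileIO owns a shared connection pool.
class FakeCatalog implements AutoCloseable {
  private boolean open = true;
  int closeCalls = 0;

  void write(String record) {
    if (!open) {
      throw new IllegalStateException("connection pool shut down");
    }
    // A real writer would persist the record here; the sketch only checks state.
  }

  @Override
  public void close() {
    closeCalls++;
    open = false;
  }
}

// Hypothetical stand-in for a write DoFn instance that is reused across bundles.
class FakeWriteFn {
  private final boolean closeOnFinishBundle;
  private FakeCatalog catalog; // created once, then reused (mirrors a @MonotonicNonNull field)

  FakeWriteFn(boolean closeOnFinishBundle) {
    this.closeOnFinishBundle = closeOnFinishBundle;
  }

  FakeCatalog catalog() {
    if (catalog == null) {
      catalog = new FakeCatalog();
    }
    return catalog;
  }

  // Models processing one bundle followed by the finish-bundle callback.
  void processBundle(List<String> records) {
    for (String record : records) {
      catalog().write(record);
    }
    finishBundle();
  }

  void finishBundle() {
    if (closeOnFinishBundle && catalog != null) {
      catalog.close(); // per-bundle close: breaks every later bundle on this instance
    }
  }

  // Models the teardown callback: runs once when the instance is discarded.
  void teardown() {
    if (catalog != null) {
      catalog.close();
    }
  }
}

class CatalogLifecycleSketch {
  // Runs up to `bundles` bundles on one instance; returns how many completed.
  static int runBundles(FakeWriteFn fn, int bundles) {
    int completed = 0;
    for (int i = 0; i < bundles; i++) {
      try {
        fn.processBundle(List.of("a", "b"));
        completed++;
      } catch (IllegalStateException e) {
        break; // the shared resource was already closed by an earlier bundle
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    System.out.println(runBundles(new FakeWriteFn(true), 3)); // close per bundle

    FakeWriteFn fixed = new FakeWriteFn(false);
    System.out.println(runBundles(fixed, 3)); // close only at teardown
    fixed.teardown();
  }
}
```

   In this model, closing per bundle lets only the first bundle finish, while closing at teardown lets all bundles finish and closes the shared resource a single time.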
   
   Would appreciate your review here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.