mgmarino commented on issue #12046:
URL: https://github.com/apache/iceberg/issues/12046#issuecomment-2618501091

   Just to note: I have backed out the "workaround" (commenting out the closure 
of the S3FileIO) and have started seeing fewer errors on Glue. I'm not sure if 
something changed here, but since this seems to be memory-dependent, 
perhaps AWS tweaked some settings that lead to the broadcast table *not* being 
deleted before the task is in fact done.
   
   I will continue running this and have steadily added additional logging to 
help me trace where this is coming from, but it looks like:
   
   - Spark is releasing the broadcast SerializedTable before the task is 
complete, and this leads to closure of the IO, which in turn causes a 
subsequent read to fail (a toy sketch of this sequence is below). I am not 
totally sure why Spark does this cleanup and am trying to determine whether 
this is a Spark issue or something that Iceberg needs to protect against.
   
   

