mgmarino commented on issue #12046: URL: https://github.com/apache/iceberg/issues/12046#issuecomment-2618501091
Just to note: I have backed out the "workaround" (commenting out the closure of the S3FileIO) and have started seeing fewer errors on Glue. I'm not sure what changed, but since this seems to be memory-dependent, perhaps AWS tweaked some settings so that the broadcast table is *not* deleted before the task is actually done. I will continue running this and have steadily added logging to help trace where the failure is coming from, but it looks like:

- Spark is releasing the broadcast SerializedTable before the task is complete; this leads to closure of the IO, which then causes a subsequent read to fail.

I am not totally sure why Spark does this cleanup, and I am trying to determine whether this is a Spark issue or something that Iceberg needs to protect against.
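For illustration, one way Iceberg could "protect against" an early broadcast cleanup is a wrapper that suppresses `close()` on the shared IO so in-flight tasks keep a usable handle. This is only a minimal sketch against Iceberg's `org.apache.iceberg.io.FileIO` interface; the class name `CloseSuppressingFileIO` and the approach are hypothetical, not a fix that exists upstream.

```java
import org.apache.iceberg.io.FileIO;
import org.apache.iceberg.io.InputFile;
import org.apache.iceberg.io.OutputFile;

/**
 * Hypothetical guard: delegates all FileIO calls but turns close()
 * into a no-op, so that a premature broadcast cleanup cannot shut
 * down an IO that running tasks still depend on. The underlying IO
 * would need to be closed explicitly elsewhere (e.g., on executor
 * shutdown) to avoid leaking clients.
 */
class CloseSuppressingFileIO implements FileIO {
  private final FileIO delegate;

  CloseSuppressingFileIO(FileIO delegate) {
    this.delegate = delegate;
  }

  @Override
  public InputFile newInputFile(String path) {
    return delegate.newInputFile(path);
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return delegate.newOutputFile(path);
  }

  @Override
  public void deleteFile(String path) {
    delegate.deleteFile(path);
  }

  @Override
  public void close() {
    // Intentionally ignored: lifetime is managed outside the
    // broadcast variable's cleanup path.
  }
}
```

Whether a guard like this is the right call (versus fixing the cleanup ordering on the Spark side) depends on the answer to the question above.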
