mgmarino commented on issue #12046:
URL: https://github.com/apache/iceberg/issues/12046#issuecomment-2612986973
After some further investigation, my initial conclusions are the following:
- I can see `SerializableTableWithSize` being generated on the driver in at
  least two different places:
  - `org.apache.iceberg.spark.source.SparkWrite.createWriterFactory`:
    https://github.com/apache/iceberg/blob/6e2bc9ac4ef9ca9afeff66814de6567ae63da9da/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java#L190
  - `org.apache.iceberg.spark.source.SparkBatch.planInputPartitions`:
    https://github.com/apache/iceberg/blob/6e2bc9ac4ef9ca9afeff66814de6567ae63da9da/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java#L78

  Both of these tables point to the same `FileIO` object (in this case
  `S3FileIO`).
- If these tasks get submitted to the same executor, the deserialized tables
  will still point to the *same* `FileIO` object, meaning that when one table
  gets cleaned up (and its IO closed), the other is affected as well (see the
  sketch after this list).
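
To make the failure mode concrete, here is a minimal sketch of the sharing
problem. The class and method are illustrative, not Iceberg's actual cleanup
path; only `FileIO.close()` and `newInputFile` are real API:

```java
import org.apache.iceberg.io.FileIO;

class SharedFileIoSketch {
  // Hypothetical scenario: two tasks on the same executor each hold a
  // deserialized table, but both tables resolve to one shared FileIO.
  static void demo(FileIO sharedIo) {
    FileIO ioForWrite = sharedIo; // e.g. from SparkWrite.createWriterFactory
    FileIO ioForScan = sharedIo;  // e.g. from SparkBatch.planInputPartitions

    // Task A completes and its table is cleaned up, closing the FileIO
    // (and, for S3FileIO, the underlying S3 client):
    ioForWrite.close();

    // Task B still holds the same object, so any subsequent use fails,
    // typically with an "already closed" error from the client:
    ioForScan.newInputFile("s3://bucket/path/data.parquet");
  }
}
```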
I am not sure what a good solution is here, but I suspect that the `FileIO`
may need to be copied when creating the serializable table, instead of reusing
the original table's instance as is done now:
https://github.com/apache/iceberg/blob/6e2bc9ac4ef9ca9afeff66814de6567ae63da9da/core/src/main/java/org/apache/iceberg/SerializableTable.java#L123
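
As a strawman, copying might look roughly like the sketch below. This is only
an illustration: `copyOf` is a hypothetical helper, `FileIO.properties()` is
not implemented by every `FileIO`, and Hadoop-configurable IOs would need
additional handling.

```java
import java.util.Map;

import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.io.FileIO;

class FileIoCopySketch {
  // Hypothetical helper: build a fresh FileIO with the same implementation and
  // properties as the original, so each serialized table owns (and can safely
  // close) its own instance.
  static FileIO copyOf(FileIO original) {
    String impl = original.getClass().getName();
    // properties() is a default method that some FileIO implementations
    // (e.g. S3FileIO) expose; others may throw UnsupportedOperationException.
    Map<String, String> props = original.properties();
    // hadoopConf is omitted in this sketch; HadoopConfigurable IOs would need
    // it passed through, as SerializableTable already does for serialization.
    return CatalogUtil.loadFileIO(impl, props, null);
  }
}
```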
Would love to get some input here!