kevinjqliu opened a new pull request, #16307:
URL: https://github.com/apache/iceberg/pull/16307

   Backport of #15683 (and length fix #16284) to `spark/v3.4`.
   
   Introduces `SerializableFileIOWithSize` to broadcast a table's `FileIO` to 
executors alongside the table metadata. It provides a `KnownSizeEstimation` so 
that Spark skips the expensive `SizeEstimator` walk during broadcast, and it 
makes `close()` a no-op on executors so that broadcast cleanup does not destroy 
the driver's `FileIO`.
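
A minimal sketch of the wrapper idea described above, not the code from this PR. Every name in it (`SketchFileIO`, `SketchKnownSize`, `RecordingFileIO`, `FileIOWithSizeSketch`, the 32 KiB constant) is a hypothetical stand-in so the example compiles without Spark or Iceberg on the classpath. It also simplifies: the description says `close()` is a no-op on executors, whereas this sketch makes the wrapper's `close()` unconditionally a no-op and leaves closing the delegate to its owner.

```java
import java.io.Closeable;

// Hypothetical stand-in for Iceberg's FileIO, reduced to what the sketch needs.
interface SketchFileIO extends Closeable {
  String location();

  @Override
  void close(); // narrowed so callers need not handle IOException
}

// Hypothetical stand-in for Spark's KnownSizeEstimation contract.
interface SketchKnownSize {
  long estimatedSize();
}

// Test double that records whether close() was ever called on it.
final class RecordingFileIO implements SketchFileIO {
  private boolean closed = false;

  @Override
  public String location() {
    return "memory://sketch";
  }

  @Override
  public void close() {
    closed = true;
  }

  boolean isClosed() {
    return closed;
  }
}

// Wrapper that reports a fixed size and never closes the wrapped FileIO.
final class FileIOWithSizeSketch implements SketchFileIO, SketchKnownSize {
  // Arbitrary constant chosen for illustration (assumption, not from the PR).
  static final long SIZE_ESTIMATE_BYTES = 32_768L;

  private final SketchFileIO delegate;

  FileIOWithSizeSketch(SketchFileIO delegate) {
    this.delegate = delegate;
  }

  @Override
  public String location() {
    return delegate.location();
  }

  @Override
  public long estimatedSize() {
    // A fixed answer lets the caller skip walking the object graph.
    return SIZE_ESTIMATE_BYTES;
  }

  @Override
  public void close() {
    // Intentionally empty: cleanup of the wrapper must not close the delegate.
  }
}
```

In this sketch, calling `close()` on a `FileIOWithSizeSketch` leaves the wrapped `RecordingFileIO` open, and only an explicit `close()` on the delegate itself releases it.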
   
   ### Adaptation note
   
   v3.4's `BaseReader` still used the legacy `table.encryption().decrypt(...)` 
path. I switched that one method to `fileIO.bulkDecrypt(...)` to match 
v3.5/4.0/4.1, since the broadcast `FileIO` is now an `EncryptingFileIO` 
(combined in the constructor). All other files match the v3.5 patch 
byte-for-byte (with paths translated).
   
   ### Validation
   
   - `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*SerializableFileIOWithSize*"` (new test, passes)
   - `./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestSparkReaderDeletes"` (passes)
   - Compile-checked spark-extensions tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

