[PR] Spark: Add session-level split size override [iceberg]

via GitHub Wed, 29 Apr 2026 01:53:19 -0700


gerashegalov opened a new pull request, #16154:
URL: https://github.com/apache/iceberg/pull/16154


   Closes #16153
   
   ### What changes were made in this PR?
   
   Add a new Spark session configuration key, `spark.sql.iceberg.split-size`, that overrides
   the `read.split.target-size` table property at the session level. No DDL changes to table
   metadata and no source code changes at read call sites are required.
   
   This is particularly useful when GPU and CPU workloads read the same Iceberg table
   concurrently: GPU sessions benefit from significantly larger splits (e.g. 2 GB), while CPU
   sessions perform better with the default 128 MB. Hardware accelerators like
   [RAPIDS Accelerator for Apache Spark](https://nvidia.github.io/spark-rapids/) are designed as
   drop-in replacements requiring no application code changes, so a session-level knob is essential.
   
   ### Changes
   
   **All Spark shims (v3.4, v3.5, v4.0):**
   - `SparkSQLProperties`: add `SPLIT_SIZE = "spark.sql.iceberg.split-size"` constant
   - `SparkReadConf`: add `.sessionConf(SparkSQLProperties.SPLIT_SIZE)` to both `splitSize()` and
     `splitSizeOption()` parser chains; update Javadoc to document 5-level precedence
   - `SparkConfParser`: store `Table.name()` as `tableName` and in `ConfParser.parse()` try a
     table-qualified session key (`<key>.<tableName>`) before the global session key
   
   **v3.5 only:**
   - `TestSparkWriteConf`: add 4 tests for table-scoped session conf resolution
   
   ### Resolution precedence
   
   1. Read option (`split-size`)
   2. Table-scoped session conf (`spark.sql.iceberg.split-size.<catalog>.<db>.<table>`)
   3. Global session conf (`spark.sql.iceberg.split-size`)
   4. Table property (`read.split.target-size`)
   5. Default (128MB)
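   
   The precedence list above can be sketched as a plain lookup chain. This is a minimal
   illustration, not the code in this PR: the class `SplitSizeResolution`, the `resolve`
   method, and the use of plain maps in place of Spark and Iceberg objects are all
   hypothetical, and it assumes values are given as byte counts.
   
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the parser chain; only the key names come from the PR text.
final class SplitSizeResolution {
  static final String READ_OPTION = "split-size";
  static final String SESSION_KEY = "spark.sql.iceberg.split-size";
  static final String TABLE_PROPERTY = "read.split.target-size";
  static final long DEFAULT_SPLIT_SIZE = 128L * 1024 * 1024; // 128 MB

  // Returns the first value found, walking the five levels in order.
  // Assumption: every value is a byte count encoded as a decimal string.
  static long resolve(
      Map<String, String> readOptions,
      Map<String, String> sessionConf,
      Map<String, String> tableProperties,
      String tableName) {
    String value = readOptions.get(READ_OPTION);             // 1. read option
    if (value == null) {
      value = sessionConf.get(SESSION_KEY + "." + tableName); // 2. table-scoped session conf
    }
    if (value == null) {
      value = sessionConf.get(SESSION_KEY);                   // 3. global session conf
    }
    if (value == null) {
      value = tableProperties.get(TABLE_PROPERTY);            // 4. table property
    }
    return value != null ? Long.parseLong(value) : DEFAULT_SPLIT_SIZE; // 5. default
  }

  public static void main(String[] args) {
    String table = "cat.db.tbl"; // hypothetical <catalog>.<db>.<table> name

    Map<String, String> readOptions = new HashMap<>();
    Map<String, String> tableProperties = new HashMap<>();
    Map<String, String> sessionConf = new HashMap<>();

    // A GPU session raises the split size globally, then pins one table to a smaller value.
    sessionConf.put(SESSION_KEY, String.valueOf(2L * 1024 * 1024 * 1024));
    sessionConf.put(SESSION_KEY + "." + table, String.valueOf(512L * 1024 * 1024));

    System.out.println(resolve(readOptions, sessionConf, tableProperties, table));
    System.out.println(resolve(readOptions, sessionConf, tableProperties, "cat.db.other"));
  }
}
```

   In this sketch the table-scoped key wins for `cat.db.tbl`, while any other table in the
   same session falls through to the global key.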
   
   ### How was this patch tested?
   
   4 new unit tests in `TestSparkWriteConf` (v3.5):
   - table-scoped session key takes precedence over global
   - global session key works when no table-scoped key is set
   - read option takes precedence over table-scoped session key
   - table-scoped session key takes precedence over table property
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

