sumedhsakdeo opened a new pull request, #10336:
URL: https://github.com/apache/iceberg/pull/10336
We have a scheduled job that deletes rows in an Iceberg table. The job is
authored in SQL. Since we use the copy-on-write (CoW) technique for data
deletion, the job rewrites the files without the deleted rows. We want to tune
this job so that it creates files that are ~512MB on HDFS. We are unable to use
a read option because the job uses Spark SQL, and setting the
`read.split.target-size` table property is not desired because it impacts all
readers of the table.
This PR adds the ability to control the split size for a given Spark SQL job by
introducing a property, `spark.sql.iceberg.split-size`, which can be set in the
Spark session conf.
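With this change, the tuning can be scoped to a single session. A sketch of the intended usage (the table name and filter are illustrative; 536870912 bytes = 512MB):

```sql
-- Set the split size for this session only; other readers of the
-- table are unaffected because no table property is changed.
SET spark.sql.iceberg.split-size = 536870912;

-- The CoW delete then rewrites the affected files using the
-- session-level split size.
DELETE FROM db.events WHERE event_date < '2024-01-01';
```

Because the property lives in the session conf rather than in table metadata, the ~512MB target applies only to this job's scan planning and rewrite, leaving `read.split.target-size` untouched for everyone else.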
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]