sumedhsakdeo opened a new pull request, #10336:
URL: https://github.com/apache/iceberg/pull/10336
We have a scheduled job that deletes rows in an Iceberg table. The job is
authored in SQL. Since we use the copy-on-write (CoW) technique for data
deletion, the job rewrites the files without the deleted rows. We want to tune
this job so that it creates files that are ~512MB on HDFS. We are unable to use
a read option because the job uses Spark SQL, and setting the
`read.split.target-size` table property is not desired because it impacts all
readers of the table.
This PR adds the ability to control the split size for a given Spark SQL job by
introducing a property, `spark.sql.iceberg.split-size`, which can be set in the
Spark session conf.
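With this change, the tuning can be scoped to a single session. A sketch of the intended usage (the table name and filter are illustrative; 536870912 bytes = 512MB):

```sql
-- Set the split size for this session only; other readers of the
-- table are unaffected because no table property is changed.
SET spark.sql.iceberg.split-size = 536870912;

-- The CoW delete then rewrites the affected files using the
-- session-level split size.
DELETE FROM db.events WHERE event_date < '2024-01-01';
```

Because the property lives in the session conf rather than in table metadata, the ~512MB target applies only to this job's scan planning and rewrite, leaving `read.split.target-size` untouched for everyone else.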
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]