Re: [PR] Spark 3.5: Honor Spark conf spark.sql.files.maxPartitionBytes in read split [iceberg]

via GitHub Sun, 29 Oct 2023 23:16:38 -0700


jzhuge commented on PR #8922:
URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1784547459


   @rdblue When we migrate users from Hive tables to Iceberg tables, some jobs 
hit executor OOMs because they were tuned for Hive tables with 
`spark.sql.files.maxPartitionBytes`. Although we told them about the table 
property or the Netflix custom per-table Spark conf, many still wished that 
`spark.sql.files.maxPartitionBytes` applied when no per-table setting is 
specified.
   
   Thus I propose honoring this conf after per-table settings and before 
`SPLIT_SIZE_DEFAULT`. Here is the proposed order of precedence, highest first:
   
   1. DataFrame read option `split-size`
   2. (Not in this PR, Netflix internal) Spark conf 
`spark.netflix.(db).(table).target-size` override for a table
   3. Table property `read.split.target-size`
   4. Spark conf `spark.sql.files.maxPartitionBytes`
   5. Default constant `SPLIT_SIZE_DEFAULT` in the Iceberg repo
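
   To illustrate the fallback chain, here is a minimal sketch in plain Python. 
It is not the code in this PR: the function name `resolve_split_size`, the use 
of plain dictionaries for the three sources, and the 128 MiB value standing in 
for `SPLIT_SIZE_DEFAULT` are assumptions made for illustration. The 
Netflix-internal step 2 is omitted, and values are assumed to be plain byte 
counts without size suffixes.

```python
from typing import Mapping

# Assumed stand-in for Iceberg's SPLIT_SIZE_DEFAULT (128 MiB), for illustration only.
SPLIT_SIZE_DEFAULT = 128 * 1024 * 1024


def resolve_split_size(
    read_options: Mapping[str, str],
    table_properties: Mapping[str, str],
    spark_conf: Mapping[str, str],
) -> int:
    """Return the first configured split size, checking sources in the proposed order."""
    candidates = (
        read_options.get("split-size"),                       # 1. DataFrame read option
        table_properties.get("read.split.target-size"),       # 3. table property
        spark_conf.get("spark.sql.files.maxPartitionBytes"),  # 4. Spark conf
    )
    for value in candidates:
        if value is not None:
            return int(value)  # assumes a plain byte count such as "67108864"
    return SPLIT_SIZE_DEFAULT  # 5. built-in default
```

   With this ordering, a job that only sets `spark.sql.files.maxPartitionBytes` 
keeps its Hive-era tuning, while any table property or read option still takes 
priority over it.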
   
   Details in 
https://apache-iceberg.slack.com/archives/C03LG1D563F/p1698258652032969


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

