jzhuge commented on PR #8922: URL: https://github.com/apache/iceberg/pull/8922#issuecomment-1784547459
@rdblue When we migrate users from Hive tables to Iceberg tables, some jobs hit executor OOM because they were tuned for Hive tables with `spark.sql.files.maxPartitionBytes`. Although we told users about the table property and the Netflix custom per-table Spark conf, many still wanted `spark.sql.files.maxPartitionBytes` to apply when no per-table setting is specified. I therefore propose honoring this conf after the per-table settings and before `SPLIT_SIZE_DEFAULT`.

Here is the proposed order:

1. DataFrame read option `split-size`
2. (Not in this PR, Netflix internal) Spark conf `spark.netflix.(db).(table).target-size` override for a table
3. Table property `read.split.target-size`
4. Spark conf `spark.sql.files.maxPartitionBytes`
5. Default constant `SPLIT_SIZE_DEFAULT` in the Iceberg repo

Details in https://apache-iceberg.slack.com/archives/C03LG1D563F/p1698258652032969
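
To make the proposed precedence concrete, here is a minimal sketch of it as a first-match lookup. It is not the code in this PR: `resolve_split_size` and its parameters are hypothetical, the configuration sources are modeled as plain dictionaries of byte counts rather than real Spark or Iceberg objects, and the 128 MiB default is an assumed stand-in for `SPLIT_SIZE_DEFAULT`.

```python
from typing import Mapping, Optional

# Assumption: 128 MiB stands in for Iceberg's SPLIT_SIZE_DEFAULT.
SPLIT_SIZE_DEFAULT = 128 * 1024 * 1024


def resolve_split_size(
    read_options: Mapping[str, int],
    table_properties: Mapping[str, int],
    spark_conf: Mapping[str, int],
    table_override_key: Optional[str] = None,
) -> int:
    """Return the split size in bytes from the first source that sets it.

    Hypothetical helper: all values are assumed to be already parsed to ints.
    table_override_key models the internal per-table Spark conf (step 2);
    pass None when no such conf exists.
    """
    candidates = [
        # 1. DataFrame read option
        read_options.get("split-size"),
        # 2. Per-table Spark conf override (internal, optional)
        spark_conf.get(table_override_key) if table_override_key else None,
        # 3. Table property
        table_properties.get("read.split.target-size"),
        # 4. Session-wide Spark conf
        spark_conf.get("spark.sql.files.maxPartitionBytes"),
    ]
    for value in candidates:
        if value is not None:
            return value
    # 5. Fall back to the default constant
    return SPLIT_SIZE_DEFAULT
```

With this ordering, a job tuned only through `spark.sql.files.maxPartitionBytes` keeps its split size after migration, while any table property or read option still takes priority over it.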