[ https://issues.apache.org/jira/browse/SPARK-34346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279896#comment-17279896 ]
Apache Spark commented on SPARK-34346:
--------------------------------------

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/31492

> io.file.buffer.size set by spark.buffer.size is overridden by hive-site.xml and may cause a perf regression
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34346
>                 URL: https://issues.apache.org/jira/browse/SPARK-34346
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 3.1.1
>            Reporter: Kent Yao
>            Priority: Blocker
>
> In many real-world cases, when interacting with the Hive catalog through Spark SQL, users simply reuse the `hive-site.xml` from their Hive jobs and copy it into `SPARK_HOME`/conf without modification. When Spark generates the Hadoop configuration, it uses `spark.buffer.size` (65536) to reset `io.file.buffer.size` (4096). But when `hive-site.xml` is loaded afterwards, that behavior is ignored and `io.file.buffer.size` is reset again according to `hive-site.xml`.
>
> 1. The configuration priority for setting Hadoop and Hive configs here is wrong; the order should be `spark > spark.hive > spark.hadoop > hive > hadoop`.
> 2. This breaks the `spark.buffer.size` config's ability to tune IO performance with HDFS whenever `hive-site.xml` carries its own `io.file.buffer.size`.
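For illustration, here is a minimal sketch of the two orderings, assuming only hadoop-common on the classpath. The object name, method names, and the hardcoded hive-site.xml map are hypothetical and do not reflect Spark's actual code path; they only demonstrate why the load order matters.

import org.apache.hadoop.conf.Configuration

// Illustrative only; names below are not from the Spark code base.
object HiveSiteOverrideSketch {

  // Stand-in for the key/value pairs parsed from a user's hive-site.xml,
  // which often still carries io.file.buffer.size from the Hive deployment.
  val hiveSiteProps: Map[String, String] = Map("io.file.buffer.size" -> "4096")

  // Problematic order: hive-site.xml values are applied after the
  // spark.buffer.size -> io.file.buffer.size copy, so they win.
  def buggyOrder(sparkBufferSize: Int): Configuration = {
    val conf = new Configuration()
    conf.setInt("io.file.buffer.size", sparkBufferSize)     // from spark.buffer.size (65536)
    hiveSiteProps.foreach { case (k, v) => conf.set(k, v) }  // hive-site.xml silently overrides it
    conf
  }

  // Intended priority (spark > spark.hive > spark.hadoop > hive > hadoop):
  // apply hive-site.xml first, then let the Spark-side setting override it.
  def intendedOrder(sparkBufferSize: Int): Configuration = {
    val conf = new Configuration()
    hiveSiteProps.foreach { case (k, v) => conf.set(k, v) }
    conf.setInt("io.file.buffer.size", sparkBufferSize)
    conf
  }

  def main(args: Array[String]): Unit = {
    println(buggyOrder(65536).getInt("io.file.buffer.size", -1))    // 4096
    println(intendedOrder(65536).getInt("io.file.buffer.size", -1)) // 65536
  }
}

Applying the hive-site.xml values before the Spark-side overrides, as in the second function, preserves the spark.buffer.size tuning and matches the priority order described in point 1 above.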