[ https://issues.apache.org/jira/browse/SPARK-25925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678861#comment-16678861 ]
Adam Budde commented on SPARK-25925:
------------------------------------

[~axenol] I would definitely support making the documentation clearer in this instance.

> Spark 2.3.1 retrieves all partitions from Hive Metastore by default
> -------------------------------------------------------------------
>
>                 Key: SPARK-25925
>                 URL: https://issues.apache.org/jira/browse/SPARK-25925
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Alex Ivanov
>            Priority: Major
>
> Spark 2.3.1 comes with the following _spark-defaults.conf_ parameters by
> default:
> {code:java}
> spark.sql.hive.convertMetastoreParquet true
> spark.sql.hive.metastorePartitionPruning true
> spark.sql.hive.caseSensitiveInferenceMode INFER_AND_SAVE{code}
> While the first two properties are fine, the last one has an unfortunate
> side-effect. I realize it's set to INFER_AND_SAVE for a reason, namely
> https://issues.apache.org/jira/browse/SPARK-19611, however that also causes
> an issue.
> The problem is at this point:
> [https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L232]
> The inference causes all partitions to be retrieved for the table from Hive
> Metastore. This is a problem because even running *explain* on a simple query
> on a table with thousands of partitions seems to hang, and is very difficult
> to debug.
> Moreover, many people will address the issue by changing:
> {code:java}
> spark.sql.hive.convertMetastoreParquet false{code}
> see that it works, and call it a day, thereby forgoing the benefits of using
> Parquet support in Spark directly. In our experience, this causes significant
> slow-downs on at least some queries.
> This Jira is mostly to document the issue, even if it cannot be addressed, so
> that people who inevitably run into this behavior can see the resolution,
> which is changing the parameter to *NEVER_INFER*, provided there are no
> issues with Parquet-Hive schema compatibility, i.e. all of the schema is in
> lower-case.
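
For reference, the resolution described in the issue can be sketched as a _spark-defaults.conf_ fragment. This is only a sketch, assuming every column name in the affected tables is lower-case, as the description requires. The property names and values come from the issue text; the comment lines are illustrative additions.
{code}
# Assumption: all Parquet/Hive column names are lower-case, so no schema inference is needed.
spark.sql.hive.caseSensitiveInferenceMode NEVER_INFER
# Left at the values quoted in the issue, so Spark's native Parquet support stays enabled.
spark.sql.hive.convertMetastoreParquet true
spark.sql.hive.metastorePartitionPruning true{code}
If a table does contain mixed-case column names, this setting may not be safe, and the inference behavior discussed in SPARK-19611 would still be needed.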