[ https://issues.apache.org/jira/browse/SPARK-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15005577#comment-15005577 ]
Xin Wu commented on SPARK-10673: -------------------------------- The fix is being tested.. will submit PR shortly. > spark.sql.hive.verifyPartitionPath Attempts to Verify Unregistered Partitions > ----------------------------------------------------------------------------- > > Key: SPARK-10673 > URL: https://issues.apache.org/jira/browse/SPARK-10673 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.4.0, 1.5.0 > Reporter: Miklos Christine > Priority: Minor > > In Spark 1.4, spark.sql.hive.verifyPartitionPath was set to true by default. > In Spark 1.5, it is now set to false by default. > If a table has a lot of partitions in the underlying filesystem, the code > unnecessarily checks for all the underlying directories when executing a > query. > https://github.com/apache/spark/blob/v1.5.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L162 > Structure: > {code} > /user/hive/warehouse/table1/year=2015/month=01/ > /user/hive/warehouse/table1/year=2015/month=02/ > /user/hive/warehouse/table1/year=2015/month=03/ > ... > /user/hive/warehouse/table1/year=2014/month=01/ > /user/hive/warehouse/table1/year=2014/month=02/ > {code} > If the registered partitions only contain year=2015 when you run "show > partitions table1", this code path checks for all directories under the > table's root directory. This incurs a significant performance penalty if > there are a lot of partition directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org