[ https://issues.apache.org/jira/browse/SPARK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275840#comment-16275840 ]
Apache Spark commented on SPARK-22676:
--------------------------------------

User 'jinxing64' has created a pull request for this issue:
https://github.com/apache/spark/pull/19868

> Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-22676
>                 URL: https://issues.apache.org/jira/browse/SPARK-22676
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: jin xing
>
> The current code scans all partition paths when
> spark.sql.hive.verifyPartitionPath=true.
>
> For example, take a table like the one below:
> CREATE TABLE `test`(
>   `id` int,
>   `age` int,
>   `name` string)
> PARTITIONED BY (
>   `A` string,
>   `B` string)
> load data local inpath '/tmp/data1' into table test partition(A='00', B='00')
> load data local inpath '/tmp/data1' into table test partition(A='01', B='01')
> load data local inpath '/tmp/data1' into table test partition(A='10', B='10')
> load data local inpath '/tmp/data1' into table test partition(A='11', B='11')
>
> If I query with the SQL "select * from test where year=2017 and month=12 and
> day=03", the current code scans all partition paths, including
> '/data/A=00/B=00', '/data/A=01/B=01', '/data/A=10/B=10', and
> '/data/A=11/B=11'. This costs a lot of time and memory.
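The cost difference described in the report can be illustrated with a small simulation on a local temporary directory. This is not Spark's code and not necessarily the approach taken in pull request 19868. It only contrasts two strategies: listing every partition directory under the table root before filtering (the behavior the report describes), versus checking existence only for the partitions that remain after partition pruning. The directory layout, the function names (`partition_path`, `verify_by_listing_all`, `verify_selected_only`) and the selected partitions are all hypothetical.

```python
import os
import tempfile

# Hypothetical table layout mirroring the issue: one directory per (A, B) partition.
PARTITIONS = [("00", "00"), ("01", "01"), ("10", "10"), ("11", "11")]


def partition_path(root, a, b):
    """Build the Hive-style directory path for partition (A=a, B=b)."""
    return os.path.join(root, f"A={a}", f"B={b}")


def verify_by_listing_all(root, selected):
    """Costly pattern: enumerate every partition directory under the table root,
    then keep only the selected partitions whose directory was found.
    Returns (kept partitions, number of directories listed)."""
    existing = set()
    listed = 0
    for a_dir in os.listdir(root):
        for b_dir in os.listdir(os.path.join(root, a_dir)):
            existing.add(os.path.join(root, a_dir, b_dir))
            listed += 1
    kept = [p for p in selected if partition_path(root, *p) in existing]
    return kept, listed


def verify_selected_only(root, selected):
    """Cheaper pattern: check existence only for the partitions left after pruning.
    Returns (kept partitions, number of existence checks performed)."""
    kept = []
    checked = 0
    for p in selected:
        checked += 1
        if os.path.isdir(partition_path(root, *p)):
            kept.append(p)
    return kept, checked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for a, b in PARTITIONS:
            os.makedirs(partition_path(root, a, b))
        # Hypothetical pruning result: one existing partition, one with no directory.
        selected = [("00", "00"), ("01", "00")]
        print(verify_by_listing_all(root, selected))  # ([('00', '00')], 4)
        print(verify_selected_only(root, selected))   # ([('00', '00')], 2)
```

Both strategies keep the same surviving partition. However, the first touches every partition directory in the table, while the second performs one check per selected partition. On HDFS, each listing or existence check is a call to the NameNode, so the gap would be expected to grow with the total number of partitions in the table.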