jin xing created SPARK-22676:
--------------------------------

             Summary: Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true
                 Key: SPARK-22676
                 URL: https://issues.apache.org/jira/browse/SPARK-22676
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: jin xing

In the current code, Spark scans all partition paths of a table when spark.sql.hive.verifyPartitionPath=true. For example, take a table like the one below:

    CREATE TABLE `test`(
      `id` int,
      `age` int,
      `name` string)
    PARTITIONED BY (
      `A` string,
      `B` string)

    load data local inpath '/tmp/data1' into table test partition(A='00', B='00')
    load data local inpath '/tmp/data1' into table test partition(A='01', B='01')
    load data local inpath '/tmp/data1' into table test partition(A='10', B='10')
    load data local inpath '/tmp/data1' into table test partition(A='11', B='11')

If I query with SQL "select * from test where year=2017 and month=12 and day=03", the current code scans every partition path, including '/data/A=00/B=00', '/data/A=01/B=01', '/data/A=10/B=10', and '/data/A=11/B=11'. This costs a lot of time and memory.
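
To make the cost difference concrete, here is a minimal sketch in Python. It is not Spark's code: it uses a local directory tree in place of HDFS, and the function names (`partition_path`, `verify_by_scanning`, `verify_directly`) are hypothetical. It compares listing every partition directory under the table against checking only the partitions a query actually needs.

```python
import os
import tempfile


def partition_path(table_root, spec):
    """Build a Hive-style partition directory, e.g. <root>/A=00/B=00.

    `spec` is a tuple of (column, value) pairs in partition-column order.
    """
    return os.path.join(table_root, *(f"{k}={v}" for k, v in spec))


def verify_by_scanning(table_root, wanted):
    """Walk every directory under the table, then keep the wanted partitions
    that were seen. Work grows with the total number of partitions."""
    existing = set()
    visited = 0
    for dirpath, dirnames, _ in os.walk(table_root):
        visited += 1
        if not dirnames:  # a leaf directory is one partition
            existing.add(dirpath)
    kept = [s for s in wanted if partition_path(table_root, s) in existing]
    return kept, visited


def verify_directly(table_root, wanted):
    """Check only the partitions the query asked for. Work grows with the
    number of requested partitions, not with the size of the table."""
    kept = []
    checks = 0
    for s in wanted:
        checks += 1
        if os.path.isdir(partition_path(table_root, s)):
            kept.append(s)
    return kept, checks


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    # Same four partitions as the example table above.
    for a, b in [("00", "00"), ("01", "01"), ("10", "10"), ("11", "11")]:
        os.makedirs(os.path.join(root, f"A={a}", f"B={b}"))
    # One partition that exists and one whose directory is missing.
    wanted = [(("A", "00"), ("B", "00")), (("A", "99"), ("B", "99"))]
    print(verify_by_scanning(root, wanted))
    print(verify_directly(root, wanted))
```

Both functions keep the same partitions, but the scanning variant touches every directory in the table, while the direct variant performs one existence check per requested partition. With thousands of partitions and a selective filter, the second approach should do far less file-system work.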