[ https://issues.apache.org/jira/browse/SPARK-22676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-22676:
------------------------------------

    Assignee:     (was: Apache Spark)

> Avoid iterating all partition paths when spark.sql.hive.verifyPartitionPath=true
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-22676
>                 URL: https://issues.apache.org/jira/browse/SPARK-22676
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: jin xing
>
> The current code scans all partition paths when
> spark.sql.hive.verifyPartitionPath=true.
> For example, given a table like the one below:
> CREATE TABLE `test`(
>   `id` int,
>   `age` int,
>   `name` string)
> PARTITIONED BY (
>   `A` string,
>   `B` string)
> load data local inpath '/tmp/data1' into table test partition(A='00', B='00')
> load data local inpath '/tmp/data1' into table test partition(A='01', B='01')
> load data local inpath '/tmp/data1' into table test partition(A='10', B='10')
> load data local inpath '/tmp/data1' into table test partition(A='11', B='11')
> If I query with SQL -- "select * from test where year=2017 and month=12 and
> day=03", the current code will scan all partition paths, including
> '/data/A=00/B=00', '/data/A=01/B=01', '/data/A=10/B=10', '/data/A=11/B=11'.
> This costs a lot of time and memory.
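The idea in the issue title, checking only the partition paths a query actually selects instead of walking every partition directory, can be sketched in a few lines of Python. This is an illustration of the general technique only, not Spark's implementation: the helper names (`partition_path`, `verify_selected_partitions`, `demo`) and the in-memory list of partition specs are assumptions made for this example.

```python
import os
import tempfile


def partition_path(root, spec):
    """Build a Hive-style partition directory such as <root>/A=00/B=00."""
    return os.path.join(root, *[f"{key}={value}" for key, value in spec.items()])


def verify_selected_partitions(root, specs, predicate):
    """Check existence only for partitions that satisfy `predicate`.

    Returns (existing_paths, checked_paths). Partitions rejected by the
    predicate never touch the file system.
    """
    existing, checked = [], []
    for spec in specs:
        if not predicate(spec):
            continue  # pruned: no path lookup for this partition
        path = partition_path(root, spec)
        checked.append(path)
        if os.path.isdir(path):
            existing.append(path)
    return existing, checked


def demo(selected_a):
    """Hypothetical driver mirroring the four partitions from the issue."""
    specs = [
        {"A": "00", "B": "00"},
        {"A": "01", "B": "01"},
        {"A": "10", "B": "10"},
        {"A": "11", "B": "11"},
    ]
    with tempfile.TemporaryDirectory() as root:
        # Create only the first three directories; A=11/B=11 is left missing
        # to stand in for a partition whose path no longer exists.
        for spec in specs[:3]:
            os.makedirs(partition_path(root, spec))
        existing, checked = verify_selected_partitions(
            root, specs, lambda spec: spec["A"] == selected_a
        )
        return [os.path.relpath(p, root) for p in existing], len(checked)
```

Calling `demo("01")` performs a single existence check for the one selected partition, while `demo("11")` performs one check and reports no existing path. In both cases the other three partition directories are never inspected.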