[ https://issues.apache.org/jira/browse/SPARK-25548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-25548:
------------------------------------

    Assignee: Apache Spark

> In the PruneFileSourcePartitions optimizer, replace the nonPartitionOps field with true in the And(partitionOps, nonPartitionOps) to make the partition can be pruned
> ----------------------------------------------------------------------
>
>                 Key: SPARK-25548
>                 URL: https://issues.apache.org/jira/browse/SPARK-25548
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: eaton
>            Assignee: Apache Spark
>            Priority: Critical
>
> In the PruneFileSourcePartitions optimizer, partition files are not pruned
> when a partition filter and a non-partition filter are used together.
> For example:
>
> sql("CREATE TABLE IF NOT EXISTS src_par (key INT, value STRING) partitioned by(p_d int) stored as parquet ")
> sql("insert overwrite table src_par partition(p_d=2) select 2 as key, '4' as value")
> sql("insert overwrite table src_par partition(p_d=3) select 3 as key, '4' as value")
> sql("insert overwrite table src_par partition(p_d=4) select 4 as key, '4' as value")
>
> The query below scans all the partition files, although the partition
> p_d=4 should be pruned:
>
> sql("select * from src_par where (p_d=2 and key=2) or (p_d=3 and key=3)").show
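The rewrite proposed in the issue title can be illustrated with a small, self-contained sketch. It does not use Spark's Catalyst classes: the `Expr`, `EqualTo`, `And`, `Or` and `True` types and the `PartitionFilterSketch` object are hypothetical stand-ins invented for this example, and the sketch assumes the filter contains no negation, so replacing a predicate with true can only widen the set of partitions kept.

```scala
// Toy expression tree. These are hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
case class EqualTo(column: String, value: Int) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case object True extends Expr

object PartitionFilterSketch {
  // Replace every predicate on a non-partition column with True and simplify.
  // Assumes no negation, so the result is never stricter than the input filter.
  def toPartitionFilter(e: Expr, partitionCols: Set[String]): Expr = e match {
    case EqualTo(c, _) if partitionCols.contains(c) => e
    case _: EqualTo => True
    case And(l, r) =>
      (toPartitionFilter(l, partitionCols), toPartitionFilter(r, partitionCols)) match {
        case (True, x) => x          // And(true, x) simplifies to x
        case (x, True) => x
        case (x, y)    => And(x, y)
      }
    case Or(l, r) =>
      (toPartitionFilter(l, partitionCols), toPartitionFilter(r, partitionCols)) match {
        case (True, _) | (_, True) => True   // Or(true, x) keeps every partition
        case (x, y)                => Or(x, y)
      }
    case True => True
  }

  // Evaluate a partition-only filter against one partition's column values.
  def matches(e: Expr, partition: Map[String, Int]): Boolean = e match {
    case EqualTo(c, v) => partition.get(c).contains(v)
    case And(l, r)     => matches(l, partition) && matches(r, partition)
    case Or(l, r)      => matches(l, partition) || matches(r, partition)
    case True          => true
  }
}
```

For the query in the report, (p_d=2 and key=2) or (p_d=3 and key=3), this sketch derives the partition-only filter p_d=2 or p_d=3, which keeps partitions 2 and 3 and drops p_d=4. How the actual optimizer rule implements this is not shown in this message.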