AngersZhuuuu commented on pull request #28805: URL: https://github.com/apache/spark/pull/28805#issuecomment-646445716
> The CNF process should break down `dt = 20190626 and id in (1,2,3)` to `Seq((dt = 20190626), (id in (1,2,3)))`, and then these two sub-predicates will be processed in `groupExpressionsByQualifier`. What is the problem here?

In the current partition pruning, `ScanOperation` gets predicates via `splitConjunctivePredicates`. A predicate such as `(dt = 1 or (dt = 2 and id = 3))` won't be separated, and since its references contain both `id` and `dt`, it won't be pushed down as a partition predicate. The query will then scan all the data in the partitioned table.

```scala
object HiveTableScans extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ScanOperation(projectList, predicates, relation: HiveTableRelation) =>
      // Filter out all predicates that only deal with partition keys, these are given to the
      // hive table scan operator to be used for partition pruning.
      val partitionKeyIds = AttributeSet(relation.partitionCols)
      val (pruningPredicates, otherPredicates) = predicates.partition { predicate =>
        !predicate.references.isEmpty && predicate.references.subsetOf(partitionKeyIds)
      }
      pruneFilterProject(
        projectList,
        otherPredicates,
        identity[Seq[Expression]],
        HiveTableScanExec(_, relation, pruningPredicates)(sparkSession)) :: Nil
    case _ =>
      Nil
  }
}
```

With the conversion to CNF, `(dt = 1 or (dt = 2 and id = 3))` becomes `(dt = 1 or dt = 2) and (dt = 1 or id = 3)`. This expression can now be split by `splitConjunctivePredicates` into the two expressions `(dt = 1 or dt = 2)` and `(dt = 1 or id = 3)`, and `(dt = 1 or dt = 2)` can be pushed down as a partition pruning predicate.
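The CNF rewrite above is just distributing `or` over `and` until every conjunct is OR-only, then splitting on the top-level `and`s. Here is a minimal self-contained sketch of that idea — the `Expr`, `Pred`, `toCNF`, and `show` names are my own toy model, not Spark's actual classes; only `splitConjunctivePredicates` mirrors the real helper's behavior:

```scala
// Toy expression tree (hypothetical names, not Spark's Catalyst classes).
sealed trait Expr
case class Pred(sql: String) extends Expr
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr

// Convert to conjunctive normal form by distributing Or over And:
// (a and b) or c  ==>  (a or c) and (b or c)
def toCNF(e: Expr): Expr = e match {
  case And(l, r) => And(toCNF(l), toCNF(r))
  case Or(l, r) => (toCNF(l), toCNF(r)) match {
    case (And(a, b), c) => And(toCNF(Or(a, c)), toCNF(Or(b, c)))
    case (a, And(b, c)) => And(toCNF(Or(a, b)), toCNF(Or(a, c)))
    case (a, b)         => Or(a, b)
  }
  case leaf => leaf
}

// Split a tree on its top-level Ands, like Spark's splitConjunctivePredicates.
def splitConjunctivePredicates(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjunctivePredicates(l) ++ splitConjunctivePredicates(r)
  case other     => Seq(other)
}

def show(e: Expr): String = e match {
  case Pred(s)   => s
  case And(l, r) => s"(${show(l)} and ${show(r)})"
  case Or(l, r)  => s"(${show(l)} or ${show(r)})"
}

// The example from the comment: (dt = 1 or (dt = 2 and id = 3))
val input = Or(Pred("dt = 1"), And(Pred("dt = 2"), Pred("id = 3")))
val conjuncts = splitConjunctivePredicates(toCNF(input))
// conjuncts.map(show) == Seq("(dt = 1 or dt = 2)", "(dt = 1 or id = 3)")
```

The first conjunct references only `dt`, so it can now be handed to the partition pruner on its own.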
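Once the conjuncts are split out, the `predicates.partition` test in `HiveTableScans` is what decides which of them actually prune partitions: a predicate qualifies only when every attribute it references is a partition column. A minimal sketch of that check, using a hypothetical `Predicate(sql, references)` pair instead of Catalyst's `Expression`/`AttributeSet`:

```scala
// Toy stand-in (hypothetical) for an expression plus its referenced attributes.
case class Predicate(sql: String, references: Set[String])

// dt is the only partition column, mirroring the example in the comment.
val partitionKeyIds = Set("dt")
val predicates = Seq(
  Predicate("(dt = 1 or dt = 2)", Set("dt")),
  Predicate("(dt = 1 or id = 3)", Set("dt", "id"))
)

// Same shape as the check in HiveTableScans: non-empty references that are
// a subset of the partition columns go to partition pruning; the rest are
// evaluated as ordinary filters after the scan.
val (pruningPredicates, otherPredicates) = predicates.partition { p =>
  p.references.nonEmpty && p.references.subsetOf(partitionKeyIds)
}
// pruningPredicates: Seq(Predicate("(dt = 1 or dt = 2)", ...))
// otherPredicates:   Seq(Predicate("(dt = 1 or id = 3)", ...))
```

This is why the CNF split matters: without it, the single combined predicate references `id` and fails the `subsetOf` test, so nothing is pruned.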