[GitHub] spark pull request #20988: [SPARK-23877][SQL]: Use filter predicates to prun...

cloud-fan Fri, 13 Apr 2018 17:48:07 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20988#discussion_r181535484
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala ---
    @@ -129,35 +151,41 @@ case class OptimizeMetadataOnlyQuery(catalog: SessionCatalog) extends Rule[Logic
     
       /**
        * A pattern that finds the partitioned table relation node inside the given plan, and returns a
    -   * pair of the partition attributes and the table relation node.
    +   * pair of the partition attributes, partition filters, and the table relation node.
        *
        * It keeps traversing down the given plan tree if there is a [[Project]] or [[Filter]] with
        * deterministic expressions, and returns result after reaching the partitioned table relation
        * node.
        */
    -  object PartitionedRelation {
    -
    -    def unapply(plan: LogicalPlan): Option[(AttributeSet, LogicalPlan)] = plan match {
    -      case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _, _)
    -        if fsRelation.partitionSchema.nonEmpty =>
    -        val partAttrs = getPartitionAttrs(fsRelation.partitionSchema.map(_.name), l)
    -        Some((AttributeSet(partAttrs), l))
    -
    -      case relation: HiveTableRelation if relation.tableMeta.partitionColumnNames.nonEmpty =>
    -        val partAttrs = getPartitionAttrs(relation.tableMeta.partitionColumnNames, relation)
    -        Some((AttributeSet(partAttrs), relation))
    -
    -      case p @ Project(projectList, child) if projectList.forall(_.deterministic) =>
    -        unapply(child).flatMap { case (partAttrs, relation) =>
    -          if (p.references.subsetOf(partAttrs)) Some((p.outputSet, relation)) else None
    -        }
    +  object PartitionedRelation extends PredicateHelper {
    +
    +    def unapply(plan: LogicalPlan): Option[(AttributeSet, Seq[Expression], LogicalPlan)] = {
    +      plan match {
    +        case l @ LogicalRelation(fsRelation: HadoopFsRelation, _, _, _)
    +          if fsRelation.partitionSchema.nonEmpty =>
    +          val partAttrs = getPartitionAttrs(fsRelation.partitionSchema.map(_.name), l)
    +          Some((AttributeSet(partAttrs), Nil, l))
    +
    +        case relation: HiveTableRelation if relation.tableMeta.partitionColumnNames.nonEmpty =>
    +          val partAttrs = getPartitionAttrs(relation.tableMeta.partitionColumnNames, relation)
    +          Some((AttributeSet(partAttrs), Nil, relation))
    +
    +        case p @ Project(projectList, child) if projectList.forall(_.deterministic) =>
    +          unapply(child).flatMap { case (partAttrs, filters, relation) =>
    +            if (p.references.subsetOf(partAttrs)) Some((p.outputSet, filters, relation)) else None
    --- End diff --
    
    What about `Filter(p > 1, Project(a, p, Table(a, b, p, partitioned by p)))`? Here `p > 1` should also be treated as a partition filter.
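
To make the case above concrete, here is a minimal sketch on a toy plan model. It is not Spark's Catalyst API and not the implementation in this PR: `Plan`, `Table`, `Project`, `Filter`, `Predicate`, and `partitionFilters` are hypothetical stand-ins, and the sketch assumes a `Project` only passes columns through by name (no aliases or computed expressions). It illustrates one possible rule: a predicate counts as a partition filter when every column it references is a partition column still visible at that point in the plan, even when a `Project` sits between the `Filter` and the table.

```scala
// Toy model only: these are NOT Spark's Catalyst classes, just small stand-ins.
object PartitionFilterSketch {
  sealed trait Plan
  // A table with its output columns and the subset it is partitioned by.
  final case class Table(columns: Seq[String], partitionColumns: Set[String]) extends Plan
  // Assumption: a projection that only selects existing columns by name.
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  // A predicate is modelled as its text plus the column names it references.
  final case class Predicate(text: String, references: Set[String])
  final case class Filter(condition: Predicate, child: Plan) extends Plan

  // Returns the partition columns visible at this node and the predicates
  // collected so far that reference only partition columns.
  // Returns None when the plan does not bottom out in a partitioned table.
  def partitionFilters(plan: Plan): Option[(Set[String], Seq[Predicate])] = plan match {
    case Table(_, partCols) if partCols.nonEmpty =>
      Some((partCols, Nil))
    case Project(cols, child) =>
      partitionFilters(child).map { case (partCols, filters) =>
        // Keep only the partition columns that survive the projection.
        (partCols.intersect(cols.toSet), filters)
      }
    case Filter(cond, child) =>
      partitionFilters(child).map { case (partCols, filters) =>
        if (cond.references.subsetOf(partCols)) (partCols, filters :+ cond)
        else (partCols, filters)
      }
    case _ => None
  }

  // The plan from the review comment:
  // Filter(p > 1, Project(a, p, Table(a, b, p, partitioned by p)))
  val example: Plan =
    Filter(
      Predicate("p > 1", Set("p")),
      Project(Seq("a", "p"), Table(Seq("a", "b", "p"), Set("p"))))
}
```

In this toy model, `PartitionFilterSketch.partitionFilters(PartitionFilterSketch.example)` collects `p > 1`, because `p` is still a visible partition column after the projection of `a` and `p`. Whether the plan as a whole qualifies for a metadata-only rewrite is a separate check that this sketch does not attempt.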


