spark git commit: [SPARK-13473][SQL] Simplifies PushPredicateThroughProject

lian Tue, 22 Mar 2016 04:21:46 -0700

Repository: spark
Updated Branches:
  refs/heads/master 14464cadb -> f2e855fba



[SPARK-13473][SQL] Simplifies PushPredicateThroughProject

## What changes were proposed in this pull request?

This is a follow-up of PR #11348.

After PR #11348, a predicate is never pushed through a project as long as the 
project contains any non-deterministic fields. Thus, it's impossible that the 
candidate filter condition can reference any non-deterministic projected 
fields, and related logic can be safely cleaned up.

To be more specific, the following optimization is allowed:

```scala
// From:
df.select('a, 'b).filter('c > rand(42))
// To:
df.filter('c > rand(42)).select('a, 'b)
```

while this isn't:

```scala
// From:
df.select('a, rand('b) as 'rb, 'c).filter('c > 'rb)
// To:
df.filter('c > rand('b)).select('a, rand('b) as 'rb, 'c)
```

## How was this patch tested?

Existing test cases should do the work.

Author: Cheng Lian <l...@databricks.com>

Closes #11864 from liancheng/spark-13473-cleanup.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f2e855fb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f2e855fb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f2e855fb

Branch: refs/heads/master
Commit: f2e855fba8eb73475cf312cdf880c1297d4323bb
Parents: 14464ca
Author: Cheng Lian <l...@databricks.com>
Authored: Tue Mar 22 19:20:56 2016 +0800
Committer: Cheng Lian <l...@databricks.com>
Committed: Tue Mar 22 19:20:56 2016 +0800

----------------------------------------------------------------------
 .../sql/catalyst/optimizer/Optimizer.scala      | 24 +-------------------
 1 file changed, 1 insertion(+), 23 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/f2e855fb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
----------------------------------------------------------------------
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 41e8dc0..0840d46 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -891,29 +891,7 @@ object PushPredicateThroughProject extends 
Rule[LogicalPlan] with PredicateHelpe
         case a: Alias => (a.toAttribute, a.child)
       })
 
-      // Split the condition into small conditions by `And`, so that we can 
push down part of this
-      // condition without nondeterministic expressions.
-      val andConditions = splitConjunctivePredicates(condition)
-
-      val (deterministic, nondeterministic) = 
andConditions.partition(_.collect {
-        case a: Attribute if aliasMap.contains(a) => aliasMap(a)
-      }.forall(_.deterministic))
-
-      // If there is no nondeterministic conditions, push down the whole 
condition.
-      if (nondeterministic.isEmpty) {
-        project.copy(child = Filter(replaceAlias(condition, aliasMap), 
grandChild))
-      } else {
-        // If they are all nondeterministic conditions, leave it un-changed.
-        if (deterministic.isEmpty) {
-          filter
-        } else {
-          // Push down the small conditions without nondeterministic 
expressions.
-          val pushedCondition =
-            deterministic.map(replaceAlias(_, aliasMap)).reduce(And)
-          Filter(nondeterministic.reduce(And),
-            project.copy(child = Filter(pushedCondition, grandChild)))
-        }
-      }
+      project.copy(child = Filter(replaceAlias(condition, aliasMap), 
grandChild))
   }
 
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

spark git commit: [SPARK-13473][SQL] Simplifies PushPredicateThroughProject

Reply via email to