spark git commit: [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

2015-11-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 53600854c -> 87aedc48c


[SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

This patch adds the building blocks for subexpression elimination in codegen and implements
it end to end for UnsafeProjection. The building blocks can be used to do the same thing
for other operators.

It introduces some utilities to compute common subexpressions. Expressions can be added to
this data structure. The expression and its children are recursively matched against existing
expressions (ones previously added) and grouped into common groups. This is built using
the existing `semanticEquals`. It does not understand things like commutative or associative
expressions; that is left as future work.
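
The grouping described above can be illustrated with a minimal sketch. It is not the code from this patch: the toy `Expr` hierarchy, the `ExprGroups` class, and its method names are hypothetical, and plain case-class equality stands in for `semanticEquals`. Skipping leaves and not descending into an already-seen tree are choices made for this sketch.

```scala
import scala.collection.mutable

// Toy expression tree; a stand-in for Catalyst's Expression, not the real class.
sealed trait Expr { def children: Seq[Expr] }
case class Col(name: String) extends Expr { val children: Seq[Expr] = Nil }
case class Lit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
case class Mul(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

// Groups equal expressions. Case-class equality plays the role that
// `semanticEquals` plays in the patch (an assumption of this sketch).
class ExprGroups {
  private val groups = mutable.LinkedHashMap.empty[Expr, mutable.ArrayBuffer[Expr]]

  // Adds `e` and, recursively, its children. Leaves are skipped because
  // sharing them saves nothing; children of an already-seen tree are not
  // added again, so only the outermost repeated tree gains a new member.
  def addTree(e: Expr): Unit = {
    if (e.children.nonEmpty) {
      val seenBefore = groups.contains(e)
      groups.getOrElseUpdate(e, mutable.ArrayBuffer.empty[Expr]) += e
      if (!seenBefore) e.children.foreach(addTree)
    }
  }

  // Groups with at least two members are candidates for elimination.
  def commonGroups: List[List[Expr]] =
    groups.values.filter(_.size > 1).map(_.toList).toList
}

object ExprGroupsDemo {
  def main(args: Array[String]): Unit = {
    val g = new ExprGroups
    g.addTree(Mul(Add(Col("a"), Col("b")), Lit(2)))
    g.addTree(Add(Add(Col("a"), Col("b")), Lit(1)))
    g.addTree(Add(Col("b"), Col("a"))) // not matched: commutativity is not understood
    println(g.commonGroups)
  }
}
```

In this sketch `a + b` ends up in one group with two members, while `b + a` stays in a group of its own, mirroring the limitation on commutative expressions noted above.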

After building this data structure, the codegen process takes advantage of it by:
  1. Generating a helper function in the generated class that computes the common
     subexpression. This is done for every common subexpression that has at least
     two occurrences and whose expression tree is sufficiently complex.
  2. When generating the apply() function, calling the helper function, if it exists,
     instead of regenerating the expression tree. Repeated calls to the helper function
     short-circuit the evaluation logic.
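
The two steps above can be sketched by hand. This is only an analogue written in Scala, not the code that the generator emits; the class `ProjectionSketch`, the helper `evalSum`, and the counter `helperEvaluations` are hypothetical names, and the per-row reset is an assumption of the sketch.

```scala
// Hand-written analogue of a generated projection that computes
// ((a + b) * 2, (a + b) + 1). All names here are hypothetical.
class ProjectionSketch {
  // Cached state for the common subexpression `a + b`.
  private var sumEvaluated = false
  private var sumValue = 0

  // Counts how often `a + b` is actually computed (for the demo only).
  var helperEvaluations = 0

  // Helper for the common subexpression. The first call for a row does the
  // work; later calls for the same row return the cached value.
  private def evalSum(a: Int, b: Int): Int = {
    if (!sumEvaluated) {
      sumValue = a + b
      sumEvaluated = true
      helperEvaluations += 1
    }
    sumValue
  }

  // Evaluates both output columns for one row.
  def apply(a: Int, b: Int): (Int, Int) = {
    sumEvaluated = false // reset the cached state for the new row
    (evalSum(a, b) * 2, evalSum(a, b) + 1)
  }
}

object ProjectionSketchDemo {
  def main(args: Array[String]): Unit = {
    val p = new ProjectionSketch
    println(p.apply(3, 4))
    println(p.helperEvaluations)
  }
}
```

Both output columns call `evalSum`, but the addition runs once per row; the second call only reads the cached value.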

Author: Nong Li 

This patch had conflicts when merged, resolved by
Committer: Michael Armbrust 

Closes #9480 from nongli/spark-10371.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/87aedc48
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/87aedc48
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/87aedc48

Branch: refs/heads/master
Commit: 87aedc48c01dffbd880e6ca84076ed47c68f88d0
Parents: 5360085
Author: Nong Li 
Authored: Tue Nov 10 11:28:53 2015 -0800
Committer: Michael Armbrust 
Committed: Tue Nov 10 11:28:53 2015 -0800

--
 .../expressions/EquivalentExpressions.scala     | 106 +
 .../sql/catalyst/expressions/Expression.scala   |  50 +-
 .../sql/catalyst/expressions/Projection.scala   |  16 ++
 .../expressions/codegen/CodeGenerator.scala     | 110 -
 .../codegen/GenerateUnsafeProjection.scala      |  36 -
 .../catalyst/expressions/namedExpressions.scala |   4 +
 .../SubexpressionEliminationSuite.scala         | 153 +++
 .../scala/org/apache/spark/sql/SQLConf.scala    |   8 +
 .../apache/spark/sql/execution/SparkPlan.scala  |   5 +
 .../spark/sql/execution/basicOperators.scala    |   3 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala    |  48 ++
 11 files changed, 523 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/87aedc48/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
new file mode 100644
index 000..e7380d2
--- /dev/null
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+/**
+ * This class is used to compute equality of (sub)expression trees. Expressions can be added
+ * to this class and they subsequently query for expression equality. Expression trees are
+ * considered equal if for the same input(s), the same result is produced.
+ */
+class EquivalentExpressions {
+  /**
+   * Wrapper around an Expression that provides semantic equality.
+   */
+  case class Expr(e: Expression) {
+    val hash = e.semanticHash()
+    override def equals(o: Any): Boolean = o match {
+      case other: Expr => 

spark git commit: [SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

2015-11-10 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.6 5ccc1eb08 -> f38509a76


[SPARK-10371][SQL] Implement subexpr elimination for UnsafeProjections

This patch adds the building blocks for subexpression elimination in codegen and implements
it end to end for UnsafeProjection. The building blocks can be used to do the same thing
for other operators.

It introduces some utilities to compute common subexpressions. Expressions can be added to
this data structure. The expression and its children are recursively matched against existing
expressions (ones previously added) and grouped into common groups. This is built using
the existing `semanticEquals`. It does not understand things like commutative or associative
expressions; that is left as future work.

After building this data structure, the codegen process takes advantage of it by:
  1. Generating a helper function in the generated class that computes the common
     subexpression. This is done for every common subexpression that has at least
     two occurrences and whose expression tree is sufficiently complex.
  2. When generating the apply() function, calling the helper function, if it exists,
     instead of regenerating the expression tree. Repeated calls to the helper function
     short-circuit the evaluation logic.

Author: Nong Li 

This patch had conflicts when merged, resolved by
Committer: Michael Armbrust 

Closes #9480 from nongli/spark-10371.

(cherry picked from commit 87aedc48c01dffbd880e6ca84076ed47c68f88d0)
Signed-off-by: Michael Armbrust 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f38509a7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f38509a7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f38509a7

Branch: refs/heads/branch-1.6
Commit: f38509a763816f43a224653fe65e4645894c9fc4
Parents: 5ccc1eb
Author: Nong Li 
Authored: Tue Nov 10 11:28:53 2015 -0800
Committer: Michael Armbrust 
Committed: Tue Nov 10 11:29:05 2015 -0800

--
 .../expressions/EquivalentExpressions.scala     | 106 +
 .../sql/catalyst/expressions/Expression.scala   |  50 +-
 .../sql/catalyst/expressions/Projection.scala   |  16 ++
 .../expressions/codegen/CodeGenerator.scala     | 110 -
 .../codegen/GenerateUnsafeProjection.scala      |  36 -
 .../catalyst/expressions/namedExpressions.scala |   4 +
 .../SubexpressionEliminationSuite.scala         | 153 +++
 .../scala/org/apache/spark/sql/SQLConf.scala    |   8 +
 .../apache/spark/sql/execution/SparkPlan.scala  |   5 +
 .../spark/sql/execution/basicOperators.scala    |   3 +-
 .../org/apache/spark/sql/SQLQuerySuite.scala    |  48 ++
 11 files changed, 523 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f38509a7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
new file mode 100644
index 000..e7380d2
--- /dev/null
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import scala.collection.mutable
+
+/**
+ * This class is used to compute equality of (sub)expression trees. Expressions can be added
+ * to this class and they subsequently query for expression equality. Expression trees are
+ * considered equal if for the same input(s), the same result is produced.
+ */
+class EquivalentExpressions {
+  /**
+   * Wrapper around an Expression that provides semantic equality.
+   */
+  case class