spark git commit: [SPARK-8770][SQL] Create BinaryOperator abstract class.

2015-07-01 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 3a342dedc -> 9fd13d561


[SPARK-8770][SQL] Create BinaryOperator abstract class.

Our current BinaryExpression abstract class is not for generic binary
expressions: it requires its left/right children to have the same type.
However, due to its name, contributors build new binary expressions that don't
satisfy that assumption (e.g. Sha) and still extend BinaryExpression.

This patch creates a new BinaryOperator abstract class and updates the analyzer
to only apply the type casting rule there. This patch also adds the notion of
prettyName to expressions, which defines the user-facing name for the
expression.
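The split described above can be sketched in Python as a hypothetical model of the Scala classes (this is illustrative, not code from the patch): a generic binary expression places no constraint on its children's types, while a binary operator checks that both sides share a type and exposes a lowercase prettyName.

```python
class BinaryExpression:
    """Generic two-child expression; children may have different
    types (e.g. Sha). Mirrors the Catalyst class name only."""
    def __init__(self, left, right):
        self.left = left
        self.right = right


class BinaryOperator(BinaryExpression):
    """Binary expression whose children must share one type; after this
    patch the analyzer applies its type-casting rule only here."""
    def check_input_types(self):
        if type(self.left) is not type(self.right):
            raise TypeError("BinaryOperator requires same-typed children")

    @property
    def pretty_name(self):
        # User-facing name; lowercase by convention after this patch.
        return type(self).__name__.lower()


class Add(BinaryOperator):
    """Example operator: both sides must have the same type."""
```

Keeping the type check on the subclass means expressions like Sha can extend the generic base without inheriting a same-type constraint they never promised to satisfy.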

Author: Reynold Xin r...@databricks.com

Closes #7174 from rxin/binary-opterator and squashes the following commits:

f31900d [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.
fceb216 [Reynold Xin] Merge branch 'master' of github.com:apache/spark into 
binary-opterator
d8518cf [Reynold Xin] Updated Python tests.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9fd13d56
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9fd13d56
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9fd13d56

Branch: refs/heads/master
Commit: 9fd13d5613b6d16a78d97d4798f085b56107d343
Parents: 3a342de
Author: Reynold Xin r...@databricks.com
Authored: Wed Jul 1 21:14:13 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jul 1 21:14:13 2015 -0700

--
 python/pyspark/sql/dataframe.py |  10 +-
 python/pyspark/sql/functions.py |   4 +-
 python/pyspark/sql/group.py |  24 +--
 .../catalyst/analysis/HiveTypeCoercion.scala|  17 +-
 .../expressions/ExpectsInputTypes.scala |  59 +++
 .../sql/catalyst/expressions/Expression.scala   | 161 +--
 .../sql/catalyst/expressions/ScalaUDF.scala |   2 +-
 .../sql/catalyst/expressions/aggregates.scala   |   9 +-
 .../sql/catalyst/expressions/arithmetic.scala   |  14 +-
 .../expressions/complexTypeCreator.scala|   4 +-
 .../catalyst/expressions/nullFunctions.scala|   2 -
 .../sql/catalyst/expressions/predicates.scala   |   6 +-
 .../spark/sql/catalyst/expressions/sets.scala   |   2 -
 .../catalyst/expressions/stringOperations.scala |  26 +--
 .../sql/catalyst/trees/TreeNodeSuite.scala  |   6 +-
 15 files changed, 191 insertions(+), 155 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9fd13d56/python/pyspark/sql/dataframe.py
--
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 273a40d..1e9c657 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -802,11 +802,11 @@ class DataFrame(object):
 Each element should be a column name (string) or an expression 
(:class:`Column`).
 
 >>> df.groupBy().avg().collect()
-[Row(AVG(age)=3.5)]
+[Row(avg(age)=3.5)]
 >>> df.groupBy('name').agg({'age': 'mean'}).collect()
-[Row(name=u'Alice', AVG(age)=2.0), Row(name=u'Bob', AVG(age)=5.0)]
+[Row(name=u'Alice', avg(age)=2.0), Row(name=u'Bob', avg(age)=5.0)]
 >>> df.groupBy(df.name).avg().collect()
-[Row(name=u'Alice', AVG(age)=2.0), Row(name=u'Bob', AVG(age)=5.0)]
+[Row(name=u'Alice', avg(age)=2.0), Row(name=u'Bob', avg(age)=5.0)]
 >>> df.groupBy(['name', df.age]).count().collect()
 [Row(name=u'Bob', age=5, count=1), Row(name=u'Alice', age=2, count=1)]
 
@@ -864,10 +864,10 @@ class DataFrame(object):
 (shorthand for ``df.groupBy.agg()``).
 
 >>> df.agg({"age": "max"}).collect()
-[Row(MAX(age)=5)]
+[Row(max(age)=5)]
 >>> from pyspark.sql import functions as F
 >>> df.agg(F.min(df.age)).collect()
-[Row(MIN(age)=2)]
+[Row(min(age)=2)]
 
 return self.groupBy().agg(*exprs)
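The doctest changes above all stem from the new prettyName: aggregate result columns are now labeled with the lowercase user-facing name (avg(age)) instead of the upper-cased internal name (AVG(age)). A minimal sketch of that renaming, using a hypothetical helper that is not a PySpark API:

```python
def format_agg_column(func_name, col_name):
    """Format an aggregate result column label the way the doctests above
    show it after this patch: lowercase function name over the column.
    Hypothetical helper for illustration only."""
    return "{0}({1})".format(func_name.lower(), col_name)
```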
 

http://git-wip-us.apache.org/repos/asf/spark/blob/9fd13d56/python/pyspark/sql/functions.py
--
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 4e2be88..f9a15d4 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -266,7 +266,7 @@ def coalesce(*cols):
 
 >>> cDf.select(coalesce(cDf["a"], cDf["b"])).show()
 +-------------+
-|Coalesce(a,b)|
+|coalesce(a,b)|
 +-------------+
 |         null|
 |            1|
@@ -275,7 +275,7 @@ def coalesce(*cols):
 
 >>> cDf.select('*', coalesce(cDf["a"], lit(0.0))).show()
 +----+----+---------------+
-|   a|   b|Coalesce(a,0.0)|
+|   a|   b|coalesce(a,0.0)|
 +----+----+---------------+
 |null|null|            0.0|
 |   1|null|
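For reference, the coalesce function whose result column is renamed above returns the first non-null argument. A plain-Python sketch of that semantics (not the PySpark implementation, which operates on Column expressions):

```python
def coalesce(*values):
    """Return the first non-None value, mirroring SQL COALESCE;
    returns None when every argument is None."""
    for v in values:
        if v is not None:
            return v
    return None
```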

spark git commit: [SPARK-8770][SQL] Create BinaryOperator abstract class.

2015-07-01 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master f958f27e2 -> 272778999


[SPARK-8770][SQL] Create BinaryOperator abstract class.

Our current BinaryExpression abstract class is not for generic binary
expressions: it requires its left/right children to have the same type.
However, due to its name, contributors build new binary expressions that don't
satisfy that assumption (e.g. Sha) and still extend BinaryExpression.

This patch creates a new BinaryOperator abstract class and updates the analyzer
to only apply the type casting rule there. This patch also adds the notion of
prettyName to expressions, which defines the user-facing name for the
expression.

Author: Reynold Xin r...@databricks.com

Closes #7170 from rxin/binaryoperator and squashes the following commits:

51264a5 [Reynold Xin] [SPARK-8770][SQL] Create BinaryOperator abstract class.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27277899
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/27277899
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/27277899

Branch: refs/heads/master
Commit: 272778999823ed79af92280350c5869a87a21f29
Parents: f958f27
Author: Reynold Xin r...@databricks.com
Authored: Wed Jul 1 16:56:48 2015 -0700
Committer: Reynold Xin r...@databricks.com
Committed: Wed Jul 1 16:56:48 2015 -0700

--
 .../catalyst/analysis/HiveTypeCoercion.scala|  17 +-
 .../expressions/ExpectsInputTypes.scala |  59 +++
 .../sql/catalyst/expressions/Expression.scala   | 161 +--
 .../sql/catalyst/expressions/ScalaUDF.scala |   2 +-
 .../sql/catalyst/expressions/aggregates.scala   |   6 -
 .../sql/catalyst/expressions/arithmetic.scala   |  14 +-
 .../expressions/complexTypeCreator.scala|   4 +-
 .../catalyst/expressions/nullFunctions.scala|   2 -
 .../sql/catalyst/expressions/predicates.scala   |   6 +-
 .../spark/sql/catalyst/expressions/sets.scala   |   2 -
 .../catalyst/expressions/stringOperations.scala |  26 +--
 .../sql/catalyst/trees/TreeNodeSuite.scala  |   6 +-
 12 files changed, 170 insertions(+), 135 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/27277899/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index 2ab5cb6..8420c54 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -150,6 +150,7 @@ object HiveTypeCoercion {
    * Converts string "NaN"s that are in binary operators with a NaN-able types
    * (Float / Double) to the appropriate numeric equivalent.
    */
+  // TODO: remove this rule and make Cast handle Nan.
   object ConvertNaNs extends Rule[LogicalPlan] {
     private val StringNaN = Literal("NaN")
 
@@ -159,19 +160,19 @@ object HiveTypeCoercion {
     case e if !e.childrenResolved => e

     /* Double Conversions */
-    case b @ BinaryExpression(StringNaN, right @ DoubleType()) =>
+    case b @ BinaryOperator(StringNaN, right @ DoubleType()) =>
       b.makeCopy(Array(Literal(Double.NaN), right))
-    case b @ BinaryExpression(left @ DoubleType(), StringNaN) =>
+    case b @ BinaryOperator(left @ DoubleType(), StringNaN) =>
       b.makeCopy(Array(left, Literal(Double.NaN)))

     /* Float Conversions */
-    case b @ BinaryExpression(StringNaN, right @ FloatType()) =>
+    case b @ BinaryOperator(StringNaN, right @ FloatType()) =>
       b.makeCopy(Array(Literal(Float.NaN), right))
-    case b @ BinaryExpression(left @ FloatType(), StringNaN) =>
+    case b @ BinaryOperator(left @ FloatType(), StringNaN) =>
       b.makeCopy(Array(left, Literal(Float.NaN)))

     /* Use float NaN by default to avoid unnecessary type widening */
-    case b @ BinaryExpression(left @ StringNaN, StringNaN) =>
+    case b @ BinaryOperator(left @ StringNaN, StringNaN) =>
       b.makeCopy(Array(left, Literal(Float.NaN)))
   }
 }
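The ConvertNaNs rule above can be sketched in plain Python (a simplified model, not the Catalyst rule itself: the real rule pattern-matches on expression trees and distinguishes Float from Double literals):

```python
def convert_nans(left, right):
    """When one side of a binary operator is the string "NaN" and the
    other is a float, replace the string with a numeric NaN so the
    operator sees same-typed children. Simplified illustration."""
    if left == "NaN" and isinstance(right, float):
        return (float("nan"), right)
    if isinstance(left, float) and right == "NaN":
        return (left, float("nan"))
    return (left, right)
```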
@@ -245,12 +246,12 @@ object HiveTypeCoercion {
 
 Union(newLeft, newRight)
 
-  // Also widen types for BinaryExpressions.
+  // Also widen types for BinaryOperator.
   case q: LogicalPlan => q transformExpressions {
     // Skip nodes who's children have not been resolved yet.
     case e if !e.childrenResolved => e

-    case b @ BinaryExpression(left, right) if left.dataType != right.dataType =>
+case b @ BinaryOperator(left, right) if left.dataType !=
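The hunk above (truncated in the archive) is the widening rule: when a BinaryOperator's children disagree on type, the analyzer casts both sides to a common wider type. A hypothetical Python sketch of the idea; the precedence list here is an assumption for illustration, not Catalyst's actual HiveTypeCoercion ordering:

```python
# Assumed numeric precedence, narrowest to widest (illustrative only).
NUMERIC_PRECEDENCE = ["byte", "short", "int", "long", "float", "double"]

def widen(left_type, right_type):
    """Pick the wider of two numeric types so both children of a
    binary operator can be cast to a single common type."""
    if left_type == right_type:
        return left_type
    i = NUMERIC_PRECEDENCE.index(left_type)
    j = NUMERIC_PRECEDENCE.index(right_type)
    return NUMERIC_PRECEDENCE[max(i, j)]
```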