spark git commit: [SPARK-11371] Make "mean" an alias for "avg" operator

yhuai Mon, 02 Nov 2015 13:52:26 -0800

Repository: spark
Updated Branches:
  refs/heads/master 33ae7a35d -> db11ee5e5



[SPARK-11371] Make "mean" an alias for "avg" operator

From Reynold in the thread 'Exception when using some aggregate operators'
(http://search-hadoop.com/m/q3RTt0xFr22nXB4/):

I don't think these are bugs. The SQL standard for average is "avg", not 
"mean". Similarly, a distinct count is supposed to be written as 
"count(distinct col)", not "countDistinct(col)".
We can, however, make "mean" an alias for "avg" to improve compatibility 
between DataFrame and SQL.
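
As a rough illustration of what "alias" means here, the following is a minimal sketch in plain Scala. It is not Spark's actual FunctionRegistry: `AliasSketch`, `Aggregate`, `average`, `registry`, and `lookup` are hypothetical names invented for this example, and the case-insensitive lookup is an assumption of the sketch. The only point it makes is that two names bound to one implementation return the same result.

```scala
// Hypothetical, simplified stand-in for a function registry; not Spark's API.
object AliasSketch {
  // An aggregate maps the non-null values of a group to an optional result.
  type Aggregate = Seq[Double] => Option[Double]

  // Average of a group; an empty group yields None (comparable to SQL NULL).
  val average: Aggregate =
    xs => if (xs.isEmpty) None else Some(xs.sum / xs.size)

  // Binding one implementation to two names makes "mean" an alias for "avg".
  val registry: Map[String, Aggregate] = Map(
    "avg"  -> average,
    "mean" -> average
  )

  // Names are matched case-insensitively (an assumption of this sketch).
  def lookup(name: String): Option[Aggregate] =
    registry.get(name.toLowerCase)

  def main(args: Array[String]): Unit = {
    val values  = Seq(10.0, 30.0)
    val viaAvg  = lookup("avg").flatMap(f => f(values))
    val viaMean = lookup("mean").flatMap(f => f(values))
    // Both names resolve to the same function, so the results must agree.
    assert(viaAvg == viaMean)
    println(s"avg = $viaAvg, mean = $viaMean")
  }
}
```

Because both keys point at the same function value, any later change to `average` applies to both names, so the alias cannot drift from the original.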

Author: tedyu <yuzhih...@gmail.com>

Closes #9332 from ted-yu/master.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/db11ee5e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/db11ee5e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/db11ee5e

Branch: refs/heads/master
Commit: db11ee5e56e5fac59895c772a9a87c5ac86888ef
Parents: 33ae7a3
Author: tedyu <yuzhih...@gmail.com>
Authored: Mon Nov 2 13:51:53 2015 -0800
Committer: Yin Huai <yh...@databricks.com>
Committed: Mon Nov 2 13:51:53 2015 -0800

----------------------------------------------------------------------
 .../spark/sql/catalyst/analysis/FunctionRegistry.scala      | 1 +
 .../spark/sql/hive/execution/AggregationQuerySuite.scala    | 9 +++++++++
 2 files changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/db11ee5e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 5f3ec74..24c1a7b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -185,6 +185,7 @@ object FunctionRegistry {
     expression[Last]("last"),
     expression[Last]("last_value"),
     expression[Max]("max"),
+    expression[Average]("mean"),
     expression[Min]("min"),
     expression[Stddev]("stddev"),
     expression[StddevPop]("stddev_pop"),

http://git-wip-us.apache.org/repos/asf/spark/blob/db11ee5e/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
index 0cf0e0a..74061db 100644
--- a/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
+++ b/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala
@@ -301,6 +301,15 @@ abstract class AggregationQuerySuite extends QueryTest with SQLTestUtils with Te
     checkAnswer(
       sqlContext.sql(
         """
+          |SELECT key, mean(value)
+          |FROM agg1
+          |GROUP BY key
+        """.stripMargin),
+      Row(1, 20.0) :: Row(2, -0.5) :: Row(3, null) :: Row(null, 10.0) :: Nil)
+
+    checkAnswer(
+      sqlContext.sql(
+        """
           |SELECT avg(value), key
           |FROM agg1
           |GROUP BY key

