subject:"\[GitHub\] spark pull request\: \[SPARK\-9830\] \[SQL\] Remove AggregateExpression1..."

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44464801
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
 ---
@@ -29,16 +31,13 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate {
   // Return data type.
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select sum(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
   override def inputTypes: Seq[AbstractDataType] =
 Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44464823
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
 ---
@@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) 
extends ImperativeAggregate w
 
   override def dataType: DataType = DoubleType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select avg(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
-  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType, NullType))
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType))
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44464815
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala
 ---
@@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends 
DeclarativeAggregate {
 
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select stddev(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
-  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType, NullType))
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType))
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44464472
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -448,15 +448,24 @@ private[spark] object SQLConf {
 defaultValue = Some(true),
 isPublic = false)
 
-  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44463726
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -120,14 +133,26 @@ trait CheckAnalysis {
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 
-def checkValidGroupingExprs(expr: Expression): Unit = 
expr.dataType match {
-  case BinaryType =>
-failAnalysis(s"binary type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case m: MapType =>
-failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case _ => // OK
+def checkValidGroupingExprs(expr: Expression): Unit = {
+  expr.dataType match {
+case BinaryType =>
+  failAnalysis(s"binary type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case a: ArrayType =>
+  failAnalysis(s"array type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case m: MapType =>
+  failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
+"in grouping expression")
+case _ => // OK
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44462628
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -120,14 +133,26 @@ trait CheckAnalysis {
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 
-def checkValidGroupingExprs(expr: Expression): Unit = 
expr.dataType match {
-  case BinaryType =>
-failAnalysis(s"binary type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case m: MapType =>
-failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case _ => // OK
+def checkValidGroupingExprs(expr: Expression): Unit = {
+  expr.dataType match {
+case BinaryType =>
+  failAnalysis(s"binary type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case a: ArrayType =>
+  failAnalysis(s"array type expression 
${expr.prettyString} cannot be used " +
--- End diff --

Unfortunately, it is the case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44455980
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -108,7 +109,19 @@ trait CheckAnalysis {
 
   case Aggregate(groupingExprs, aggregateExprs, child) =>
 def checkValidAggregateExpression(expr: Expression): Unit = 
expr match {
-  case _: AggregateExpression => // OK
+  case aggExpr: AggregateExpression =>
+// TODO: Is it possible that the child of a agg function 
is another
--- End diff --

seems our error message is not good. I will update this. We should not 
users to have nested agg functions like `sum(avg(...))`. I will add an error 
message to let users to use sub-queries.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44454907
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -525,21 +526,14 @@ class Analyzer(
   case u @ UnresolvedFunction(name, children, isDistinct) =>
 withPosition(u) {
   registry.lookupFunction(name, children) match {
-// We get an aggregate function built based on 
AggregateFunction2 interface.
-// So, we wrap it in AggregateExpression2.
-case agg2: AggregateFunction2 => 
AggregateExpression2(agg2, Complete, isDistinct)
-// Currently, our old aggregate function interface 
supports SUM(DISTINCT ...)
-// and COUTN(DISTINCT ...).
-case sumDistinct: SumDistinct => sumDistinct
-case countDistinct: CountDistinct => countDistinct
-// DISTINCT is not meaningful with Max and Min.
-case max: Max if isDistinct => max
-case min: Min if isDistinct => min
-// For other aggregate functions, DISTINCT keyword is not 
supported for now.
-// Once we converted to the new code path, we will allow 
using DISTINCT keyword.
-case other: AggregateExpression1 if isDistinct =>
-  failAnalysis(s"$name does not support DISTINCT keyword.")
-// If it does not have DISTINCT keyword, we will return it 
as is.
+// DISTINCT is not meaningful for a Max or a Min.
+case max: Max if isDistinct =>
+  AggregateExpression(max, Complete, isDistinct = false)
+case min: Min if isDistinct =>
+  AggregateExpression(min, Complete, isDistinct = false)
+// We get an aggregate function, we need to wrap it in an 
AggregateExpression.
+case agg2: AggregateFunction => AggregateExpression(agg2, 
Complete, isDistinct)
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44452030
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala
 ---
@@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends 
DeclarativeAggregate {
 
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select stddev(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
-  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType, NullType))
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType))
--- End diff --

You meant `TypeCollection`? We can remove the `TypeCollection` since we 
only have a single abstract data type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44451909
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala
 ---
@@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) 
extends ImperativeAggregate w
 
   override def dataType: DataType = DoubleType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select avg(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
-  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType, NullType))
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType))
--- End diff --

remove TypeCollection?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44451806
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
 ---
@@ -29,16 +31,13 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate {
   // Return data type.
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select sum(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
   override def inputTypes: Seq[AbstractDataType] =
 Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
--- End diff --

yeah, we should remove it. I somehow missed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44451436
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala
 ---
@@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends 
DeclarativeAggregate {
 
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select stddev(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
-  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType, NullType))
+  override def inputTypes: Seq[AbstractDataType] = 
Seq(TypeCollection(NumericType))
--- End diff --

Is the NumericType still needed?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44451117
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala
 ---
@@ -29,16 +31,13 @@ case class Sum(child: Expression) extends 
DeclarativeAggregate {
   // Return data type.
   override def dataType: DataType = resultType
 
-  // Expected input data type.
-  // TODO: Right now, we replace old aggregate functions (based on 
AggregateExpression1) to the
-  // new version at planning time (after analysis phase). For now, 
NullType is added at here
-  // to make it resolved when we have cases like `select sum(null)`.
-  // We can use our analyzer to cast NullType to the default data type of 
the NumericType once
-  // we remove the old aggregate functions. Then, we will not need 
NullType at here.
   override def inputTypes: Seq[AbstractDataType] =
 Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType))
--- End diff --

Should we remove the NullType here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9556


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155533284
  
We should keep reviewing and address any comments in a follow-up, but I'm 
going to merge this now to unblock other work.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r8953
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -448,15 +448,24 @@ private[spark] object SQLConf {
 defaultValue = Some(true),
 isPublic = false)
 
-  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
--- End diff --

This is a public config, should we generate a warning if user try to use 
this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r3426
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -120,14 +133,26 @@ trait CheckAnalysis {
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 
-def checkValidGroupingExprs(expr: Expression): Unit = 
expr.dataType match {
-  case BinaryType =>
-failAnalysis(s"binary type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case m: MapType =>
-failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case _ => // OK
+def checkValidGroupingExprs(expr: Expression): Unit = {
+  expr.dataType match {
+case BinaryType =>
+  failAnalysis(s"binary type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case a: ArrayType =>
+  failAnalysis(s"array type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case m: MapType =>
+  failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
+"in grouping expression")
+case _ => // OK
--- End diff --

We should also check UDT and types inside StructType


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r1047
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -120,14 +133,26 @@ trait CheckAnalysis {
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 
-def checkValidGroupingExprs(expr: Expression): Unit = 
expr.dataType match {
-  case BinaryType =>
-failAnalysis(s"binary type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case m: MapType =>
-failAnalysis(s"map type expression ${expr.prettyString} 
cannot be used " +
-  "in grouping expression")
-  case _ => // OK
+def checkValidGroupingExprs(expr: Expression): Unit = {
+  expr.dataType match {
+case BinaryType =>
+  failAnalysis(s"binary type expression 
${expr.prettyString} cannot be used " +
+"in grouping expression")
+case a: ArrayType =>
+  failAnalysis(s"array type expression 
${expr.prettyString} cannot be used " +
--- End diff --

Is this a regression?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r0841
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -108,7 +109,19 @@ trait CheckAnalysis {
 
   case Aggregate(groupingExprs, aggregateExprs, child) =>
 def checkValidAggregateExpression(expr: Expression): Unit = 
expr match {
-  case _: AggregateExpression => // OK
+  case aggExpr: AggregateExpression =>
+// TODO: Is it possible that the child of a agg function 
is another
--- End diff --

After rewrite for multiple distinct, I think it's possible


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-10 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r0725
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -525,21 +526,14 @@ class Analyzer(
   case u @ UnresolvedFunction(name, children, isDistinct) =>
 withPosition(u) {
   registry.lookupFunction(name, children) match {
-// We get an aggregate function built based on 
AggregateFunction2 interface.
-// So, we wrap it in AggregateExpression2.
-case agg2: AggregateFunction2 => 
AggregateExpression2(agg2, Complete, isDistinct)
-// Currently, our old aggregate function interface 
supports SUM(DISTINCT ...)
-// and COUTN(DISTINCT ...).
-case sumDistinct: SumDistinct => sumDistinct
-case countDistinct: CountDistinct => countDistinct
-// DISTINCT is not meaningful with Max and Min.
-case max: Max if isDistinct => max
-case min: Min if isDistinct => min
-// For other aggregate functions, DISTINCT keyword is not 
supported for now.
-// Once we converted to the new code path, we will allow 
using DISTINCT keyword.
-case other: AggregateExpression1 if isDistinct =>
-  failAnalysis(s"$name does not support DISTINCT keyword.")
-// If it does not have DISTINCT keyword, we will return it 
as is.
+// DISTINCT is not meaningful for a Max or a Min.
+case max: Max if isDistinct =>
+  AggregateExpression(max, Complete, isDistinct = false)
+case min: Min if isDistinct =>
+  AggregateExpression(min, Complete, isDistinct = false)
+// We get an aggregate function, we need to wrap it in an 
AggregateExpression.
+case agg2: AggregateFunction => AggregateExpression(agg2, 
Complete, isDistinct)
--- End diff --

agg2 -> agg


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155348763
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155348765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45492/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155348607
  
**[Test build #45492 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)**
 for PR 9556 at commit 
[`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44374094
  
--- Diff: R/pkg/R/functions.R ---
@@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
 #' @export
 setMethod("approxCountDistinct",
   signature(x = "Column"),
-  function(x, rsd = 0.95) {
+  function(x, rsd = 0.05) {
--- End diff --

Good catch, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155297593
  
**[Test build #45492 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)**
 for PR 9556 at commit 
[`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155295488
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155295464
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44371174
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -141,6 +141,10 @@ case class UnresolvedFunction(
   override def nullable: Boolean = throw new UnresolvedException(this, 
"nullable")
   override lazy val resolved = false
 
+  override def prettyString: String = {
+s"${name}(${children.map(_.prettyString).mkString(",")})"
--- End diff --

@rxin @marmbrus It is better to change the `prettyString` of 
`UnresolvedFunction` to get rid of that `'` (e.g. `'abs`). The impact of this 
change is that if any user rely on that `'`, his/her code will break.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155290682
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155290683
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45476/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155290543
  
**[Test build #45476 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155286504
  
**[Test build #2027 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155282012
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155282013
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45474/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155281959
  
**[Test build #45474 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread shivaram

Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44366543
  
--- Diff: R/pkg/R/functions.R ---
@@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
 #' @export
 setMethod("approxCountDistinct",
   signature(x = "Column"),
-  function(x, rsd = 0.95) {
+  function(x, rsd = 0.05) {
--- End diff --

Yeah looks like it should be 0.05 -- cc @davies 
This seems to have been added back when SparkR was a separate code-base in 
https://github.com/davies/SparkR-pkg/commit/d7b17a428c27aac28d89e1c85f1ba7d9d4b021d2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155265613
  
**[Test build #45476 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155264259
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155264239
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155264172
  
**[Test build #2027 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155263519
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155262404
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155263285
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45471/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155263282
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155263092
  
**[Test build #45474 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)**
 for PR 9556 at commit 
[`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44363471
  
--- Diff: R/pkg/R/functions.R ---
@@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"),
 #' @export
 setMethod("approxCountDistinct",
   signature(x = "Column"),
-  function(x, rsd = 0.95) {
+  function(x, rsd = 0.05) {
--- End diff --

@shivaram Seems this default value should be 0.05 instead of 0.95, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155262424
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155260995
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155260980
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44361789
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -448,15 +448,17 @@ private[spark] object SQLConf {
 defaultValue = Some(true),
 isPublic = false)
 
-  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
-defaultValue = Some(true), doc = "")
-
   val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles",
 defaultValue = Some(true),
 isPublic = false,
 doc = "When true, we could use `datasource`.`path` as table in SQL 
query"
   )
 
+  val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155258202
  
**[Test build #2024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)**
 for PR 9556 at commit 
[`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155257980
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45442/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155257977
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155257913
  
**[Test build #45442 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)**
 for PR 9556 at commit 
[`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44361592
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
@@ -88,30 +89,33 @@ class GroupedData protected[sql](
 namedExpr
   }
 }
-toDF(columnExprs.map(f))
+toDF(columnExprs.map(expr => f(expr).toAggregateExpression()))
   }
 
   private[this] def strToExpr(expr: String): (Expression => Expression) = {
-expr.toLowerCase match {
-  case "avg" | "average" | "mean" => Average
-  case "max" => Max
-  case "min" => Min
-  case "stddev" | "std" => StddevSamp
-  case "stddev_pop" => StddevPop
-  case "stddev_samp" => StddevSamp
-  case "variance" => VarianceSamp
-  case "var_pop" => VariancePop
-  case "var_samp" => VarianceSamp
-  case "sum" => Sum
-  case "skewness" => Skewness
-  case "kurtosis" => Kurtosis
-  case "count" | "size" =>
-// Turn count(*) into count(1)
-(inputExpr: Expression) => inputExpr match {
-  case s: Star => Count(Literal(1))
-  case _ => Count(inputExpr)
-}
+val exprToFunc: (Expression => AggregateFunction) = {
+  (inputExpr: Expression) => expr.toLowerCase match {
+case "avg" | "average" | "mean" => Average(inputExpr)
+case "max" => Max(inputExpr)
+case "min" => Min(inputExpr)
+case "stddev" | "std" => StddevSamp(inputExpr)
+case "stddev_pop" => StddevPop(inputExpr)
+case "stddev_samp" => StddevSamp(inputExpr)
+case "variance" => VarianceSamp(inputExpr, 0, 0)
+case "var_pop" => VariancePop(inputExpr, 0, 0)
+case "var_samp" => VarianceSamp(inputExpr, 0, 0)
+case "sum" => Sum(inputExpr)
+case "skewness" => Skewness(inputExpr, 0, 0)
+case "kurtosis" => Kurtosis(inputExpr, 0, 0)
--- End diff --

Done. btw, those offsets will be always set at executor side.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44361205
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -53,6 +54,12 @@ object functions {
 
   private def withExpr(expr: Expression): Column = Column(expr)
 
+  private def withAggregateFunction(
+ func: AggregateFunction,
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44361188
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -141,40 +141,56 @@ class WindowSpec private[sql](
*/
   private[sql] def withAggregate(aggregate: Column): Column = {
 val windowExpr = aggregate.expr match {
-  case Average(child) => WindowExpression(
-UnresolvedWindowFunction("avg", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Sum(child) => WindowExpression(
-UnresolvedWindowFunction("sum", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Count(child) => WindowExpression(
-UnresolvedWindowFunction("count", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case First(child, ignoreNulls) => WindowExpression(
-// TODO this is a hack for Hive UDAF first_value
-UnresolvedWindowFunction(
-  "first_value",
-  child :: ignoreNulls :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Last(child, ignoreNulls) => WindowExpression(
-// TODO this is a hack for Hive UDAF last_value
-UnresolvedWindowFunction(
-  "last_value",
-  child :: ignoreNulls :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Min(child) => WindowExpression(
-UnresolvedWindowFunction("min", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Max(child) => WindowExpression(
-UnresolvedWindowFunction("max", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case wf: WindowFunction => WindowExpression(
-wf,
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  // First, we check if we get an aggregate function without the 
DISTINCT keyword.
+  // Right now, we do not support using a DISTINCT aggregate function 
as a
+  // window function.
+  case AggregateExpression(aggregateFunction, _, isDistinct) if 
!isDistinct =>
+aggregateFunction match {
+  case Average(child) => WindowExpression(
+UnresolvedWindowFunction("avg", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Sum(child) => WindowExpression(
+UnresolvedWindowFunction("sum", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Count(child) => WindowExpression(
+UnresolvedWindowFunction("count", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case First(child, ignoreNulls) => WindowExpression(
+// TODO this is a hack for Hive UDAF first_value
+UnresolvedWindowFunction(
+  "first_value",
+  child :: ignoreNulls :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Last(child, ignoreNulls) => WindowExpression(
+// TODO this is a hack for Hive UDAF last_value
+UnresolvedWindowFunction(
+  "last_value",
+  child :: ignoreNulls :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Min(child) => WindowExpression(
+UnresolvedWindowFunction("min", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Max(child) => WindowExpression(
+UnresolvedWindowFunction("max", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case x =>
+throw new UnsupportedOperationException(s"$x is not supported 
in window operation.")
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44361162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends 
Expression with ImplicitCastInp
 
   override protected def genCode(ctx: CodeGenContext, ev: 
GeneratedExpressionCode): String =
 throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
+
+  def toAggregateExpression(): AggregateExpression = 
toAggregateExpression(isDistinct = false)
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44360927
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala
 ---
@@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
 inputAggBufferOffset: Int = 0)
   extends CentralMomentAgg(child) {
 
+  def this(child: Expression) = this(child, 0, 0)
--- End diff --

Will drop those default args in a separate PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44360853
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala
 ---
@@ -24,6 +24,8 @@ case class Skewness(child: Expression,
 inputAggBufferOffset: Int = 0)
   extends CentralMomentAgg(child) {
 
+  def this(child: Expression) = this(child, 0, 0)
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44358149
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala
 ---
@@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus(
   extends ImperativeAggregate {
   import HyperLogLogPlusPlus._
 
+  def this(child: Expression) = {
+this(child, 0.05, 0, 0)
+  }
+
+  def this(child: Expression, relativeSD: Expression) = {
+this(
+  child,
+  relativeSD match {
+case Literal(d: Double, DoubleType) => d
+case _ =>
+  throw new AnalysisException("The second argument should be a 
double literal.")
+  },
+  0,
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44357472
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala
 ---
@@ -32,23 +32,34 @@ case class Count(child: Expression) extends 
DeclarativeAggregate {
   // Expected input data type.
   override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)
 
-  private val count = AttributeReference("count", LongType)()
+  private lazy val count = AttributeReference("count", LongType)()
 
-  override val aggBufferAttributes = count :: Nil
+  override lazy val aggBufferAttributes = count :: Nil
 
-  override val initialValues = Seq(
+  override lazy val initialValues = Seq(
 /* count = */ Literal(0L)
   )
 
-  override val updateExpressions = Seq(
+  override lazy val updateExpressions = Seq(
 /* count = */ If(IsNull(child), count, count + 1L)
   )
 
-  override val mergeExpressions = Seq(
+  override lazy val mergeExpressions = Seq(
 /* count = */ count.left + count.right
   )
 
-  override val evaluateExpression = Cast(count, LongType)
+  override lazy val evaluateExpression = Cast(count, LongType)
 
   override def defaultResult: Option[Literal] = Option(Literal(0L))
 }
+
+object Count {
+  def apply(children: Seq[Expression]): Count = {
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44355987
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -177,6 +178,7 @@ object FunctionRegistry {
 expression[ToRadians]("radians"),
 
 // aggregate functions
+expression[HyperLogLogPlusPlus]("approx_distinct"),
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44355961
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -108,7 +109,16 @@ trait CheckAnalysis {
 
   case Aggregate(groupingExprs, aggregateExprs, child) =>
 def checkValidAggregateExpression(expr: Expression): Unit = 
expr match {
-  case _: AggregateExpression => // OK
+  case aggExpr: AggregateExpression =>
+// TODO: Is it possible that the child of a agg function 
is another
+// agg function?
+aggExpr.aggregateFunction.children.foreach {
+  case child if !child.deterministic =>
+failAnalysis(
+  s"nondeterministic expression ${expr.prettyString} 
are not allowed " +
+s"in grouping expression.")
--- End diff --

Done.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44355776
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -525,21 +526,15 @@ class Analyzer(
   case u @ UnresolvedFunction(name, children, isDistinct) =>
 withPosition(u) {
   registry.lookupFunction(name, children) match {
+// DISTINCT is not meaningful for a Max or a Min.
+case max: Max if isDistinct =>
+  AggregateExpression(max, Complete, isDistinct = false)
+case min: Min if isDistinct =>
+  AggregateExpression(min, Complete, isDistinct = false)
 // We get an aggregate function built based on 
AggregateFunction2 interface.
 // So, we wrap it in AggregateExpression2.
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44355675
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with 
DataTypeParser {
   protected lazy val function: Parser[Expression] =
 ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName =>
   if (lexical.normalizeKeyword(udfName) == "count") {
-Count(Literal(1))
+AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct 
= false)
--- End diff --

Seems we hard code this when we deal with count(*). We can change it in 
another PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155238877
  
this is going to conflict at least logically with #9499 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155238823
  
Yay, death to Aggregation1!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44352491
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala ---
@@ -448,15 +448,17 @@ private[spark] object SQLConf {
 defaultValue = Some(true),
 isPublic = false)
 
-  val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2",
-defaultValue = Some(true), doc = "")
-
   val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles",
 defaultValue = Some(true),
 isPublic = false,
 doc = "When true, we could use `datasource`.`path` as table in SQL 
query"
   )
 
+  val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING =
--- End diff --

For posterity maybe comment on what this is and when we think we will 
remove it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44352445
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala ---
@@ -88,30 +89,33 @@ class GroupedData protected[sql](
 namedExpr
   }
 }
-toDF(columnExprs.map(f))
+toDF(columnExprs.map(expr => f(expr).toAggregateExpression()))
   }
 
   private[this] def strToExpr(expr: String): (Expression => Expression) = {
-expr.toLowerCase match {
-  case "avg" | "average" | "mean" => Average
-  case "max" => Max
-  case "min" => Min
-  case "stddev" | "std" => StddevSamp
-  case "stddev_pop" => StddevPop
-  case "stddev_samp" => StddevSamp
-  case "variance" => VarianceSamp
-  case "var_pop" => VariancePop
-  case "var_samp" => VarianceSamp
-  case "sum" => Sum
-  case "skewness" => Skewness
-  case "kurtosis" => Kurtosis
-  case "count" | "size" =>
-// Turn count(*) into count(1)
-(inputExpr: Expression) => inputExpr match {
-  case s: Star => Count(Literal(1))
-  case _ => Count(inputExpr)
-}
+val exprToFunc: (Expression => AggregateFunction) = {
+  (inputExpr: Expression) => expr.toLowerCase match {
+case "avg" | "average" | "mean" => Average(inputExpr)
+case "max" => Max(inputExpr)
+case "min" => Min(inputExpr)
+case "stddev" | "std" => StddevSamp(inputExpr)
+case "stddev_pop" => StddevPop(inputExpr)
+case "stddev_samp" => StddevSamp(inputExpr)
+case "variance" => VarianceSamp(inputExpr, 0, 0)
+case "var_pop" => VariancePop(inputExpr, 0, 0)
+case "var_samp" => VarianceSamp(inputExpr, 0, 0)
+case "sum" => Sum(inputExpr)
+case "skewness" => Skewness(inputExpr, 0, 0)
+case "kurtosis" => Kurtosis(inputExpr, 0, 0)
--- End diff --

This is kinda of a nit, but I think it might be better to rely on the 
defaults provided by the extra constructor instead of hard coding these here 
and elsewhere.

Also, is it possible for a user to corrupt query results by specifying 
their own offsets due to the way we resolve functions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44352031
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -53,6 +54,12 @@ object functions {
 
   private def withExpr(expr: Expression): Column = Column(expr)
 
+  private def withAggregateFunction(
+ func: AggregateFunction,
--- End diff --

indent


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44352010
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -141,40 +141,56 @@ class WindowSpec private[sql](
*/
   private[sql] def withAggregate(aggregate: Column): Column = {
 val windowExpr = aggregate.expr match {
-  case Average(child) => WindowExpression(
-UnresolvedWindowFunction("avg", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Sum(child) => WindowExpression(
-UnresolvedWindowFunction("sum", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Count(child) => WindowExpression(
-UnresolvedWindowFunction("count", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case First(child, ignoreNulls) => WindowExpression(
-// TODO this is a hack for Hive UDAF first_value
-UnresolvedWindowFunction(
-  "first_value",
-  child :: ignoreNulls :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Last(child, ignoreNulls) => WindowExpression(
-// TODO this is a hack for Hive UDAF last_value
-UnresolvedWindowFunction(
-  "last_value",
-  child :: ignoreNulls :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Min(child) => WindowExpression(
-UnresolvedWindowFunction("min", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case Max(child) => WindowExpression(
-UnresolvedWindowFunction("max", child :: Nil),
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
-  case wf: WindowFunction => WindowExpression(
-wf,
-WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  // First, we check if we get an aggregate function without the 
DISTINCT keyword.
+  // Right now, we do not support using a DISTINCT aggregate function 
as a
+  // window function.
+  case AggregateExpression(aggregateFunction, _, isDistinct) if 
!isDistinct =>
+aggregateFunction match {
+  case Average(child) => WindowExpression(
+UnresolvedWindowFunction("avg", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Sum(child) => WindowExpression(
+UnresolvedWindowFunction("sum", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Count(child) => WindowExpression(
+UnresolvedWindowFunction("count", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case First(child, ignoreNulls) => WindowExpression(
+// TODO this is a hack for Hive UDAF first_value
+UnresolvedWindowFunction(
+  "first_value",
+  child :: ignoreNulls :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Last(child, ignoreNulls) => WindowExpression(
+// TODO this is a hack for Hive UDAF last_value
+UnresolvedWindowFunction(
+  "last_value",
+  child :: ignoreNulls :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Min(child) => WindowExpression(
+UnresolvedWindowFunction("min", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case Max(child) => WindowExpression(
+UnresolvedWindowFunction("max", child :: Nil),
+WindowSpecDefinition(partitionSpec, orderSpec, frame))
+  case x =>
+throw new UnsupportedOperationException(s"$x is not supported 
in window operation.")
--- End diff --

nit: "in a window operation" or "in window operations"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351922
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ---
@@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends 
Expression with ImplicitCastInp
 
   override protected def genCode(ctx: CodeGenContext, ev: 
GeneratedExpressionCode): String =
 throw new UnsupportedOperationException(s"Cannot evaluate expression: 
$this")
+
+  def toAggregateExpression(): AggregateExpression = 
toAggregateExpression(isDistinct = false)
--- End diff --

Can you add some scaladoc here about why we need to do this wrapping?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351704
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala
 ---
@@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
 inputAggBufferOffset: Int = 0)
   extends CentralMomentAgg(child) {
 
+  def this(child: Expression) = this(child, 0, 0)
--- End diff --

Maybe drop the default args?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351730
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala
 ---
@@ -24,6 +24,8 @@ case class Kurtosis(child: Expression,
 inputAggBufferOffset: Int = 0)
   extends CentralMomentAgg(child) {
 
+  def this(child: Expression) = this(child, 0, 0)
--- End diff --

and comment why we can't use them.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351757
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala
 ---
@@ -24,6 +24,8 @@ case class Skewness(child: Expression,
 inputAggBufferOffset: Int = 0)
   extends CentralMomentAgg(child) {
 
+  def this(child: Expression) = this(child, 0, 0)
--- End diff --

same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351639
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala
 ---
@@ -32,23 +32,34 @@ case class Count(child: Expression) extends 
DeclarativeAggregate {
   // Expected input data type.
   override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType)
 
-  private val count = AttributeReference("count", LongType)()
+  private lazy val count = AttributeReference("count", LongType)()
 
-  override val aggBufferAttributes = count :: Nil
+  override lazy val aggBufferAttributes = count :: Nil
 
-  override val initialValues = Seq(
+  override lazy val initialValues = Seq(
 /* count = */ Literal(0L)
   )
 
-  override val updateExpressions = Seq(
+  override lazy val updateExpressions = Seq(
 /* count = */ If(IsNull(child), count, count + 1L)
   )
 
-  override val mergeExpressions = Seq(
+  override lazy val mergeExpressions = Seq(
 /* count = */ count.left + count.right
   )
 
-  override val evaluateExpression = Cast(count, LongType)
+  override lazy val evaluateExpression = Cast(count, LongType)
 
   override def defaultResult: Option[Literal] = Option(Literal(0L))
 }
+
+object Count {
+  def apply(children: Seq[Expression]): Count = {
--- End diff --

Maybe document what is going on here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351651
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala
 ---
@@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus(
   extends ImperativeAggregate {
   import HyperLogLogPlusPlus._
 
+  def this(child: Expression) = {
+this(child, 0.05, 0, 0)
+  }
+
+  def this(child: Expression, relativeSD: Expression) = {
+this(
+  child,
+  relativeSD match {
+case Literal(d: Double, DoubleType) => d
+case _ =>
+  throw new AnalysisException("The second argument should be a 
double literal.")
+  },
+  0,
--- End diff --

consider named args here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44351475
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 ---
@@ -177,6 +178,7 @@ object FunctionRegistry {
 expression[ToRadians]("radians"),
 
 // aggregate functions
+expression[HyperLogLogPlusPlus]("approx_distinct"),
--- End diff --

approx_count_distinct?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44350995
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -108,7 +109,16 @@ trait CheckAnalysis {
 
   case Aggregate(groupingExprs, aggregateExprs, child) =>
 def checkValidAggregateExpression(expr: Expression): Unit = 
expr match {
-  case _: AggregateExpression => // OK
+  case aggExpr: AggregateExpression =>
+// TODO: Is it possible that the child of a agg function 
is another
+// agg function?
+aggExpr.aggregateFunction.children.foreach {
+  case child if !child.deterministic =>
+failAnalysis(
+  s"nondeterministic expression ${expr.prettyString} 
are not allowed " +
+s"in grouping expression.")
--- End diff --

This error message is wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44350703
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -525,21 +526,15 @@ class Analyzer(
   case u @ UnresolvedFunction(name, children, isDistinct) =>
 withPosition(u) {
   registry.lookupFunction(name, children) match {
+// DISTINCT is not meaningful for a Max or a Min.
+case max: Max if isDistinct =>
+  AggregateExpression(max, Complete, isDistinct = false)
+case min: Min if isDistinct =>
+  AggregateExpression(min, Complete, isDistinct = false)
 // We get an aggregate function built based on 
AggregateFunction2 interface.
 // So, we wrap it in AggregateExpression2.
--- End diff --

This is a little out of date.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44350660
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with 
DataTypeParser {
   protected lazy val function: Parser[Expression] =
 ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName =>
   if (lexical.normalizeKeyword(udfName) == "count") {
-Count(Literal(1))
+AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct 
= false)
--- End diff --

Is there a reason we can't just create unresolved functions here? (other 
than maybe the distinct case?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155232902
  
**[Test build #45442 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)**
 for PR 9556 at commit 
[`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155232599
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155232567
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155232483
  
**[Test build #2024 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)**
 for PR 9556 at commit 
[`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155231968
  
test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155231775
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155228594
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155228625
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155226238
  
Yeah. I will change the prettyString of {{AggregateExpression}}. Also, 
since I need to update anyway, I am including my change that introduces the 
flag used by MultipleDistinctRewriter. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell

Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155225000
  
PySpark results are correct, just not in the correct form :(...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell

Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155224746
  
LGTM, pending a successful test build.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155223396
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155223300
  
**[Test build #45407 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)**
 for PR 9556 at commit 
[`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44344541
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 }
   }
 
-  object HashAggregation extends Strategy {
-def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  // Aggregations that can be performed in two phases, before and 
after the shuffle.
-  case PartialAggregation(
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  groupingExpressions,
-  partialComputation,
-  child) if !canBeConvertedToNewAggregation(plan) =>
-execution.Aggregate(
-  partial = false,
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  execution.Aggregate(
-partial = true,
-groupingExpressions,
-partialComputation,
-planLater(child))) :: Nil
-
-  case _ => Nil
-}
-
-def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan 
match {
-  case a: logical.Aggregate =>
-if (sqlContext.conf.useSqlAggregate2 && 
sqlContext.conf.codegenEnabled) {
-  a.newAggregation.isDefined
-} else {
-  Utils.checkInvalidAggregateFunction2(a)
-  false
-}
-  case _ => false
-}
-
-def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
-  exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
-  }
-
   /**
* Used to plan the aggregate operator for expressions based on the 
AggregateFunction2 interface.
*/
   object Aggregation extends Strategy {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
-  sqlContext.conf.codegenEnabled =>
-val converted = p.newAggregation
-converted match {
-  case None => Nil // Cannot convert to new aggregation code path.
-  case Some(logical.Aggregate(groupingExpressions, 
resultExpressions, child)) =>
-// A single aggregate expression might appear multiple times 
in resultExpressions.
-// In order to avoid evaluating an individual aggregate 
function multiple times, we'll
-// build a set of the distinct aggregate expressions and build 
a function which can
-// be used to re-write expressions so that they reference the 
single copy of the
-// aggregate function which actually gets computed.
-val aggregateExpressions = resultExpressions.flatMap { expr =>
-  expr.collect {
-case agg: AggregateExpression2 => agg
-  }
-}.distinct
-// For those distinct aggregate expressions, we create a map 
from the
-// aggregate function to the corresponding attribute of the 
function.
-val aggregateFunctionToAttribute = aggregateExpressions.map { 
agg =>
-  val aggregateFunction = agg.aggregateFunction
-  val attribute = Alias(aggregateFunction, 
aggregateFunction.toString)().toAttribute
-  (aggregateFunction, agg.isDistinct) -> attribute
-}.toMap
-
-val (functionsWithDistinct, functionsWithoutDistinct) =
-  aggregateExpressions.partition(_.isDistinct)
-if 
(functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
-  // This is a sanity check. We should not reach here when we 
have multiple distinct
-  // column sets (aggregate.NewAggregation will not match).
-  sys.error(
-"Multiple distinct column sets are not supported by the 
new aggregation" +
-  "code path.")
-}
+  case logical.Aggregate(groupingExpressions, resultExpressions, 
child) =>
+// A single aggregate expression might appear multiple times in 
resultExpressions.
+// In order to avoid evaluating an individual aggregate function 
multiple times, we'll
+// build a set of the distinct aggregate expressions and build a 
function which can
+// be used to re-write expressions so that they reference the 
single copy of the
+// aggregate function which actually gets computed.
+val aggregateExpressions = resultExpressions.flatMap { expr =>
+  expr.collect {
+case agg: AggregateExpression => agg
+  }
+}.distinct
+

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44343490
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 }
   }
 
-  object HashAggregation extends Strategy {
-def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  // Aggregations that can be performed in two phases, before and 
after the shuffle.
-  case PartialAggregation(
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  groupingExpressions,
-  partialComputation,
-  child) if !canBeConvertedToNewAggregation(plan) =>
-execution.Aggregate(
-  partial = false,
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  execution.Aggregate(
-partial = true,
-groupingExpressions,
-partialComputation,
-planLater(child))) :: Nil
-
-  case _ => Nil
-}
-
-def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan 
match {
-  case a: logical.Aggregate =>
-if (sqlContext.conf.useSqlAggregate2 && 
sqlContext.conf.codegenEnabled) {
-  a.newAggregation.isDefined
-} else {
-  Utils.checkInvalidAggregateFunction2(a)
-  false
-}
-  case _ => false
-}
-
-def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
-  exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
-  }
-
   /**
* Used to plan the aggregate operator for expressions based on the 
AggregateFunction2 interface.
*/
   object Aggregation extends Strategy {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
-  sqlContext.conf.codegenEnabled =>
-val converted = p.newAggregation
-converted match {
-  case None => Nil // Cannot convert to new aggregation code path.
-  case Some(logical.Aggregate(groupingExpressions, 
resultExpressions, child)) =>
-// A single aggregate expression might appear multiple times 
in resultExpressions.
-// In order to avoid evaluating an individual aggregate 
function multiple times, we'll
-// build a set of the distinct aggregate expressions and build 
a function which can
-// be used to re-write expressions so that they reference the 
single copy of the
-// aggregate function which actually gets computed.
-val aggregateExpressions = resultExpressions.flatMap { expr =>
-  expr.collect {
-case agg: AggregateExpression2 => agg
-  }
-}.distinct
-// For those distinct aggregate expressions, we create a map 
from the
-// aggregate function to the corresponding attribute of the 
function.
-val aggregateFunctionToAttribute = aggregateExpressions.map { 
agg =>
-  val aggregateFunction = agg.aggregateFunction
-  val attribute = Alias(aggregateFunction, 
aggregateFunction.toString)().toAttribute
-  (aggregateFunction, agg.isDistinct) -> attribute
-}.toMap
-
-val (functionsWithDistinct, functionsWithoutDistinct) =
-  aggregateExpressions.partition(_.isDistinct)
-if 
(functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
-  // This is a sanity check. We should not reach here when we 
have multiple distinct
-  // column sets (aggregate.NewAggregation will not match).
-  sys.error(
-"Multiple distinct column sets are not supported by the 
new aggregation" +
-  "code path.")
-}
+  case logical.Aggregate(groupingExpressions, resultExpressions, 
child) =>
+// A single aggregate expression might appear multiple times in 
resultExpressions.
+// In order to avoid evaluating an individual aggregate function 
multiple times, we'll
+// build a set of the distinct aggregate expressions and build a 
function which can
+// be used to re-write expressions so that they reference the 
single copy of the
+// aggregate function which actually gets computed.
+val aggregateExpressions = resultExpressions.flatMap { expr =>
+  expr.collect {
+case agg: AggregateExpression => agg
+  }
+}.distinct

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9556#discussion_r44327164
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 }
   }
 
-  object HashAggregation extends Strategy {
-def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  // Aggregations that can be performed in two phases, before and 
after the shuffle.
-  case PartialAggregation(
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  groupingExpressions,
-  partialComputation,
-  child) if !canBeConvertedToNewAggregation(plan) =>
-execution.Aggregate(
-  partial = false,
-  namedGroupingAttributes,
-  rewrittenAggregateExpressions,
-  execution.Aggregate(
-partial = true,
-groupingExpressions,
-partialComputation,
-planLater(child))) :: Nil
-
-  case _ => Nil
-}
-
-def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan 
match {
-  case a: logical.Aggregate =>
-if (sqlContext.conf.useSqlAggregate2 && 
sqlContext.conf.codegenEnabled) {
-  a.newAggregation.isDefined
-} else {
-  Utils.checkInvalidAggregateFunction2(a)
-  false
-}
-  case _ => false
-}
-
-def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] =
-  exprs.flatMap(_.collect { case a: AggregateExpression1 => a })
-  }
-
   /**
* Used to plan the aggregate operator for expressions based on the 
AggregateFunction2 interface.
*/
   object Aggregation extends Strategy {
 def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-  case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 &&
-  sqlContext.conf.codegenEnabled =>
-val converted = p.newAggregation
-converted match {
-  case None => Nil // Cannot convert to new aggregation code path.
-  case Some(logical.Aggregate(groupingExpressions, 
resultExpressions, child)) =>
-// A single aggregate expression might appear multiple times 
in resultExpressions.
-// In order to avoid evaluating an individual aggregate 
function multiple times, we'll
-// build a set of the distinct aggregate expressions and build 
a function which can
-// be used to re-write expressions so that they reference the 
single copy of the
-// aggregate function which actually gets computed.
-val aggregateExpressions = resultExpressions.flatMap { expr =>
-  expr.collect {
-case agg: AggregateExpression2 => agg
-  }
-}.distinct
-// For those distinct aggregate expressions, we create a map 
from the
-// aggregate function to the corresponding attribute of the 
function.
-val aggregateFunctionToAttribute = aggregateExpressions.map { 
agg =>
-  val aggregateFunction = agg.aggregateFunction
-  val attribute = Alias(aggregateFunction, 
aggregateFunction.toString)().toAttribute
-  (aggregateFunction, agg.isDistinct) -> attribute
-}.toMap
-
-val (functionsWithDistinct, functionsWithoutDistinct) =
-  aggregateExpressions.partition(_.isDistinct)
-if 
(functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) {
-  // This is a sanity check. We should not reach here when we 
have multiple distinct
-  // column sets (aggregate.NewAggregation will not match).
-  sys.error(
-"Multiple distinct column sets are not supported by the 
new aggregation" +
-  "code path.")
-}
+  case logical.Aggregate(groupingExpressions, resultExpressions, 
child) =>
+// A single aggregate expression might appear multiple times in 
resultExpressions.
+// In order to avoid evaluating an individual aggregate function 
multiple times, we'll
+// build a set of the distinct aggregate expressions and build a 
function which can
+// be used to re-write expressions so that they reference the 
single copy of the
+// aggregate function which actually gets computed.
+val aggregateExpressions = resultExpressions.flatMap { expr =>
+  expr.collect {
+case agg: AggregateExpression => agg
+  }
+}.distinct
+

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155182875
  
**[Test build #45407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)**
 for PR 9556 at commit 
[`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...

2015-11-09 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9556#issuecomment-155179711
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 143 matches

Mail list logo