[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44464801 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala --- @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate { // Return data type. override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select sum(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType)) --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44464823 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala --- @@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate w override def dataType: DataType = DoubleType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select avg(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. - override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType)) --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44464815 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala --- @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate { override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select stddev(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. - override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType)) --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44464472 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -448,15 +448,24 @@ private[spark] object SQLConf { defaultValue = Some(true), isPublic = false) - val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2", --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44463726 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -120,14 +133,26 @@ trait CheckAnalysis { case e => e.children.foreach(checkValidAggregateExpression) } -def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match { - case BinaryType => -failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case m: MapType => -failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case _ => // OK +def checkValidGroupingExprs(expr: Expression): Unit = { + expr.dataType match { +case BinaryType => + failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case a: ArrayType => + failAnalysis(s"array type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case m: MapType => + failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case _ => // OK --- End diff -- Done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44462628 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -120,14 +133,26 @@ trait CheckAnalysis { case e => e.children.foreach(checkValidAggregateExpression) } -def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match { - case BinaryType => -failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case m: MapType => -failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case _ => // OK +def checkValidGroupingExprs(expr: Expression): Unit = { + expr.dataType match { +case BinaryType => + failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case a: ArrayType => + failAnalysis(s"array type expression ${expr.prettyString} cannot be used " + --- End diff -- Unfortunately, it is the case. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44455980 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -108,7 +109,19 @@ trait CheckAnalysis { case Aggregate(groupingExprs, aggregateExprs, child) => def checkValidAggregateExpression(expr: Expression): Unit = expr match { - case _: AggregateExpression => // OK + case aggExpr: AggregateExpression => +// TODO: Is it possible that the child of a agg function is another --- End diff -- seems our error message is not good. I will update this. We should not users to have nested agg functions like `sum(avg(...))`. I will add an error message to let users to use sub-queries. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44454907 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -525,21 +526,14 @@ class Analyzer( case u @ UnresolvedFunction(name, children, isDistinct) => withPosition(u) { registry.lookupFunction(name, children) match { -// We get an aggregate function built based on AggregateFunction2 interface. -// So, we wrap it in AggregateExpression2. -case agg2: AggregateFunction2 => AggregateExpression2(agg2, Complete, isDistinct) -// Currently, our old aggregate function interface supports SUM(DISTINCT ...) -// and COUTN(DISTINCT ...). -case sumDistinct: SumDistinct => sumDistinct -case countDistinct: CountDistinct => countDistinct -// DISTINCT is not meaningful with Max and Min. -case max: Max if isDistinct => max -case min: Min if isDistinct => min -// For other aggregate functions, DISTINCT keyword is not supported for now. -// Once we converted to the new code path, we will allow using DISTINCT keyword. -case other: AggregateExpression1 if isDistinct => - failAnalysis(s"$name does not support DISTINCT keyword.") -// If it does not have DISTINCT keyword, we will return it as is. +// DISTINCT is not meaningful for a Max or a Min. +case max: Max if isDistinct => + AggregateExpression(max, Complete, isDistinct = false) +case min: Min if isDistinct => + AggregateExpression(min, Complete, isDistinct = false) +// We get an aggregate function, we need to wrap it in an AggregateExpression. +case agg2: AggregateFunction => AggregateExpression(agg2, Complete, isDistinct) --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44452030 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala --- @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate { override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select stddev(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. - override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType)) --- End diff -- You meant `TypeCollection`? We can remove the `TypeCollection` since we only have a single abstract data type. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44451909 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CentralMomentAgg.scala --- @@ -55,13 +57,10 @@ abstract class CentralMomentAgg(child: Expression) extends ImperativeAggregate w override def dataType: DataType = DoubleType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select avg(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. - override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType)) --- End diff -- remove TypeCollection? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44451806 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala --- @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate { // Return data type. override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select sum(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType)) --- End diff -- yeah, we should remove it. I somehow missed it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44451436 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Stddev.scala --- @@ -48,29 +50,26 @@ abstract class StddevAgg(child: Expression) extends DeclarativeAggregate { override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select stddev(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. - override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType, NullType)) + override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(NumericType)) --- End diff -- Is the NumericType still needed? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44451117 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala --- @@ -29,16 +31,13 @@ case class Sum(child: Expression) extends DeclarativeAggregate { // Return data type. override def dataType: DataType = resultType - // Expected input data type. - // TODO: Right now, we replace old aggregate functions (based on AggregateExpression1) to the - // new version at planning time (after analysis phase). For now, NullType is added at here - // to make it resolved when we have cases like `select sum(null)`. - // We can use our analyzer to cast NullType to the default data type of the NumericType once - // we remove the old aggregate functions. Then, we will not need NullType at here. override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(LongType, DoubleType, DecimalType, NullType)) --- End diff -- Should we remove the NullType here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9556 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155533284 We should keep reviewing and address any comments in a follow-up, but I'm going to merge this now to unblock other work. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r8953 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -448,15 +448,24 @@ private[spark] object SQLConf { defaultValue = Some(true), isPublic = false) - val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2", --- End diff -- This is a public config, should we generate a warning if user try to use this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r3426 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -120,14 +133,26 @@ trait CheckAnalysis { case e => e.children.foreach(checkValidAggregateExpression) } -def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match { - case BinaryType => -failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case m: MapType => -failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case _ => // OK +def checkValidGroupingExprs(expr: Expression): Unit = { + expr.dataType match { +case BinaryType => + failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case a: ArrayType => + failAnalysis(s"array type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case m: MapType => + failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case _ => // OK --- End diff -- We should also check UDT and types inside StructType --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r1047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -120,14 +133,26 @@ trait CheckAnalysis { case e => e.children.foreach(checkValidAggregateExpression) } -def checkValidGroupingExprs(expr: Expression): Unit = expr.dataType match { - case BinaryType => -failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case m: MapType => -failAnalysis(s"map type expression ${expr.prettyString} cannot be used " + - "in grouping expression") - case _ => // OK +def checkValidGroupingExprs(expr: Expression): Unit = { + expr.dataType match { +case BinaryType => + failAnalysis(s"binary type expression ${expr.prettyString} cannot be used " + +"in grouping expression") +case a: ArrayType => + failAnalysis(s"array type expression ${expr.prettyString} cannot be used " + --- End diff -- Is this a regression? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r0841 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -108,7 +109,19 @@ trait CheckAnalysis { case Aggregate(groupingExprs, aggregateExprs, child) => def checkValidAggregateExpression(expr: Expression): Unit = expr match { - case _: AggregateExpression => // OK + case aggExpr: AggregateExpression => +// TODO: Is it possible that the child of a agg function is another --- End diff -- After rewrite for multiple distinct, I think it's possible --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r0725 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -525,21 +526,14 @@ class Analyzer( case u @ UnresolvedFunction(name, children, isDistinct) => withPosition(u) { registry.lookupFunction(name, children) match { -// We get an aggregate function built based on AggregateFunction2 interface. -// So, we wrap it in AggregateExpression2. -case agg2: AggregateFunction2 => AggregateExpression2(agg2, Complete, isDistinct) -// Currently, our old aggregate function interface supports SUM(DISTINCT ...) -// and COUTN(DISTINCT ...). -case sumDistinct: SumDistinct => sumDistinct -case countDistinct: CountDistinct => countDistinct -// DISTINCT is not meaningful with Max and Min. -case max: Max if isDistinct => max -case min: Min if isDistinct => min -// For other aggregate functions, DISTINCT keyword is not supported for now. -// Once we converted to the new code path, we will allow using DISTINCT keyword. -case other: AggregateExpression1 if isDistinct => - failAnalysis(s"$name does not support DISTINCT keyword.") -// If it does not have DISTINCT keyword, we will return it as is. +// DISTINCT is not meaningful for a Max or a Min. +case max: Max if isDistinct => + AggregateExpression(max, Complete, isDistinct = false) +case min: Min if isDistinct => + AggregateExpression(min, Complete, isDistinct = false) +// We get an aggregate function, we need to wrap it in an AggregateExpression. +case agg2: AggregateFunction => AggregateExpression(agg2, Complete, isDistinct) --- End diff -- agg2 -> agg --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155348763 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155348765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45492/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155348607 **[Test build #45492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)** for PR 9556 at commit [`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44374094 --- Diff: R/pkg/R/functions.R --- @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"), #' @export setMethod("approxCountDistinct", signature(x = "Column"), - function(x, rsd = 0.95) { + function(x, rsd = 0.05) { --- End diff -- Good catch, thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155297593 **[Test build #45492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45492/consoleFull)** for PR 9556 at commit [`16826e6`](https://github.com/apache/spark/commit/16826e6d5cb9636b6a72125fcccfc4273227c45b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155295488 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155295464 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44371174 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -141,6 +141,10 @@ case class UnresolvedFunction( override def nullable: Boolean = throw new UnresolvedException(this, "nullable") override lazy val resolved = false + override def prettyString: String = { +s"${name}(${children.map(_.prettyString).mkString(",")})" --- End diff -- @rxin @marmbrus It is better to change the `prettyString` of `UnresolvedFunction` to get rid of that `'` (e.g. `'abs`). The impact of this change is that if any user rely on that `'`, his/her code will break. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155290682 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155290683 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45476/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155290543 **[Test build #45476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155286504 **[Test build #2027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155282012 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155282013 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45474/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155281959 **[Test build #45474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44366543 --- Diff: R/pkg/R/functions.R --- @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"), #' @export setMethod("approxCountDistinct", signature(x = "Column"), - function(x, rsd = 0.95) { + function(x, rsd = 0.05) { --- End diff -- Yeah looks like it should be 0.05 -- cc @davies This seems to have been added back when SparkR was a separate code-base in https://github.com/davies/SparkR-pkg/commit/d7b17a428c27aac28d89e1c85f1ba7d9d4b021d2 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155265613 **[Test build #45476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45476/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155264259 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155264239 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155264172 **[Test build #2027 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2027/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155263519 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155262404 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155263285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45471/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155263282 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155263092 **[Test build #45474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45474/consoleFull)** for PR 9556 at commit [`fb64896`](https://github.com/apache/spark/commit/fb648964300425203f9c7ef556f03604488dab15). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44363471 --- Diff: R/pkg/R/functions.R --- @@ -1339,7 +1339,7 @@ setMethod("pmod", signature(y = "Column"), #' @export setMethod("approxCountDistinct", signature(x = "Column"), - function(x, rsd = 0.95) { + function(x, rsd = 0.05) { --- End diff -- @shivaram Seems this default value should be 0.05 instead of 0.95, right? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155262424 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155260995 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155260980 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44361789 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -448,15 +448,17 @@ private[spark] object SQLConf { defaultValue = Some(true), isPublic = false) - val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2", -defaultValue = Some(true), doc = "") - val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles", defaultValue = Some(true), isPublic = false, doc = "When true, we could use `datasource`.`path` as table in SQL query" ) + val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING = --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155258202 **[Test build #2024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155257980 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45442/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155257977 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155257913 **[Test build #45442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44361592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala --- @@ -88,30 +89,33 @@ class GroupedData protected[sql]( namedExpr } } -toDF(columnExprs.map(f)) +toDF(columnExprs.map(expr => f(expr).toAggregateExpression())) } private[this] def strToExpr(expr: String): (Expression => Expression) = { -expr.toLowerCase match { - case "avg" | "average" | "mean" => Average - case "max" => Max - case "min" => Min - case "stddev" | "std" => StddevSamp - case "stddev_pop" => StddevPop - case "stddev_samp" => StddevSamp - case "variance" => VarianceSamp - case "var_pop" => VariancePop - case "var_samp" => VarianceSamp - case "sum" => Sum - case "skewness" => Skewness - case "kurtosis" => Kurtosis - case "count" | "size" => -// Turn count(*) into count(1) -(inputExpr: Expression) => inputExpr match { - case s: Star => Count(Literal(1)) - case _ => Count(inputExpr) -} +val exprToFunc: (Expression => AggregateFunction) = { + (inputExpr: Expression) => expr.toLowerCase match { +case "avg" | "average" | "mean" => Average(inputExpr) +case "max" => Max(inputExpr) +case "min" => Min(inputExpr) +case "stddev" | "std" => StddevSamp(inputExpr) +case "stddev_pop" => StddevPop(inputExpr) +case "stddev_samp" => StddevSamp(inputExpr) +case "variance" => VarianceSamp(inputExpr, 0, 0) +case "var_pop" => VariancePop(inputExpr, 0, 0) +case "var_samp" => VarianceSamp(inputExpr, 0, 0) +case "sum" => Sum(inputExpr) +case "skewness" => Skewness(inputExpr, 0, 0) +case "kurtosis" => Kurtosis(inputExpr, 0, 0) --- End diff -- Done. btw, those offsets will be always set at executor side. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44361205 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -53,6 +54,12 @@ object functions { private def withExpr(expr: Expression): Column = Column(expr) + private def withAggregateFunction( + func: AggregateFunction, --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44361188 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -141,40 +141,56 @@ class WindowSpec private[sql]( */ private[sql] def withAggregate(aggregate: Column): Column = { val windowExpr = aggregate.expr match { - case Average(child) => WindowExpression( -UnresolvedWindowFunction("avg", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Sum(child) => WindowExpression( -UnresolvedWindowFunction("sum", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Count(child) => WindowExpression( -UnresolvedWindowFunction("count", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case First(child, ignoreNulls) => WindowExpression( -// TODO this is a hack for Hive UDAF first_value -UnresolvedWindowFunction( - "first_value", - child :: ignoreNulls :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Last(child, ignoreNulls) => WindowExpression( -// TODO this is a hack for Hive UDAF last_value -UnresolvedWindowFunction( - "last_value", - child :: ignoreNulls :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Min(child) => WindowExpression( -UnresolvedWindowFunction("min", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Max(child) => WindowExpression( -UnresolvedWindowFunction("max", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case wf: WindowFunction => WindowExpression( -wf, -WindowSpecDefinition(partitionSpec, orderSpec, frame)) + // First, we check if we get an aggregate function without the DISTINCT keyword. + // Right now, we do not support using a DISTINCT aggregate function as a + // window function. + case AggregateExpression(aggregateFunction, _, isDistinct) if !isDistinct => +aggregateFunction match { + case Average(child) => WindowExpression( +UnresolvedWindowFunction("avg", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Sum(child) => WindowExpression( +UnresolvedWindowFunction("sum", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Count(child) => WindowExpression( +UnresolvedWindowFunction("count", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case First(child, ignoreNulls) => WindowExpression( +// TODO this is a hack for Hive UDAF first_value +UnresolvedWindowFunction( + "first_value", + child :: ignoreNulls :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Last(child, ignoreNulls) => WindowExpression( +// TODO this is a hack for Hive UDAF last_value +UnresolvedWindowFunction( + "last_value", + child :: ignoreNulls :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Min(child) => WindowExpression( +UnresolvedWindowFunction("min", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Max(child) => WindowExpression( +UnresolvedWindowFunction("max", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case x => +throw new UnsupportedOperationException(s"$x is not supported in window operation.") --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44361162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends Expression with ImplicitCastInp override protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") + + def toAggregateExpression(): AggregateExpression = toAggregateExpression(isDistinct = false) --- End diff -- Done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44360927 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala --- @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression, inputAggBufferOffset: Int = 0) extends CentralMomentAgg(child) { + def this(child: Expression) = this(child, 0, 0) --- End diff -- Will drop those default args in a separate PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44360853 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala --- @@ -24,6 +24,8 @@ case class Skewness(child: Expression, inputAggBufferOffset: Int = 0) extends CentralMomentAgg(child) { + def this(child: Expression) = this(child, 0, 0) --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44358149 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala --- @@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus( extends ImperativeAggregate { import HyperLogLogPlusPlus._ + def this(child: Expression) = { +this(child, 0.05, 0, 0) + } + + def this(child: Expression, relativeSD: Expression) = { +this( + child, + relativeSD match { +case Literal(d: Double, DoubleType) => d +case _ => + throw new AnalysisException("The second argument should be a double literal.") + }, + 0, --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44357472 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala --- @@ -32,23 +32,34 @@ case class Count(child: Expression) extends DeclarativeAggregate { // Expected input data type. override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType) - private val count = AttributeReference("count", LongType)() + private lazy val count = AttributeReference("count", LongType)() - override val aggBufferAttributes = count :: Nil + override lazy val aggBufferAttributes = count :: Nil - override val initialValues = Seq( + override lazy val initialValues = Seq( /* count = */ Literal(0L) ) - override val updateExpressions = Seq( + override lazy val updateExpressions = Seq( /* count = */ If(IsNull(child), count, count + 1L) ) - override val mergeExpressions = Seq( + override lazy val mergeExpressions = Seq( /* count = */ count.left + count.right ) - override val evaluateExpression = Cast(count, LongType) + override lazy val evaluateExpression = Cast(count, LongType) override def defaultResult: Option[Literal] = Option(Literal(0L)) } + +object Count { + def apply(children: Seq[Expression]): Count = { --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44355987 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -177,6 +178,7 @@ object FunctionRegistry { expression[ToRadians]("radians"), // aggregate functions +expression[HyperLogLogPlusPlus]("approx_distinct"), --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44355961 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -108,7 +109,16 @@ trait CheckAnalysis { case Aggregate(groupingExprs, aggregateExprs, child) => def checkValidAggregateExpression(expr: Expression): Unit = expr match { - case _: AggregateExpression => // OK + case aggExpr: AggregateExpression => +// TODO: Is it possible that the child of a agg function is another +// agg function? +aggExpr.aggregateFunction.children.foreach { + case child if !child.deterministic => +failAnalysis( + s"nondeterministic expression ${expr.prettyString} are not allowed " + +s"in grouping expression.") --- End diff -- Done. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44355776 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -525,21 +526,15 @@ class Analyzer( case u @ UnresolvedFunction(name, children, isDistinct) => withPosition(u) { registry.lookupFunction(name, children) match { +// DISTINCT is not meaningful for a Max or a Min. +case max: Max if isDistinct => + AggregateExpression(max, Complete, isDistinct = false) +case min: Min if isDistinct => + AggregateExpression(min, Complete, isDistinct = false) // We get an aggregate function built based on AggregateFunction2 interface. // So, we wrap it in AggregateExpression2. --- End diff -- Done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44355675 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { protected lazy val function: Parser[Expression] = ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName => if (lexical.normalizeKeyword(udfName) == "count") { -Count(Literal(1)) +AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct = false) --- End diff -- Seems we hard code this when we deal with count(*). We can change it in another PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155238877 this is going to conflict at least logically with #9499 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155238823 Yay, death to Aggregation1! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44352491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala --- @@ -448,15 +448,17 @@ private[spark] object SQLConf { defaultValue = Some(true), isPublic = false) - val USE_SQL_AGGREGATE2 = booleanConf("spark.sql.useAggregate2", -defaultValue = Some(true), doc = "") - val RUN_SQL_ON_FILES = booleanConf("spark.sql.runSQLOnFiles", defaultValue = Some(true), isPublic = false, doc = "When true, we could use `datasource`.`path` as table in SQL query" ) + val SPECIALIZE_SINGLE_DISTINCT_AGG_PLANNING = --- End diff -- For posterity maybe comment on what this is and when we think we will remove it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44352445 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/GroupedData.scala --- @@ -88,30 +89,33 @@ class GroupedData protected[sql]( namedExpr } } -toDF(columnExprs.map(f)) +toDF(columnExprs.map(expr => f(expr).toAggregateExpression())) } private[this] def strToExpr(expr: String): (Expression => Expression) = { -expr.toLowerCase match { - case "avg" | "average" | "mean" => Average - case "max" => Max - case "min" => Min - case "stddev" | "std" => StddevSamp - case "stddev_pop" => StddevPop - case "stddev_samp" => StddevSamp - case "variance" => VarianceSamp - case "var_pop" => VariancePop - case "var_samp" => VarianceSamp - case "sum" => Sum - case "skewness" => Skewness - case "kurtosis" => Kurtosis - case "count" | "size" => -// Turn count(*) into count(1) -(inputExpr: Expression) => inputExpr match { - case s: Star => Count(Literal(1)) - case _ => Count(inputExpr) -} +val exprToFunc: (Expression => AggregateFunction) = { + (inputExpr: Expression) => expr.toLowerCase match { +case "avg" | "average" | "mean" => Average(inputExpr) +case "max" => Max(inputExpr) +case "min" => Min(inputExpr) +case "stddev" | "std" => StddevSamp(inputExpr) +case "stddev_pop" => StddevPop(inputExpr) +case "stddev_samp" => StddevSamp(inputExpr) +case "variance" => VarianceSamp(inputExpr, 0, 0) +case "var_pop" => VariancePop(inputExpr, 0, 0) +case "var_samp" => VarianceSamp(inputExpr, 0, 0) +case "sum" => Sum(inputExpr) +case "skewness" => Skewness(inputExpr, 0, 0) +case "kurtosis" => Kurtosis(inputExpr, 0, 0) --- End diff -- This is kinda of a nit, but I think it might be better to rely on the defaults provided by the extra constructor instead of hard coding these here and elsewhere. Also, is it possible for a user to corrupt query results by specifying their own offsets due to the way we resolve functions? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44352031 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -53,6 +54,12 @@ object functions { private def withExpr(expr: Expression): Column = Column(expr) + private def withAggregateFunction( + func: AggregateFunction, --- End diff -- indent --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44352010 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala --- @@ -141,40 +141,56 @@ class WindowSpec private[sql]( */ private[sql] def withAggregate(aggregate: Column): Column = { val windowExpr = aggregate.expr match { - case Average(child) => WindowExpression( -UnresolvedWindowFunction("avg", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Sum(child) => WindowExpression( -UnresolvedWindowFunction("sum", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Count(child) => WindowExpression( -UnresolvedWindowFunction("count", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case First(child, ignoreNulls) => WindowExpression( -// TODO this is a hack for Hive UDAF first_value -UnresolvedWindowFunction( - "first_value", - child :: ignoreNulls :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Last(child, ignoreNulls) => WindowExpression( -// TODO this is a hack for Hive UDAF last_value -UnresolvedWindowFunction( - "last_value", - child :: ignoreNulls :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Min(child) => WindowExpression( -UnresolvedWindowFunction("min", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case Max(child) => WindowExpression( -UnresolvedWindowFunction("max", child :: Nil), -WindowSpecDefinition(partitionSpec, orderSpec, frame)) - case wf: WindowFunction => WindowExpression( -wf, -WindowSpecDefinition(partitionSpec, orderSpec, frame)) + // First, we check if we get an aggregate function without the DISTINCT keyword. + // Right now, we do not support using a DISTINCT aggregate function as a + // window function. + case AggregateExpression(aggregateFunction, _, isDistinct) if !isDistinct => +aggregateFunction match { + case Average(child) => WindowExpression( +UnresolvedWindowFunction("avg", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Sum(child) => WindowExpression( +UnresolvedWindowFunction("sum", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Count(child) => WindowExpression( +UnresolvedWindowFunction("count", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case First(child, ignoreNulls) => WindowExpression( +// TODO this is a hack for Hive UDAF first_value +UnresolvedWindowFunction( + "first_value", + child :: ignoreNulls :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Last(child, ignoreNulls) => WindowExpression( +// TODO this is a hack for Hive UDAF last_value +UnresolvedWindowFunction( + "last_value", + child :: ignoreNulls :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Min(child) => WindowExpression( +UnresolvedWindowFunction("min", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case Max(child) => WindowExpression( +UnresolvedWindowFunction("max", child :: Nil), +WindowSpecDefinition(partitionSpec, orderSpec, frame)) + case x => +throw new UnsupportedOperationException(s"$x is not supported in window operation.") --- End diff -- nit: "in a window operation" or "in window operations" --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351922 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala --- @@ -141,6 +146,12 @@ sealed abstract class AggregateFunction2 extends Expression with ImplicitCastInp override protected def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): String = throw new UnsupportedOperationException(s"Cannot evaluate expression: $this") + + def toAggregateExpression(): AggregateExpression = toAggregateExpression(isDistinct = false) --- End diff -- Can you add some scaladoc here about why we need to do this wrapping? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351704 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala --- @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression, inputAggBufferOffset: Int = 0) extends CentralMomentAgg(child) { + def this(child: Expression) = this(child, 0, 0) --- End diff -- Maybe drop the default args? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351730 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Kurtosis.scala --- @@ -24,6 +24,8 @@ case class Kurtosis(child: Expression, inputAggBufferOffset: Int = 0) extends CentralMomentAgg(child) { + def this(child: Expression) = this(child, 0, 0) --- End diff -- and comment why we can't use them. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351757 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Skewness.scala --- @@ -24,6 +24,8 @@ case class Skewness(child: Expression, inputAggBufferOffset: Int = 0) extends CentralMomentAgg(child) { + def this(child: Expression) = this(child, 0, 0) --- End diff -- same. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351639 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Count.scala --- @@ -32,23 +32,34 @@ case class Count(child: Expression) extends DeclarativeAggregate { // Expected input data type. override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType) - private val count = AttributeReference("count", LongType)() + private lazy val count = AttributeReference("count", LongType)() - override val aggBufferAttributes = count :: Nil + override lazy val aggBufferAttributes = count :: Nil - override val initialValues = Seq( + override lazy val initialValues = Seq( /* count = */ Literal(0L) ) - override val updateExpressions = Seq( + override lazy val updateExpressions = Seq( /* count = */ If(IsNull(child), count, count + 1L) ) - override val mergeExpressions = Seq( + override lazy val mergeExpressions = Seq( /* count = */ count.left + count.right ) - override val evaluateExpression = Cast(count, LongType) + override lazy val evaluateExpression = Cast(count, LongType) override def defaultResult: Option[Literal] = Option(Literal(0L)) } + +object Count { + def apply(children: Seq[Expression]): Count = { --- End diff -- Maybe document what is going on here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351651 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlus.scala --- @@ -55,6 +56,22 @@ case class HyperLogLogPlusPlus( extends ImperativeAggregate { import HyperLogLogPlusPlus._ + def this(child: Expression) = { +this(child, 0.05, 0, 0) + } + + def this(child: Expression, relativeSD: Expression) = { +this( + child, + relativeSD match { +case Literal(d: Double, DoubleType) => d +case _ => + throw new AnalysisException("The second argument should be a double literal.") + }, + 0, --- End diff -- consider named args here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44351475 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala --- @@ -177,6 +178,7 @@ object FunctionRegistry { expression[ToRadians]("radians"), // aggregate functions +expression[HyperLogLogPlusPlus]("approx_distinct"), --- End diff -- approx_count_distinct? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44350995 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -108,7 +109,16 @@ trait CheckAnalysis { case Aggregate(groupingExprs, aggregateExprs, child) => def checkValidAggregateExpression(expr: Expression): Unit = expr match { - case _: AggregateExpression => // OK + case aggExpr: AggregateExpression => +// TODO: Is it possible that the child of a agg function is another +// agg function? +aggExpr.aggregateFunction.children.foreach { + case child if !child.deterministic => +failAnalysis( + s"nondeterministic expression ${expr.prettyString} are not allowed " + +s"in grouping expression.") --- End diff -- This error message is wrong. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44350703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -525,21 +526,15 @@ class Analyzer( case u @ UnresolvedFunction(name, children, isDistinct) => withPosition(u) { registry.lookupFunction(name, children) match { +// DISTINCT is not meaningful for a Max or a Min. +case max: Max if isDistinct => + AggregateExpression(max, Complete, isDistinct = false) +case min: Min if isDistinct => + AggregateExpression(min, Complete, isDistinct = false) // We get an aggregate function built based on AggregateFunction2 interface. // So, we wrap it in AggregateExpression2. --- End diff -- This is a little out of date. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44350660 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala --- @@ -272,7 +273,7 @@ object SqlParser extends AbstractSparkSQLParser with DataTypeParser { protected lazy val function: Parser[Expression] = ( ident <~ ("(" ~ "*" ~ ")") ^^ { case udfName => if (lexical.normalizeKeyword(udfName) == "count") { -Count(Literal(1)) +AggregateExpression(Count(Literal(1)), mode = Complete, isDistinct = false) --- End diff -- Is there a reason we can't just create unresolved functions here? (other than maybe the distinct case?) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155232902 **[Test build #45442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45442/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155232599 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155232567 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155232483 **[Test build #2024 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2024/consoleFull)** for PR 9556 at commit [`302edac`](https://github.com/apache/spark/commit/302edac03296832a72ba944e484974f5f71b68f2). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155231968 test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155231775 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155228594 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155228625 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155226238 Yeah. I will change the prettyString of {{AggregateExpression}}. Also, since I need to update anyway, I am including my change that introduces the flag used by MultipleDistinctRewriter. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155225000 PySpark results are correct, just not in the correct form :(... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155224746 LGTM, pending a successful test build. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155223396 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155223300 **[Test build #45407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)** for PR 9556 at commit [`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44344541 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { } } - object HashAggregation extends Strategy { -def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - // Aggregations that can be performed in two phases, before and after the shuffle. - case PartialAggregation( - namedGroupingAttributes, - rewrittenAggregateExpressions, - groupingExpressions, - partialComputation, - child) if !canBeConvertedToNewAggregation(plan) => -execution.Aggregate( - partial = false, - namedGroupingAttributes, - rewrittenAggregateExpressions, - execution.Aggregate( -partial = true, -groupingExpressions, -partialComputation, -planLater(child))) :: Nil - - case _ => Nil -} - -def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match { - case a: logical.Aggregate => -if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) { - a.newAggregation.isDefined -} else { - Utils.checkInvalidAggregateFunction2(a) - false -} - case _ => false -} - -def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] = - exprs.flatMap(_.collect { case a: AggregateExpression1 => a }) - } - /** * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface. */ object Aggregation extends Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 && - sqlContext.conf.codegenEnabled => -val converted = p.newAggregation -converted match { - case None => Nil // Cannot convert to new aggregation code path. - case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) => -// A single aggregate expression might appear multiple times in resultExpressions. -// In order to avoid evaluating an individual aggregate function multiple times, we'll -// build a set of the distinct aggregate expressions and build a function which can -// be used to re-write expressions so that they reference the single copy of the -// aggregate function which actually gets computed. -val aggregateExpressions = resultExpressions.flatMap { expr => - expr.collect { -case agg: AggregateExpression2 => agg - } -}.distinct -// For those distinct aggregate expressions, we create a map from the -// aggregate function to the corresponding attribute of the function. -val aggregateFunctionToAttribute = aggregateExpressions.map { agg => - val aggregateFunction = agg.aggregateFunction - val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute - (aggregateFunction, agg.isDistinct) -> attribute -}.toMap - -val (functionsWithDistinct, functionsWithoutDistinct) = - aggregateExpressions.partition(_.isDistinct) -if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) { - // This is a sanity check. We should not reach here when we have multiple distinct - // column sets (aggregate.NewAggregation will not match). - sys.error( -"Multiple distinct column sets are not supported by the new aggregation" + - "code path.") -} + case logical.Aggregate(groupingExpressions, resultExpressions, child) => +// A single aggregate expression might appear multiple times in resultExpressions. +// In order to avoid evaluating an individual aggregate function multiple times, we'll +// build a set of the distinct aggregate expressions and build a function which can +// be used to re-write expressions so that they reference the single copy of the +// aggregate function which actually gets computed. +val aggregateExpressions = resultExpressions.flatMap { expr => + expr.collect { +case agg: AggregateExpression => agg + } +}.distinct +
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44343490 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { } } - object HashAggregation extends Strategy { -def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - // Aggregations that can be performed in two phases, before and after the shuffle. - case PartialAggregation( - namedGroupingAttributes, - rewrittenAggregateExpressions, - groupingExpressions, - partialComputation, - child) if !canBeConvertedToNewAggregation(plan) => -execution.Aggregate( - partial = false, - namedGroupingAttributes, - rewrittenAggregateExpressions, - execution.Aggregate( -partial = true, -groupingExpressions, -partialComputation, -planLater(child))) :: Nil - - case _ => Nil -} - -def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match { - case a: logical.Aggregate => -if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) { - a.newAggregation.isDefined -} else { - Utils.checkInvalidAggregateFunction2(a) - false -} - case _ => false -} - -def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] = - exprs.flatMap(_.collect { case a: AggregateExpression1 => a }) - } - /** * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface. */ object Aggregation extends Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 && - sqlContext.conf.codegenEnabled => -val converted = p.newAggregation -converted match { - case None => Nil // Cannot convert to new aggregation code path. - case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) => -// A single aggregate expression might appear multiple times in resultExpressions. -// In order to avoid evaluating an individual aggregate function multiple times, we'll -// build a set of the distinct aggregate expressions and build a function which can -// be used to re-write expressions so that they reference the single copy of the -// aggregate function which actually gets computed. -val aggregateExpressions = resultExpressions.flatMap { expr => - expr.collect { -case agg: AggregateExpression2 => agg - } -}.distinct -// For those distinct aggregate expressions, we create a map from the -// aggregate function to the corresponding attribute of the function. -val aggregateFunctionToAttribute = aggregateExpressions.map { agg => - val aggregateFunction = agg.aggregateFunction - val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute - (aggregateFunction, agg.isDistinct) -> attribute -}.toMap - -val (functionsWithDistinct, functionsWithoutDistinct) = - aggregateExpressions.partition(_.isDistinct) -if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) { - // This is a sanity check. We should not reach here when we have multiple distinct - // column sets (aggregate.NewAggregation will not match). - sys.error( -"Multiple distinct column sets are not supported by the new aggregation" + - "code path.") -} + case logical.Aggregate(groupingExpressions, resultExpressions, child) => +// A single aggregate expression might appear multiple times in resultExpressions. +// In order to avoid evaluating an individual aggregate function multiple times, we'll +// build a set of the distinct aggregate expressions and build a function which can +// be used to re-write expressions so that they reference the single copy of the +// aggregate function which actually gets computed. +val aggregateExpressions = resultExpressions.flatMap { expr => + expr.collect { +case agg: AggregateExpression => agg + } +}.distinct
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/9556#discussion_r44327164 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -146,148 +146,105 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { } } - object HashAggregation extends Strategy { -def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - // Aggregations that can be performed in two phases, before and after the shuffle. - case PartialAggregation( - namedGroupingAttributes, - rewrittenAggregateExpressions, - groupingExpressions, - partialComputation, - child) if !canBeConvertedToNewAggregation(plan) => -execution.Aggregate( - partial = false, - namedGroupingAttributes, - rewrittenAggregateExpressions, - execution.Aggregate( -partial = true, -groupingExpressions, -partialComputation, -planLater(child))) :: Nil - - case _ => Nil -} - -def canBeConvertedToNewAggregation(plan: LogicalPlan): Boolean = plan match { - case a: logical.Aggregate => -if (sqlContext.conf.useSqlAggregate2 && sqlContext.conf.codegenEnabled) { - a.newAggregation.isDefined -} else { - Utils.checkInvalidAggregateFunction2(a) - false -} - case _ => false -} - -def allAggregates(exprs: Seq[Expression]): Seq[AggregateExpression1] = - exprs.flatMap(_.collect { case a: AggregateExpression1 => a }) - } - /** * Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface. */ object Aggregation extends Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match { - case p: logical.Aggregate if sqlContext.conf.useSqlAggregate2 && - sqlContext.conf.codegenEnabled => -val converted = p.newAggregation -converted match { - case None => Nil // Cannot convert to new aggregation code path. - case Some(logical.Aggregate(groupingExpressions, resultExpressions, child)) => -// A single aggregate expression might appear multiple times in resultExpressions. -// In order to avoid evaluating an individual aggregate function multiple times, we'll -// build a set of the distinct aggregate expressions and build a function which can -// be used to re-write expressions so that they reference the single copy of the -// aggregate function which actually gets computed. -val aggregateExpressions = resultExpressions.flatMap { expr => - expr.collect { -case agg: AggregateExpression2 => agg - } -}.distinct -// For those distinct aggregate expressions, we create a map from the -// aggregate function to the corresponding attribute of the function. -val aggregateFunctionToAttribute = aggregateExpressions.map { agg => - val aggregateFunction = agg.aggregateFunction - val attribute = Alias(aggregateFunction, aggregateFunction.toString)().toAttribute - (aggregateFunction, agg.isDistinct) -> attribute -}.toMap - -val (functionsWithDistinct, functionsWithoutDistinct) = - aggregateExpressions.partition(_.isDistinct) -if (functionsWithDistinct.map(_.aggregateFunction.children).distinct.length > 1) { - // This is a sanity check. We should not reach here when we have multiple distinct - // column sets (aggregate.NewAggregation will not match). - sys.error( -"Multiple distinct column sets are not supported by the new aggregation" + - "code path.") -} + case logical.Aggregate(groupingExpressions, resultExpressions, child) => +// A single aggregate expression might appear multiple times in resultExpressions. +// In order to avoid evaluating an individual aggregate function multiple times, we'll +// build a set of the distinct aggregate expressions and build a function which can +// be used to re-write expressions so that they reference the single copy of the +// aggregate function which actually gets computed. +val aggregateExpressions = resultExpressions.flatMap { expr => + expr.collect { +case agg: AggregateExpression => agg + } +}.distinct +
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155182875 **[Test build #45407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45407/consoleFull)** for PR 9556 at commit [`bad4184`](https://github.com/apache/spark/commit/bad418404e7ba91402279f09a06c21e1930e2b13). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-9830] [SQL] Remove AggregateExpression1...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9556#issuecomment-155179711 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org