Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r165224852 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -199,7 +200,7 @@ object ExtractFiltersAndInnerJoins extends PredicateHelper { object PhysicalAggregation { // groupingExpressions, aggregateExpressions, resultExpressions, child type ReturnType = - (Seq[NamedExpression], Seq[AggregateExpression], Seq[NamedExpression], LogicalPlan) + (Seq[NamedExpression], Seq[Expression], Seq[NamedExpression], LogicalPlan) --- End diff -- Hi @yhuai, You bring up a good point. I agree with you ideally we should avoid doing. When I was making the change, I found the solution implemented results in least amount of duplicate code, because a lot of logic is shared between AggregateExpression and Python UDF, but the downside is exactly what you mentioned. One alternative is to create new rules for Python UDAF, my concern is that could result in quite a bit of code duplication. Maybe there is a way to avoid code duplication and keep the type safety, I am happy to explore the option. (Maybe create a parent class for AggregateExpression and Python UDAF)?
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org