[ https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702279#comment-14702279 ]
Yin Huai commented on SPARK-10100:
----------------------------------

[~hvanhovell] I think the expression we are using causes the slowness. In the new version of Max, we have
{code}
override val updateExpressions = Seq(
  /* max = */ If(IsNull(child), max, If(IsNull(max), child, Greatest(Seq(max, child))))
)
{code}
For the old MaxFunction, we have
{code}
val currentMax: MutableLiteral = MutableLiteral(null, expr.dataType)
val cmp = LessThan(currentMax, expr)

override def update(input: InternalRow): Unit = {
  if (currentMax.value == null) {
    currentMax.value = expr.eval(input)
  } else if (cmp.eval(input) == true) {
    currentMax.value = expr.eval(input)
  }
}
{code}
I feel we are just using a more expensive expression to calculate max (and probably min). Will you have time to look at it? I think the fix will be pretty small and we can get it in 1.5.

> AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction
> --------------------------------------------------------------------------
>
>                 Key: SPARK-10100
>                 URL: https://issues.apache.org/jira/browse/SPARK-10100
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Looks like Max (probably Min) implemented based on AggregateFunction2 is
> slower than the old MaxFunction.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
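
To make the comparison in the comment above concrete, here is a minimal, self-contained sketch that counts how many expression nodes each update strategy visits per row. It does not use Spark. `Col`, `MutableLit`, `nodeEvals`, and the simplified `If`/`IsNull`/`Greatest`/`LessThan` classes are hypothetical stand-ins that only borrow the Catalyst names, and the toy handles `Int` values only. It models interpreted tree evaluation and says nothing about code generation or real timings.

```scala
// Toy expression tree. These are NOT Spark's Catalyst classes, only stand-ins
// with similar names so the two update strategies can be compared side by side.
object MaxUpdateSketch {
  var nodeEvals = 0 // number of expression nodes visited so far

  sealed trait Expr {
    protected def doEval(row: Array[Any]): Any
    // Every evaluation of any node bumps the shared counter.
    final def eval(row: Array[Any]): Any = { nodeEvals += 1; doEval(row) }
  }
  final case class Col(i: Int) extends Expr {
    protected def doEval(row: Array[Any]): Any = row(i)
  }
  final class MutableLit(var value: Any) extends Expr {
    protected def doEval(row: Array[Any]): Any = value
  }
  final case class IsNull(e: Expr) extends Expr {
    protected def doEval(row: Array[Any]): Any = e.eval(row) == null
  }
  final case class If(p: Expr, t: Expr, f: Expr) extends Expr {
    protected def doEval(row: Array[Any]): Any =
      if (p.eval(row) == true) t.eval(row) else f.eval(row)
  }
  final case class Greatest(es: Seq[Expr]) extends Expr {
    protected def doEval(row: Array[Any]): Any = {
      // Skip nulls; this toy only understands Int values.
      val vs = es.map(_.eval(row)).collect { case v: Int => v }
      if (vs.isEmpty) null else vs.max
    }
  }
  final case class LessThan(l: Expr, r: Expr) extends Expr {
    protected def doEval(row: Array[Any]): Any = (l.eval(row), r.eval(row)) match {
      case (a: Int, b: Int) => a < b
      case _                => null // null on either side yields null
    }
  }

  // Shape of the new Max: one nested update expression evaluated for every row.
  def newStyleMax(rows: Seq[Array[Any]]): Any = {
    val child = Col(0)
    val max = new MutableLit(null)
    val update =
      If(IsNull(child), max, If(IsNull(max), child, Greatest(Seq(max, child))))
    rows.foreach(r => max.value = update.eval(r))
    max.value
  }

  // Shape of the old MaxFunction: a null check plus a single LessThan.
  def oldStyleMax(rows: Seq[Array[Any]]): Any = {
    val expr = Col(0)
    val currentMax = new MutableLit(null)
    val cmp = LessThan(currentMax, expr)
    rows.foreach { r =>
      if (currentMax.value == null) currentMax.value = expr.eval(r)
      else if (cmp.eval(r) == true) currentMax.value = expr.eval(r)
    }
    currentMax.value
  }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Array[Any]] = Seq(3, 1, 4, 1, 5).map(v => Array[Any](v))
    nodeEvals = 0
    val a = newStyleMax(rows); val newCount = nodeEvals
    nodeEvals = 0
    val b = oldStyleMax(rows); val oldCount = nodeEvals
    println(s"new-style: max=$a, nodes visited=$newCount")
    println(s"old-style: max=$b, nodes visited=$oldCount")
  }
}
```

Both functions return the same maximum, but in this toy the nested `If`/`IsNull`/`Greatest` tree visits more nodes per row than the `LessThan`-based update. That is consistent with the suspicion raised in the comment, though it is not a measurement of Spark itself.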