[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702279#comment-14702279
 ] 

Yin Huai commented on SPARK-10100:
----------------------------------

[~hvanhovell]

I think the expression we are using causes the slowness.

In the new version of Max, we have
{code}
override val updateExpressions = Seq(
  /* max = */ If(IsNull(child), max, If(IsNull(max), child, Greatest(Seq(max, child))))
)
{code}

For the old MaxFunction, we have
{code}
val currentMax: MutableLiteral = MutableLiteral(null, expr.dataType)
val cmp = LessThan(currentMax, expr)

override def update(input: InternalRow): Unit = {
  if (currentMax.value == null) {
    currentMax.value = expr.eval(input)
  } else if (cmp.eval(input) == true) {
    currentMax.value = expr.eval(input)
  }
}
{code}

I feel we are just using a more expensive expression to calculate max (and 
probably min).
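
To make the difference concrete, here is a rough, self-contained sketch. It does not use Spark's Catalyst classes: the node types are toy stand-ins that only borrow the names, and `MaxCostSketch`, `evals`, `Row`, `run`, and `greatestOnly` are made up for illustration. It counts how many expression nodes each update shape evaluates per row. That count is only a crude proxy for real cost and ignores code generation. The `greatestOnly` variant assumes a Greatest that skips nulls (the toy one is written that way) and is just one possible simplification, not necessarily the fix that will go in.

```scala
object MaxCostSketch {
  // Toy model only: these are NOT Spark's Catalyst classes.
  var evals = 0 // number of expression nodes evaluated so far

  // None models SQL NULL; a row is (aggregation buffer, input value).
  type Row = (Option[Int], Option[Int])

  sealed trait Expr {
    def compute(row: Row): Option[Int]
    final def eval(row: Row): Option[Int] = { evals += 1; compute(row) }
  }
  sealed trait Pred {
    def check(row: Row): Boolean
    final def test(row: Row): Boolean = { evals += 1; check(row) }
  }

  case object Buffer extends Expr { def compute(row: Row): Option[Int] = row._1 }
  case object Input extends Expr { def compute(row: Row): Option[Int] = row._2 }

  case class If(p: Pred, yes: Expr, no: Expr) extends Expr {
    def compute(row: Row): Option[Int] =
      if (p.test(row)) yes.eval(row) else no.eval(row)
  }
  // Null-skipping greatest: NULL only if every child is NULL.
  case class Greatest(children: Seq[Expr]) extends Expr {
    def compute(row: Row): Option[Int] = {
      val values = children.flatMap(c => c.eval(row).toList)
      if (values.isEmpty) None else Some(values.max)
    }
  }
  case class IsNull(child: Expr) extends Pred {
    def check(row: Row): Boolean = child.eval(row).isEmpty
  }
  // A comparison involving NULL is treated as "not true".
  case class LessThan(left: Expr, right: Expr) extends Pred {
    def check(row: Row): Boolean = (left.eval(row), right.eval(row)) match {
      case (Some(l), Some(r)) => l < r
      case _ => false
    }
  }

  // Shape of the new declarative update expression quoted above.
  val newStyle: Expr =
    If(IsNull(Input), Buffer, If(IsNull(Buffer), Input, Greatest(Seq(Buffer, Input))))

  // Hypothetical simplification: rely on the null-skipping Greatest alone.
  val greatestOnly: Expr = Greatest(Seq(Buffer, Input))

  // Shape of the old imperative MaxFunction.update quoted above.
  private val cmp = LessThan(Buffer, Input)
  def oldStyle(row: Row): Option[Int] =
    if (row._1.isEmpty) Input.eval(row)
    else if (cmp.test(row)) Input.eval(row)
    else row._1

  // Folds `update` over the inputs; returns (final max, nodes evaluated).
  def run(update: Row => Option[Int], inputs: Seq[Option[Int]]): (Option[Int], Int) = {
    evals = 0
    val result = inputs.foldLeft(Option.empty[Int])((buf, in) => update((buf, in)))
    (result, evals)
  }

  def main(args: Array[String]): Unit = {
    val inputs = Seq(Some(3), None, Some(7), Some(5))
    println("new style:     " + run(r => newStyle.eval(r), inputs))
    println("old style:     " + run(oldStyle, inputs))
    println("greatest only: " + run(r => greatestOnly.eval(r), inputs))
  }
}
```

In this toy model all three variants agree on the result, while the nested If/IsNull shape touches noticeably more nodes per row than the old update, mostly because `max` and `child` get re-evaluated by the null checks before Greatest looks at them again.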

Will you have time to look at it? I think the fix will be pretty small and we 
can get it in 1.5.

> AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction
> --------------------------------------------------------------------------
>
>                 Key: SPARK-10100
>                 URL: https://issues.apache.org/jira/browse/SPARK-10100
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> Looks like Max (and probably Min) implemented based on AggregateFunction2 is 
> slower than the old MaxFunction.



