[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-20 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705057#comment-14705057
 ] 

Yin Huai commented on SPARK-10100:
--

I am changing the title of this JIRA to "Eliminate hash table lookup if there 
is no grouping key in aggregation." since 
https://github.com/apache/spark/pull/8332 uses this JIRA as its issue. For 
the Max and Min expressions, we can revisit them later if we find a better way 
to improve their performance.
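A minimal sketch of why the retitled change can help, written in plain Scala rather than against Spark's internals. `maxByKey` and `maxNoKey` are hypothetical names, and this is not the code from PR 8332; it only shows that, with no grouping key, there is a single group, so the per-row hash table lookup can be replaced by one buffer.

```scala
import scala.collection.mutable

// Hypothetical illustration, not the code from the pull request:
// computing max(value), with and without a grouping key.

// With a grouping key, every input row pays for a hash table lookup
// to find (or create) the aggregation buffer for its key.
def maxByKey(rows: Seq[(String, Long)]): Map[String, Long] = {
  val buffers = mutable.HashMap.empty[String, Long]
  for ((key, value) <- rows) {
    buffers.get(key) match {
      case Some(current) if current >= value => () // buffer already holds the max
      case _                                 => buffers.update(key, value)
    }
  }
  buffers.toMap
}

// Without a grouping key there is exactly one group, so a single buffer
// is enough and the per-row lookup disappears.
def maxNoKey(values: Seq[Long]): Option[Long] = {
  var hasValue = false
  var buffer = 0L
  for (value <- values) {
    if (!hasValue || value > buffer) {
      buffer = value
      hasValue = true
    }
  }
  if (hasValue) Some(buffer) else None
}
```

Both functions compute the same per-group maximum; the only difference is whether a hash table sits between each row and its buffer.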

> AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction
> --
>
> Key: SPARK-10100
> URL: https://issues.apache.org/jira/browse/SPARK-10100
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Assignee: Herman van Hovell
> Fix For: 1.5.0
>
> Attachments: SPARK-10100.perf.test.scala
>
>
> Looks like Max (probably Min) implemented based on AggregateFunction2 is 
> slower than the old MaxFunction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-20 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704742#comment-14704742
 ] 

Herman van Hovell commented on SPARK-10100:
---

Let's leave it for 1.6.

> AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction
> --
>
> Key: SPARK-10100
> URL: https://issues.apache.org/jira/browse/SPARK-10100
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Assignee: Herman van Hovell
> Attachments: SPARK-10100.perf.test.scala
>
>
> Looks like Max (probably Min) implemented based on AggregateFunction2 is 
> slower than the old MaxFunction.






[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-20 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704418#comment-14704418
 ] 

Apache Spark commented on SPARK-10100:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/8332







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704351#comment-14704351
 ] 

Yin Huai commented on SPARK-10100:
--

How about we leave these functions as-is for now? The improvement from 
updating the expressions does not look significant, and we should also avoid 
code changes during the QA period.







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704322#comment-14704322
 ] 

Yin Huai commented on SPARK-10100:
--

The dataset I created has 11 columns and 2 groups. The query applies 10 
aggregate functions:
{code}
sqlContext.sql("""
  select
    i,
    sum(j1),
    sum(j2),
    sum(j3),
    sum(j4),
    sum(j5),
    sum(j6),
    sum(j7),
    sum(j8),
    sum(j9),
    sum(j10)
  from testAgg
  group by i""")
{code}
On my laptop, 1.5 is about 5% slower than 1.4.
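For readers without the attached SPARK-10100.perf.test.scala, the query's result can be sketched over in-memory Scala collections. The `AggRow` class and the generated rows are hypothetical; this only restates what the SQL computes and says nothing about Spark's execution path or the timings above.

```scala
// Hypothetical in-memory stand-in for one row of `testAgg`:
// a grouping column `i` plus ten value columns j1..j10.
final case class AggRow(i: Int, js: Vector[Long])

// Same result as:
//   select i, sum(j1), ..., sum(j10) from testAgg group by i
def sumPerGroup(rows: Seq[AggRow]): Map[Int, Vector[Long]] =
  rows.groupBy(_.i).map { case (key, groupRows) =>
    // Column-wise sum of the ten value columns within one group.
    val sums = groupRows.map(_.js).reduce { (a, b) =>
      a.zip(b).map { case (x, y) => x + y }
    }
    key -> sums
  }

// Hypothetical data: 11 columns (i, j1..j10) and 2 distinct groups.
val rows = (0 until 6).map(n => AggRow(n % 2, Vector.fill(10)(n.toLong)))
val result = sumPerGroup(rows)
```

With only 2 groups, almost all of the work is per-row aggregate updates rather than grouping, which is why the cost of each update expression dominates a benchmark like this.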







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704309#comment-14704309
 ] 

Yin Huai commented on SPARK-10100:
--

I compared 1.4 with 1.5 and found that 1.5 is slower. I also tweaked the 
update expression in master, but it does not seem to give a significant improvement.







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702398#comment-14702398
 ] 

Yin Huai commented on SPARK-10100:
--

[~hvanhovell] How's the performance?

> AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction
> --
>
> Key: SPARK-10100
> URL: https://issues.apache.org/jira/browse/SPARK-10100
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Assignee: Yin Huai
>
> Looks like Max (probably Min) implemented based on AggregateFunction2 is 
> slower than the old MaxFunction.






[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702343#comment-14702343
 ] 

Apache Spark commented on SPARK-10100:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/8298







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-18 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702344#comment-14702344
 ] 

Herman van Hovell commented on SPARK-10100:
---

PR is in.







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702279#comment-14702279
 ] 

Yin Huai commented on SPARK-10100:
--

[~hvanhovell]

I think the expression we are using causes the slowness.

In the new version of Max, we have 
{code}
override val updateExpressions = Seq(
  /* max = */ If(IsNull(child), max, If(IsNull(max), child, Greatest(Seq(max, child))))
)
{code}
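To make the per-row cost of this declarative form concrete, here is a toy interpreter for the same nested `If`/`IsNull`/`Greatest` tree that counts how many nodes are visited per row. The classes are hypothetical stand-ins, not Catalyst's, with `None` playing the role of SQL NULL; the counts describe this toy only, not Spark's generated code.

```scala
// Toy model of the declarative update expression; these classes are
// hypothetical stand-ins, not Catalyst's. None plays the role of SQL NULL.
sealed trait Expr
case object ChildRef extends Expr // the incoming value for this row
case object MaxRef extends Expr   // the current value in the aggregation buffer
final case class If(cond: Pred, ifTrue: Expr, ifFalse: Expr) extends Expr
final case class Greatest(children: Seq[Expr]) extends Expr

sealed trait Pred
final case class IsNull(e: Expr) extends Pred

// Evaluates an expression tree for one row, counting visited nodes.
final class Interpreter(child: Option[Int], max: Option[Int]) {
  var nodesEvaluated = 0

  def evalPred(p: Pred): Boolean = {
    nodesEvaluated += 1
    p match { case IsNull(e) => evalExpr(e).isEmpty }
  }

  def evalExpr(e: Expr): Option[Int] = {
    nodesEvaluated += 1
    e match {
      case ChildRef    => child
      case MaxRef      => max
      case If(c, t, f) => if (evalPred(c)) evalExpr(t) else evalExpr(f)
      // Assumption: nulls are skipped; in this tree Greatest is only
      // reached when both inputs are non-null anyway.
      case Greatest(cs) =>
        val values = cs.flatMap(c => evalExpr(c).toList)
        if (values.isEmpty) None else Some(values.max)
    }
  }
}

// max = If(IsNull(child), max, If(IsNull(max), child, Greatest(Seq(max, child))))
val updateExpr: Expr =
  If(IsNull(ChildRef), MaxRef, If(IsNull(MaxRef), ChildRef, Greatest(Seq(MaxRef, ChildRef))))

// Returns the new buffer value and how many nodes were visited to compute it.
def update(max: Option[Int], child: Option[Int]): (Option[Int], Int) = {
  val interp = new Interpreter(child, max)
  val next = interp.evalExpr(updateExpr)
  (next, interp.nodesEvaluated)
}
```

In the common case where both the buffer and the input are non-null, `update(Some(3), Some(7))` visits 9 nodes before producing `Some(7)`.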

For the old MaxFunction, we have
{code}
val currentMax: MutableLiteral = MutableLiteral(null, expr.dataType)
val cmp = LessThan(currentMax, expr)

override def update(input: InternalRow): Unit = {
  if (currentMax.value == null) {
    currentMax.value = expr.eval(input)
  } else if (cmp.eval(input) == true) {
    currentMax.value = expr.eval(input)
  }
}
{code}
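The imperative update above can be modeled in plain Scala as well, with `Option` standing in for SQL NULL. `OldStyleMax` is a hypothetical class, not Spark code; it keeps the original's structure, including evaluating the input expression a second time when the comparison succeeds, and counts those evaluations.

```scala
// Hypothetical, simplified model of the old MaxFunction: one mutable slot
// updated imperatively for each row. None plays the role of SQL NULL.
final class OldStyleMax {
  private var currentMax: Option[Int] = None
  var inputEvals = 0 // how many times the input expression was evaluated

  // Stand-in for `expr.eval(input)`; here the expression just reads the row.
  private def evalInput(input: Option[Int]): Option[Int] = {
    inputEvals += 1
    input
  }

  def update(input: Option[Int]): Unit = {
    if (currentMax.isEmpty) {
      currentMax = evalInput(input)
    } else if (evalInput(input).exists(_ > currentMax.get)) {
      // Mirrors `cmp.eval(input) == true` for LessThan(currentMax, expr):
      // a null input makes the comparison null, which is not `true`.
      currentMax = evalInput(input) // the original evaluates expr a second time here
    }
  }

  def result: Option[Int] = currentMax
}
```

For example, feeding `Seq(Some(3), None, Some(7), Some(5))` through `update` leaves `result == Some(7)` after 5 input evaluations, with no intermediate expression nodes between the buffer and the comparison.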

I feel we are just using a more expensive expression to calculate max (and 
probably min).

Will you have time to look at it? I think the fix will be pretty small and we 
can get it in 1.5.







[jira] [Commented] (SPARK-10100) AggregateFunction2's Max is slower than AggregateExpression1's MaxFunction

2015-08-18 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702252#comment-14702252
 ] 

Herman van Hovell commented on SPARK-10100:
---

Any idea why? JoinedRow?
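The JoinedRow guess refers to presenting two rows (presumably the aggregation buffer and the input row) as one, so that update expressions can reference fields of both. A rough, hypothetical model of where such a view could add per-access overhead; `JoinedView` is not a Spark class and rows are modeled as `Array[Any]`.

```scala
// Hypothetical model of a JoinedRow-style view: two rows exposed as one,
// e.g. [aggregation buffer fields ++ input row fields]. Not Spark's class.
final class JoinedView(left: Array[Any], right: Array[Any]) {
  def numFields: Int = left.length + right.length

  // Every field access pays an extra branch (and an indirection) to pick
  // the side that holds ordinal `i`.
  def get(i: Int): Any =
    if (i < left.length) left(i) else right(i - left.length)
}
```

Each field read through such a view costs one comparison plus a dispatch to the underlying row, so an update expression that touches many fields per row multiplies that small cost.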



