Sent: 10/15/2015 2:35 PM
To: Stefan Panayotov <spanayo...@msn.com>
Cc: user@spark.apache.org
Subject: Re: Spark SQL running totals
you can do a self join of the table with itself with the join clause being
a.col1 >= b.col1
select a.col1, a.col2, sum(b.col2)
from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
group by a.col1, a.col2
To: mich...@databricks.com; Deenar Toraskar <deenar.toras...@gmail.com>
Cc: Stefan Panayotov <spanayo...@msn.com>; user@spark.apache.org
Subject: RE: Spark SQL running totals
My mistake. I didn't notice that "UNBOUNDED PRECEDING" is already supported.
So cumulative sum should work then.
Thanks
Yong
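For reference, a sketch of the cumulative sum written with "UNBOUNDED PRECEDING" (the table and column names tablea/col1/col2 are borrowed from the self-join example below, and sqlContext stands for a context with window-function support, e.g. a HiveContext on Spark 1.5):

// Running total of col2 ordered by col1, from the first row up to the current row.
val runningTotal = sqlContext.sql("""
  select col1, col2,
         sum(col2) over (order by col1
                         rows between unbounded preceding and current row) as running_total
  from tablea
""")
runningTotal.show()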
From: java8...@hotmail.com
To: mich...@databricks.com; deenar.toras...@gmail.com
CC: spanayo...@msn.com; user@spark.apache.org
Subject: RE: Spark SQL running totals
Do we have "cume_sum" in the Spark window functions, or am I totally
misunderstanding "sum() over (partition by)" in it?
Yong
From: mich...@databricks.com
Date: Thu, 15 Oct 2015 11:51:59 -0700
Subject: Re: Spark SQL running totals
To: deenar.toras...@gmail.com
CC: spanayo...@msn.com; user@spark.apache.org
you can do a self join of the table with itself with the join clause being
a.col1 >= b.col1
select a.col1, a.col2, sum(b.col2)
from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
group by a.col1, a.col2
I haven't tried it, but can't see why it can't work, but doing it in an RDD
might be ...
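A minimal sketch of that self-join approach, run through Spark SQL from Scala (assuming a DataFrame df with columns col1 and col2; the temp-table name tablea matches the query above):

// Expose the DataFrame to SQL under the name used in the query above.
df.registerTempTable("tablea")   // createOrReplaceTempView in Spark 2.x+

// Each row of a joins to every row of b with b.col1 <= a.col1,
// so summing b.col2 per (a.col1, a.col2) group gives the running total.
val runningTotals = sqlContext.sql("""
  select a.col1, a.col2, sum(b.col2) as running_total
  from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
  group by a.col1, a.col2
""")
runningTotals.show()

Note that the self-join compares every row against every earlier row, so it grows quadratically with the table size, which is one reason the window-function approach suggested below is usually preferable.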
Check out:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
On Thu, Oct 15, 2015 at 11:35 AM, Deenar Toraskar wrote:
> you can do a self join of the table with itself with the join clause being
> a.col1 >= b.col1
>
> select a.col1, a.col2, sum(b.col2)
> from tablea as a left outer join tablea as b on (a.col1 >= b.col1)
> group by a.col1, a.col2
You can do it and many other transformations very easily with window
functions, see this blog post:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
In your case you would do (in Scala):
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
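The rest of the snippet is cut off in this digest; a sketch of what a window-function running total looks like with that API, building on the two imports above (assuming a DataFrame df; the column names col1/col2 are illustrative, not necessarily what the original mail used):

// Frame from the first row (unbounded preceding) up to the current row, ordered by col1.
// (Long.MinValue / 0 are the Spark 1.x frame bounds; later versions also accept
// Window.unboundedPreceding and Window.currentRow.)
val w = Window.orderBy("col1").rowsBetween(Long.MinValue, 0)

val withRunningTotal = df.select(
  col("col1"),
  col("col2"),
  sum(col("col2")).over(w).as("running_total"))

withRunningTotal.show()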