RE: Spark SQL running totals

2015-10-16 Thread Stefan Panayotov
Sent: 10/15/2015 2:35 PM To: Stefan Panayotov<mailto:spanayo...@msn.com> Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Spark SQL running totals you can do a self join of the table with itself with the join clause being a.col1 >= b.col1 select a.col1, a.col2, sum(b.col2) …

RE: Spark SQL running totals

2015-10-15 Thread Stefan Panayotov
To: Michael Armbrust<mailto:mich...@databricks.com>; Deenar Toraskar<mailto:deenar.toras...@gmail.com> Cc: Stefan Panayotov<mailto:spanayo...@msn.com>; user@spark.apache.org<mailto:user@spark.apache.org> Subject: RE: Spark SQL running totals My mistake. I didn't notice "UNBOUNDED PRECEDING" is already …

RE: Spark SQL running totals

2015-10-15 Thread java8964
My mistake. I didn't notice "UNBOUNDED PRECEDING" is already supported. So cumulative sum should work then. Thanks, Yong From: java8...@hotmail.com To: mich...@databricks.com; deenar.toras...@gmail.com CC: spanayo...@msn.com; user@spark.apache.org Subject: RE: Spark SQL running totals Date: …
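
[Editor's note: for reference, a minimal sketch of the cumulative sum with an explicit window frame, not code from the thread. It assumes a registered table named tablea with an ordering column col1 and a value column col2 (names borrowed from Deenar's example below); in Spark 1.5, window functions in SQL require a HiveContext.]

  // Running total: sum col2 from the first row (in col1 order) up to the
  // current row. "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" is the
  // frame java8964 refers to above.
  val runningTotals = sqlContext.sql("""
    SELECT col1, col2,
           SUM(col2) OVER (ORDER BY col1
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
    FROM tablea
  """)
  runningTotals.show()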

RE: Spark SQL running totals

2015-10-15 Thread java8964
;cume_sum" in Spark window function, or am I total misunderstand about "sum() over (partitioned by)" in it? Yong From: mich...@databricks.com Date: Thu, 15 Oct 2015 11:51:59 -0700 Subject: Re: Spark SQL running totals To: deenar.toras...@gmail.com CC: spanayo...@msn.com; user@spark.a

Re: Spark SQL running totals

2015-10-15 Thread Deenar Toraskar
you can do a self join of the table with itself, with the join clause being a.col1 >= b.col1: select a.col1, a.col2, sum(b.col2) from tablea as a left outer join tablea as b on (a.col1 >= b.col1) group by a.col1, a.col2. I haven't tried it, but I can't see why it can't work; doing it in RDDs might be …
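
[Editor's note: a runnable sketch of this self-join approach with hypothetical sample data; the table and column names follow the snippet above.]

  // Hypothetical sample data: col1 is the ordering key, col2 the value to sum.
  val df = sqlContext.createDataFrame(Seq((1, 10.0), (2, 20.0), (3, 30.0)))
    .toDF("col1", "col2")
  df.registerTempTable("tablea")  // Spark 1.x; use createOrReplaceTempView in 2.x+

  // For each row a, sum every row b whose col1 is <= a.col1.
  val runningTotals = sqlContext.sql("""
    SELECT a.col1, a.col2, SUM(b.col2) AS running_total
    FROM tablea a
    LEFT OUTER JOIN tablea b ON a.col1 >= b.col1
    GROUP BY a.col1, a.col2
    ORDER BY a.col1
  """)

Note that the join materializes O(n²) candidate pairs before the GROUP BY, which is why the window-function replies in this thread scale better on large tables.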

Re: Spark SQL running totals

2015-10-15 Thread Michael Armbrust
Check out: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html On Thu, Oct 15, 2015 at 11:35 AM, Deenar Toraskar wrote: > you can do a self join of the table with itself with the join clause being > a.col1 >= b.col1 > > select …

Re: Spark SQL running totals

2015-10-15 Thread Kristina Rogale Plazonic
You can do it and many other transformations very easily with window functions; see this blog post: https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html In your case you would do (in Scala): import org.apache.spark.sql.expressions.Window import …
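
[Editor's note: the archive cuts the snippet off mid-import. A plausible completion in the spirit of the linked blog post, sketched under the assumption of a DataFrame df with an ordering column col1 and a value column col2; this is a reconstruction, not Kristina's original code.]

  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.sum

  // Frame from the first row up to the current row, ordered by col1.
  // Long.MinValue as the frame start plays the role of UNBOUNDED PRECEDING
  // in the Spark 1.5 API; later versions add Window.unboundedPreceding.
  val wSpec = Window.orderBy("col1").rowsBetween(Long.MinValue, 0)

  val withRunningTotal = df.withColumn("running_total", sum(df("col2")).over(wSpec))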