Do you mean you are not able to use sum(col) over (partition by key order by some_col)?
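For reference, here is a minimal sketch of that window expression via the DataFrame API. The column names key, some_col, and col are taken from the expression above, and the toy data is made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.expressions.Window

val spark = SparkSession.builder().appName("CumSumExample").getOrCreate()
import spark.implicits._

// Made-up sample data: (key, some_col, col)
val df = Seq(
  ("a", 1, 10L), ("a", 2, 20L),
  ("b", 1, 5L),  ("b", 2, 15L)
).toDF("key", "some_col", "col")

// Equivalent to SQL: sum(col) OVER (PARTITION BY key ORDER BY some_col).
// With an ORDER BY, the default frame is unbounded preceding to current
// row, so this produces a running (cumulative) sum per key.
val w = Window.partitionBy("key").orderBy("some_col")
df.withColumn("cum_sum", sum($"col").over(w)).show()

The same thing can be written as raw SQL via spark.sql(...) after registering the DataFrame as a temp view, if you prefer the SQL syntax.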
On Tue, Aug 9, 2016 at 9:53 AM, jon <jon.barksd...@gmail.com> wrote:
> Hi all,
>
> I'm trying to write a function that calculates a cumulative sum as a column
> using the Dataset API, and I'm a little stuck on the implementation. From
> what I can tell, UserDefinedAggregateFunctions don't seem to support
> windowing clauses, which I think I need for this use case. If I write a
> function that extends from AggregateWindowFunction, I end up needing classes
> that are package private to the sql package, so I need to put my function
> under the org.apache.spark.sql package, which just feels wrong.
>
> I've also considered writing a custom transformer, but haven't spent as much
> time reading through the code, so I don't know how easy or hard that would
> be.
>
> TLDR; What's the best way to write a function that returns a value for every
> row, but has mutable state, and gets rows in a specific order?
>
> Does anyone have any ideas, or examples?
>
> Thanks,
>
> Jon
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cumulative-Sum-function-using-Dataset-API-tp27496.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Best Regards,
Ayan Guha