Do you mean you are not able to use sum(col) over (partition by key order by some_col)?
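For reference, here is a minimal sketch of that window expression via the DataFrame API. The column names key, some_col, and col are taken from the expression above, and the toy data is made up for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum
import org.apache.spark.sql.expressions.Window

val spark = SparkSession.builder().appName("CumSumExample").getOrCreate()
import spark.implicits._

// Made-up sample data: (key, some_col, col)
val df = Seq(
  ("a", 1, 10L), ("a", 2, 20L),
  ("b", 1, 5L),  ("b", 2, 15L)
).toDF("key", "some_col", "col")

// Equivalent to SQL: sum(col) OVER (PARTITION BY key ORDER BY some_col).
// With an ORDER BY, the default frame is unbounded preceding to current
// row, so this produces a running (cumulative) sum per key.
val w = Window.partitionBy("key").orderBy("some_col")
df.withColumn("cum_sum", sum($"col").over(w)).show()

The same thing can be written as raw SQL via spark.sql(...) after registering the DataFrame as a temp view, if you prefer the SQL syntax.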
On Tue, Aug 9, 2016 at 9:53 AM, jon <jon.barksd...@gmail.com> wrote:
> Hi all,
>
> I'm trying to write a function that calculates a cumulative sum as a column
> using the Dataset API, and I'm a little stuck on the implementation. From
> what I can tell, UserDefinedAggregateFunctions don't seem to support
> windowing clauses, which I think I need for this use case. If I write a
> function that extends from AggregateWindowFunction, I end up needing classes
> that are package private to the sql package, so I need to put my function
> under the org.apache.spark.sql package, which just feels wrong.
>
> I've also considered writing a custom transformer, but haven't spent as much
> time reading through the code, so I don't know how easy or hard that would
> be.
>
> TLDR; What's the best way to write a function that returns a value for every
> row, but has mutable state, and gets rows in a specific order?
>
> Does anyone have any ideas, or examples?
>
> Thanks,
>
> Jon
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cumulative-Sum-function-using-Dataset-API-tp27496.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
Best Regards,
Ayan Guha