Hi all,

I'm trying to write a function that calculates a cumulative sum as a column using the Dataset API, and I'm a little stuck on the implementation. From what I can tell, UserDefinedAggregateFunctions don't seem to support windowing clauses, which I think I need for this use case. If I write a function that extends AggregateWindowFunction, I end up depending on classes that are package private to the sql package, so I'd have to put my function under the org.apache.spark.sql package, which just feels wrong.
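To make the goal concrete: the built-in aggregates do accept a window spec through the DataFrame API, so a plain cumulative sum looks something like the sketch below (the column names "id", "ts", and "value" and the name df are just placeholders for illustration). My problem is doing the equivalent with custom stateful logic rather than the built-in sum.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// Running sum per id, ordered by ts: frame from the start of the
// partition up to the current row.
val w = Window
  .partitionBy("id")
  .orderBy("ts")
  .rowsBetween(Long.MinValue, 0)

// df is assumed to be an existing DataFrame/Dataset with these columns.
val withCumSum = df.withColumn("cum_sum", sum(col("value")).over(w))
```

This runs against a live SparkSession, so it's a sketch rather than something self-contained; the point is that sum(...).over(w) works where a UserDefinedAggregateFunction apparently can't.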
I've also considered writing a custom Transformer, but I haven't spent as much time reading through that code, so I don't know how easy or hard it would be.

TL;DR: What's the best way to write a function that returns a value for every row, but has mutable state and gets rows in a specific order? Does anyone have any ideas, or examples?

Thanks,
Jon

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cumulative-Sum-function-using-Dataset-API-tp27496.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.