[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253634#comment-16253634 ]
holdenk commented on SPARK-6802:
--------------------------------

Oh wait, sorry, I re-opened the wrong issue.

> User Defined Aggregate Function Refactoring
> -------------------------------------------
>
>                 Key: SPARK-6802
>                 URL: https://issues.apache.org/jira/browse/SPARK-6802
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>         Environment: We use Spark DataFrame and SQL along with json, sql and
> pandas quite a bit
>            Reporter: cynepia
>
> While trying to use custom aggregates in Spark (something which is common in
> pandas), we realized that custom aggregate functions aren't well supported
> across various features/functions in Spark beyond what is supported by Hive.
> There are further discussions on the topic vis-a-vis issue SPARK-3947,
> which points to similar improvement tickets opened earlier for refactoring
> the UDAF area.
> While we refactor the interface for aggregates, it would make sense to keep
> in consideration the recently added DataFrame, GroupedData, and possibly
> also sql.dataframe.Column, which looks different from pandas.Series and
> doesn't currently support any aggregations.
> We would like to get feedback from the folks who are actively looking at
> this...
> We would be happy to participate and contribute if there are any discussions
> on the same.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)