[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253634#comment-16253634 ]

holdenk commented on SPARK-6802:

Oh wait, sorry, I re-opened the wrong issue.

> User Defined Aggregate Function Refactoring
> ---
>
> Key: SPARK-6802
> URL: https://issues.apache.org/jira/browse/SPARK-6802
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Environment: We use Spark DataFrame and SQL along with JSON, SQL and pandas quite a bit
> Reporter: cynepia
>
> While trying to use custom aggregates in Spark (something which is common in pandas), we realized that custom aggregate functions aren't well supported across various features/functions in Spark beyond what is supported by Hive.
> There are further discussions on the topic vis-à-vis the issue SPARK-3947, which points to similar improvement tickets opened earlier for refactoring the UDAF area.
> While we refactor the interface for aggregates, it would make sense to keep in consideration the recently added DataFrame, GroupedData, and possibly also sql.dataframe.Column, which looks different from pandas.Series and doesn't currently support any aggregations.
> We would like to get feedback from the folks who are actively looking at this.
> We would be happy to participate and contribute if there are any discussions on the same.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15831570#comment-15831570 ]

Hyukjin Kwon commented on SPARK-6802:

If this JIRA is about Python UDAFs, there is already a JIRA for that: https://issues.apache.org/jira/browse/SPARK-10915

If this JIRA is about refactoring the UDAF on the Scala/Java side, I am not sure what should be fixed.

{quote}
We realized that Custom Aggregate Functions aren't well supported across various features/functions in Spark beyond what is supported by Hive
{quote}

{quote}
It would make sense to keep in consideration, the recently added DataFrame, GroupedData, and possibly also sql.dataframe.Column, which looks different from pandas.Series
{quote}

These two statements are too broad and vague. We should point out what exactly needs to be fixed or what the concrete problem is.

Please correct me if I misunderstood or am wrong.
[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637555#comment-14637555 ]

Yin Huai commented on SPARK-6802:

We have added Scala/Java UDAF support through SPARK-3947. Is this JIRA for Python UDAF?
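For context on the Scala/Java UDAF support mentioned above: the `UserDefinedAggregateFunction` class added through SPARK-3947 asks the user to implement `initialize`, `update`, `merge` and `evaluate` over an aggregation buffer. The block below is a minimal plain-Python sketch of that buffer contract only, not Spark's or PySpark's API. The class `MeanUDAF` and the helper `run_partitioned` are hypothetical names chosen for illustration, and the "partitions" are plain Python lists standing in for distributed data.

```python
from typing import Iterable, Optional


class MeanUDAF:
    """Hypothetical average aggregate mirroring the UDAF buffer contract."""

    def initialize(self) -> list:
        # Buffer holds [running sum, running count].
        return [0.0, 0]

    def update(self, buffer: list, value: float) -> None:
        # Fold one input value into a partition-local buffer.
        buffer[0] += value
        buffer[1] += 1

    def merge(self, buffer1: list, buffer2: list) -> None:
        # Combine a partial buffer from another partition into buffer1.
        buffer1[0] += buffer2[0]
        buffer1[1] += buffer2[1]

    def evaluate(self, buffer: list) -> Optional[float]:
        # Produce the final result; None for an empty group.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


def run_partitioned(udaf: MeanUDAF, partitions: Iterable[Iterable[float]]) -> Optional[float]:
    """Hypothetical driver: aggregate each partition, then merge the partial buffers."""
    result = udaf.initialize()
    for partition in partitions:
        partial = udaf.initialize()
        for value in partition:
            udaf.update(partial, value)
        udaf.merge(result, partial)
    return udaf.evaluate(result)


if __name__ == "__main__":
    print(run_partitioned(MeanUDAF(), [[1.0, 2.0], [3.0], []]))  # prints 2.0
```

Because `merge` only adds sums and counts, the result does not depend on how the rows were split across partitions, which is the property a distributed engine relies on when it combines partial aggregates.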
[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545654#comment-14545654 ]

Apache Spark commented on SPARK-6802:

User 'hqzizania' has created a pull request for this issue: https://github.com/apache/spark/pull/6190