[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

2017-11-15 Thread holdenk (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253634#comment-16253634
 ] 

holdenk commented on SPARK-6802:


Oh wait, sorry, I re-opened the wrong issue.

> User Defined Aggregate Function Refactoring
> ---
>
> Key: SPARK-6802
> URL: https://issues.apache.org/jira/browse/SPARK-6802
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Environment: We use Spark Dataframe, SQL along with json, sql and
> pandas quite a bit
> Reporter: cynepia
>
> While trying to use custom aggregates in Spark (something which is common in
> pandas), we realized that custom aggregate functions aren't well supported
> across various features/functions in Spark beyond what is supported by Hive.
> There are further discussions on the topic vis-a-vis the issue SPARK-3947,
> which points to similar improvement tickets opened earlier for refactoring
> the UDAF area.
> While we refactor the interface for aggregates, it would make sense to keep
> in consideration the recently added DataFrame, GroupedData, and possibly
> also sql.dataframe.Column, which looks different from pandas.Series and
> doesn't currently support any aggregations.
> We would like to get feedback from the folks who are actively looking at this.
> We would be happy to participate and contribute if there are any discussions
> on the same.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

2017-01-20 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831570#comment-15831570
 ] 

Hyukjin Kwon commented on SPARK-6802:
-

If this JIRA is about Python UDAFs, there is already a JIRA for that:
https://issues.apache.org/jira/browse/SPARK-10915

If this JIRA is about refactoring UDAFs on the Scala/Java side, it is not
clear to me what should be fixed.

{quote}
We realized that Custom Aggregate Functions aren't well supported across 
various features/functions in Spark beyond what is supported by Hive
{quote}


{quote}
It would make sense to keep in consideration, the recently added DataFrame, 
GroupedData, and possibly also sql.dataframe.Column, which looks different from 
pandas.Series
{quote}

These two statements are too broad and vague. We should point out what to fix
or what the concrete problem is.

Please correct me if I misunderstood or am wrong.








[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

2015-07-22 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637555#comment-14637555
 ] 

Yin Huai commented on SPARK-6802:
-

We have added Scala/Java UDAF support through SPARK-3947. Is this JIRA for
Python UDAFs?

> User Defined Aggregate Function Refactoring
> ---
>
> Key: SPARK-6802
> URL: https://issues.apache.org/jira/browse/SPARK-6802
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Environment: We use Spark Dataframe, SQL along with json, sql and
> pandas quite a bit
> Reporter: cynepia







[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

2015-05-15 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545654#comment-14545654
 ] 

Apache Spark commented on SPARK-6802:
-

User 'hqzizania' has created a pull request for this issue:
https://github.com/apache/spark/pull/6190



