[jira] [Comment Edited] (FLINK-5564) User Defined Aggregates

Shaoxuan Wang (JIRA) Wed, 08 Feb 2017 21:17:22 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859039#comment-15859039
 ]


Shaoxuan Wang edited comment on FLINK-5564 at 2/9/17 5:16 AM:
--------------------------------------------------------------

Thanks [~fhueske], 
Absolutely, I agree with you that it is better to split the huge PR into 
several smaller ones (merging 1, 2, and 3 would result in more than 3K lines 
of changes).
However, I am afraid I do not fully understand your suggestion #1. Migrating 
the existing aggregates without changing the runtime code would cause all 
integration tests to fail. One possible way is to create a new interface (say, 
AggregateFunction) and a few aggregates that implement it (say, intAgg 
extends AggregateFunction). In step #1, I would then only add query plan 
tests, like what we usually do in GroupWindowTest. Is this what you are 
suggesting?
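A minimal sketch of the approach proposed above, in which a new interface lives alongside the existing aggregates so the runtime code is left untouched. The method names (`createAccumulator`, `accumulate`, `getValue`), the accumulator class, and the `IntSumAgg`/`IntSumAggDemo` classes are hypothetical illustrations, not a confirmed Flink API.

```java
import java.util.List;

// Hypothetical new interface: the name follows the comment, but the methods
// are illustrative assumptions rather than Flink's actual API.
abstract class AggregateFunction<T, ACC> {
    // Create an empty intermediate state (the "accumulator").
    abstract ACC createAccumulator();

    // Fold one input value into the accumulator.
    abstract void accumulate(ACC acc, T value);

    // Produce the final result from the accumulator.
    abstract T getValue(ACC acc);
}

// Mutable holder used as the accumulator of IntSumAgg.
class IntSumAccumulator {
    int sum = 0;
}

// Hypothetical "intAgg" from the comment: an integer SUM on the new interface.
class IntSumAgg extends AggregateFunction<Integer, IntSumAccumulator> {
    @Override
    IntSumAccumulator createAccumulator() {
        return new IntSumAccumulator();
    }

    @Override
    void accumulate(IntSumAccumulator acc, Integer value) {
        if (value != null) { // skip NULL inputs
            acc.sum += value;
        }
    }

    @Override
    Integer getValue(IntSumAccumulator acc) {
        return acc.sum;
    }
}

class IntSumAggDemo {
    // Drive the aggregate over an in-memory list (no runtime code involved).
    static Integer run(List<Integer> input) {
        IntSumAgg agg = new IntSumAgg();
        IntSumAccumulator acc = agg.createAccumulator();
        for (Integer v : input) {
            agg.accumulate(acc, v);
        }
        return agg.getValue(acc);
    }
}
```

For example, `IntSumAggDemo.run(Arrays.asList(1, 2, null, 3))` returns 6. Only three methods are needed here, and the accumulator layout is private to the aggregate.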



> User Defined Aggregates
> -----------------------
>
>                 Key: FLINK-5564
>                 URL: https://issues.apache.org/jira/browse/FLINK-5564
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited for external users.  
> This issue proposes to redesign the aggregates so that we can expose a 
> better external UDAGG interface to users. The detailed design proposal 
> can be found here: 
> https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not very concise for users. One 
> needs to know the design details of the intermediate Row buffer before 
> implementing an aggregate. Seven functions are needed even for a simple 
> Count aggregate.
> 2. Another limitation of the current aggregate function is that it can only 
> be applied to a single column. Many scenarios require an aggregate 
> function that takes multiple columns as input.
> 3. “Retraction” is neither considered nor covered in the current aggregates.
> 4. A local/global aggregate query plan optimization could be very valuable, 
> as it is promising for improving UDAGG performance in some scenarios.
> Proposed Changes:
> 1. Implement an aggregate dataStream API (Done by 
> [FLINK-5582|https://issues.apache.org/jira/browse/FLINK-5582])
> 2. Update all the existing aggregates to use the new aggregate dataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregate
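The motivation points above (multi-column input, retraction, local/global aggregation) can be sketched together in one small example. Everything here is a hypothetical assumption for illustration: the `RetractableAggregate` class, its `retract` and `merge` methods, the two-column `WeightedAvg` aggregate, and `LocalGlobalDemo` are not a confirmed Flink API. Retraction assumes the retracted row was previously accumulated.

```java
// Hypothetical aggregate contract with retraction and accumulator merging;
// all names are illustrative assumptions, not Flink's actual API.
abstract class RetractableAggregate<ACC, R> {
    abstract ACC createAccumulator();

    // Multi-column input: each call receives one row's (value, weight).
    abstract void accumulate(ACC acc, long value, long weight);

    // Undo a previously accumulated row (retraction).
    abstract void retract(ACC acc, long value, long weight);

    // Combine partial accumulators, e.g. produced by a local (pre-)aggregation.
    abstract void merge(ACC target, ACC other);

    abstract R getValue(ACC acc);
}

class WeightedAvgAcc {
    long weightedSum = 0;
    long totalWeight = 0;
}

// Hypothetical two-column aggregate: weighted average of value by weight.
class WeightedAvg extends RetractableAggregate<WeightedAvgAcc, Double> {
    @Override
    WeightedAvgAcc createAccumulator() { return new WeightedAvgAcc(); }

    @Override
    void accumulate(WeightedAvgAcc acc, long value, long weight) {
        acc.weightedSum += value * weight;
        acc.totalWeight += weight;
    }

    @Override
    void retract(WeightedAvgAcc acc, long value, long weight) {
        acc.weightedSum -= value * weight;
        acc.totalWeight -= weight;
    }

    @Override
    void merge(WeightedAvgAcc target, WeightedAvgAcc other) {
        target.weightedSum += other.weightedSum;
        target.totalWeight += other.totalWeight;
    }

    @Override
    Double getValue(WeightedAvgAcc acc) {
        // No rows (or all retracted): return null, like SQL aggregates over empty input.
        return acc.totalWeight == 0 ? null : (double) acc.weightedSum / acc.totalWeight;
    }
}

class LocalGlobalDemo {
    // Local phase: one accumulator per partition of {value, weight} rows.
    // Global phase: merge the partial accumulators into a single one.
    static WeightedAvgAcc localGlobal(WeightedAvg agg, long[][][] partitions) {
        WeightedAvgAcc global = agg.createAccumulator();
        for (long[][] partition : partitions) {
            WeightedAvgAcc local = agg.createAccumulator();
            for (long[] row : partition) {
                agg.accumulate(local, row[0], row[1]);
            }
            agg.merge(global, local);
        }
        return global;
    }
}
```

With partitions `{{10, 1}, {20, 1}}` and `{{30, 2}}`, the merged result is 90 / 4 = 22.5; retracting the row `(30, 2)` afterwards leaves 30 / 2 = 15.0. The design choice is that `merge` only combines accumulators, so the same aggregate can run in either a single-phase or a local/global plan.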



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
