[ https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859039#comment-15859039 ]
Shaoxuan Wang edited comment on FLINK-5564 at 2/9/17 5:16 AM:
--------------------------------------------------------------

Thanks [~fhueske],
Absolutely, I agree with you that it is better to split the huge PR into several smaller ones (merging 1, 2, and 3 would lead to a change of more than 3K lines). But I am afraid I did not completely understand your suggested step #1. Migrating the existing aggregates without changing the runtime code will cause all of the integration tests to fail. One possible way is that I create a new interface (say AggregateFunction) and a few aggregates that implement the new interface (say intAgg extends AggregateFunction), and in step #1 I just add query plan tests, like what we usually do in GroupWindowTest. Is this what you are suggesting?


was (Author: shaoxuanwang):
Thanks [~fhueske],
Obsoletely, I agree with you that it is better to split the huge PR into several smaller ones (merging 1, 2, and 3 would lead to a change of more than 3K lines). But I am afraid I did not completely understand your suggested step #1. Migrating the existing aggregates without changing the runtime code will cause all of the integration tests to fail. One possible way is that I create a new interface (say AggregateFunction) and a few aggregates that implement the new interface (say intAgg extends AggregateFunction), and in step #1 I just add query plan tests, like what we usually do in GroupWindowTest. Is this what you are suggesting?

> User Defined Aggregates
> -----------------------
>
>                 Key: FLINK-5564
>                 URL: https://issues.apache.org/jira/browse/FLINK-5564
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited for external users.
> This issue proposes to redesign the aggregate such that we can expose a
> better external UDAGG interface to the users.
> The detailed design proposal can be found here:
> https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not very concise for users. One
> needs to know the design details of the intermediate Row buffer before
> implementing an Aggregate. Seven functions are needed even for a simple
> Count aggregate.
> 2. Another limitation of the current aggregate function is that it can only
> be applied to a single column. Many scenarios require the aggregate
> function to take multiple columns as inputs.
> 3. “Retraction” is not considered or covered in the current Aggregate.
> 4. It would be good to have a local/global aggregate query plan
> optimization, which is promising for improving UDAGG performance in some
> scenarios.
> Proposed Changes:
> 1. Implement an aggregate dataStream API (Done by
> [FLINK-5582|https://issues.apache.org/jira/browse/FLINK-5582])
> 2. Update all the existing aggregates to use the new aggregate dataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregate



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
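
To make the interface idea from the comment more concrete, here is a minimal Java sketch of what an accumulator-based aggregate might look like. It is only an illustration under stated assumptions, not the design from the linked proposal and not Flink's actual API: the name AggregateFunction comes from the comment, while the method names (createAccumulator, accumulate, retract, merge, getValue) and the classes IntSumAccumulator and IntSumAgg are hypothetical. The sketch hides the intermediate buffer behind a user-defined accumulator type, includes a retract method for retraction support, and includes a merge method that a local/global aggregation could use to combine partial results.

```java
/**
 * Hypothetical aggregate interface (an assumption for illustration only).
 * IN is the input type, ACC the accumulator type, OUT the result type.
 */
interface AggregateFunction<IN, ACC, OUT> {
    /** Creates an empty accumulator that holds the intermediate state. */
    ACC createAccumulator();

    /** Adds one input value to the accumulator. */
    void accumulate(ACC acc, IN value);

    /** Removes a previously accumulated value (retraction). */
    void retract(ACC acc, IN value);

    /** Folds a partial accumulator into acc (e.g. local into global). */
    void merge(ACC acc, ACC other);

    /** Returns the current aggregate result. */
    OUT getValue(ACC acc);
}

/** Hypothetical mutable accumulator holding a running sum. */
final class IntSumAccumulator {
    long sum;
}

/** Hypothetical integer sum aggregate built on the interface above. */
final class IntSumAgg implements AggregateFunction<Integer, IntSumAccumulator, Long> {
    @Override
    public IntSumAccumulator createAccumulator() {
        return new IntSumAccumulator();
    }

    @Override
    public void accumulate(IntSumAccumulator acc, Integer value) {
        acc.sum += value;
    }

    @Override
    public void retract(IntSumAccumulator acc, Integer value) {
        acc.sum -= value;
    }

    @Override
    public void merge(IntSumAccumulator acc, IntSumAccumulator other) {
        acc.sum += other.sum;
    }

    @Override
    public Long getValue(IntSumAccumulator acc) {
        return acc.sum;
    }
}
```

With this shape, a caller would create one accumulator per group, call accumulate for each incoming value, call retract when an earlier value is withdrawn, and read the result with getValue. An aggregate over multiple columns could use a composite input type for IN, which is again an assumption rather than something the issue specifies.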