Shaoxuan Wang created FLINK-5564:
------------------------------------

             Summary: User Defined Aggregates
                 Key: FLINK-5564
                 URL: https://issues.apache.org/jira/browse/FLINK-5564
             Project: Flink
          Issue Type: Improvement
          Components: Table API & SQL
            Reporter: Shaoxuan Wang


User-defined aggregates would be a great addition to the Table API / SQL.
The current aggregate interface is not well suited for external users.
This issue proposes redesigning the aggregate so that we can expose a better
external UDAGG interface to users. The detailed design proposal can be
found here:
https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit

Motivation:
1. The current aggregate interface is not concise for users. One needs
to know the design details of the intermediate Row buffer before implementing
an Aggregate. Seven functions are needed even for a simple Count aggregate.
2. Another limitation of the current aggregate function is that it can only
be applied to a single column. Many scenarios require an aggregate function
that takes multiple columns as input.
3. “Retraction” is neither considered nor covered in the current Aggregate.
4. A local/global aggregate query plan optimization would be valuable, since
it looks promising for improving UDAGG performance in some scenarios.

Proposed Changes:
1. Implement an aggregate DataStream API
2. Update all the existing aggregates to use the new aggregate DataStream API
3. Provide a better User-Defined Aggregate interface
4. Add retraction support
5. Add local/global aggregate
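
To make the motivation and the proposed changes more concrete, here is a rough
sketch of what an accumulator-based aggregate could look like. It is not the
interface from the design document and not an existing Flink API: the class
names (WeightedAvg, WeightedAvgAccum, UdaggSketch) and the method names
(createAccumulator, accumulate, retract, merge, getValue) are assumptions
chosen for illustration. The sketch takes two input columns, supports
retraction, and merges partial accumulators the way a local/global plan might.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical accumulator: the only state the aggregate has to keep. */
final class WeightedAvgAccum {
    long weightedSum;
    long totalWeight;
}

/** Hypothetical weighted average over two input columns (value, weight). */
final class WeightedAvg {
    WeightedAvgAccum createAccumulator() {
        return new WeightedAvgAccum();
    }

    /** Adds one input row; both columns are passed in together. */
    void accumulate(WeightedAvgAccum acc, long value, long weight) {
        acc.weightedSum += value * weight;
        acc.totalWeight += weight;
    }

    /** Undoes a previously accumulated row (retraction). */
    void retract(WeightedAvgAccum acc, long value, long weight) {
        acc.weightedSum -= value * weight;
        acc.totalWeight -= weight;
    }

    /** Combines partial (local) accumulators into one global accumulator. */
    WeightedAvgAccum merge(List<WeightedAvgAccum> partials) {
        WeightedAvgAccum merged = createAccumulator();
        for (WeightedAvgAccum partial : partials) {
            merged.weightedSum += partial.weightedSum;
            merged.totalWeight += partial.totalWeight;
        }
        return merged;
    }

    /** Returns the result, or null when no rows remain in the accumulator. */
    Double getValue(WeightedAvgAccum acc) {
        return acc.totalWeight == 0 ? null : (double) acc.weightedSum / acc.totalWeight;
    }
}

final class UdaggSketch {
    /** Two local accumulators, one retraction, then a global merge. */
    static Double run() {
        WeightedAvg agg = new WeightedAvg();

        WeightedAvgAccum local1 = agg.createAccumulator();
        agg.accumulate(local1, 10, 1);
        agg.accumulate(local1, 20, 3);

        WeightedAvgAccum local2 = agg.createAccumulator();
        agg.accumulate(local2, 40, 1);
        agg.accumulate(local2, 99, 5);
        agg.retract(local2, 99, 5); // the row (99, 5) is withdrawn again

        return agg.getValue(agg.merge(Arrays.asList(local1, local2)));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the user writes five small methods and never touches an
intermediate Row buffer. Running UdaggSketch prints 22.0, i.e.
(10*1 + 20*3 + 40*1) / (1 + 3 + 1), since the retracted row no longer
contributes.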



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
