[jira] [Commented] (FLINK-5564) User Defined Aggregates

Fabian Hueske (JIRA) Sun, 12 Feb 2017 14:09:04 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862999#comment-15862999
 ]


Fabian Hueske commented on FLINK-5564:
--------------------------------------

Hi [~shaoxuan],
My proposal is to add the following classes as the first step:

- the new UDAGG interface.
- aggregation functions that implement the new UDAGG interface, which will 
later replace the functions that implement the 
{{org.apache.flink.table.runtime.aggregate.Aggregate}} interface.
- Unit tests which correspond to the tests that extend the 
{{AggregateTestBase}}.

Other than that, no code changes should be made. 
Since these changes only add code and do not modify existing code, no 
existing functionality or tests should be affected.
Once the first step is done, the new functions can be used to reimplement the 
batch and streaming aggregations.
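
To make the first step more concrete, here is a minimal Java sketch of what an accumulator-based UDAGG and a Count built on it could look like. The names {{Udagg}}, {{CountAcc}}, {{CountUdagg}} and the exact method set are assumptions made for illustration only; the actual interface is the one specified in the design document.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: these names are not taken from the design document.
final class UdaggSketch {

    /** Assumed shape of a user-facing aggregate built around an accumulator. */
    interface Udagg<IN, ACC, OUT> {
        ACC createAccumulator();
        void accumulate(ACC acc, IN value);
        void retract(ACC acc, IN value);
        OUT getValue(ACC acc);
    }

    /** Accumulator for Count: a single mutable counter instead of a Row buffer. */
    static final class CountAcc {
        long count;
    }

    /** Count over non-null values, written against the assumed interface. */
    static final class CountUdagg implements Udagg<Object, CountAcc, Long> {
        @Override
        public CountAcc createAccumulator() {
            return new CountAcc();
        }

        @Override
        public void accumulate(CountAcc acc, Object value) {
            if (value != null) {
                acc.count++;
            }
        }

        @Override
        public void retract(CountAcc acc, Object value) {
            // Undo a previously accumulated value.
            if (value != null) {
                acc.count--;
            }
        }

        @Override
        public Long getValue(CountAcc acc) {
            return acc.count;
        }
    }

    public static void main(String[] args) {
        CountUdagg count = new CountUdagg();
        CountAcc acc = count.createAccumulator();
        List<String> input = Arrays.asList("a", null, "b", "c");
        for (String v : input) {
            count.accumulate(acc, v);
        }
        count.retract(acc, "a");
        System.out.println(count.getValue(acc));
    }
}
```

Because such a class would live next to the existing {{Aggregate}} implementations without touching them, it could be unit tested in isolation before any runtime code is switched over.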

> User Defined Aggregates
> -----------------------
>
>                 Key: FLINK-5564
>                 URL: https://issues.apache.org/jira/browse/FLINK-5564
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited for external users.  
> This issue proposes to redesign the aggregate such that we can expose a 
> better external UDAGG interface to the users. The detailed design proposal 
> can be found here: 
> https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not very concise for users. One 
> needs to know the design details of the intermediate Row buffer before 
> implementing an Aggregate. Seven functions are needed even for a simple Count 
> aggregate.
> 2. Another limitation of the current aggregate function is that it can only 
> be applied to a single column. There are many scenarios which require the 
> aggregate function to take multiple columns as inputs.
> 3. “Retraction” is not considered and covered in the current Aggregate.
> 4. It would be good to have a local/global aggregate query plan 
> optimization, which is a promising way to improve UDAGG performance in some 
> scenarios.
> Proposed Changes:
> 1. Implement an aggregate dataStream API (Done by 
> [FLINK-5582|https://issues.apache.org/jira/browse/FLINK-5582])
> 2. Update all the existing aggregates to use the new aggregate dataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregate
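
The multi-column and local/global points in the quoted description can be illustrated with a small Java sketch of a weighted average. Everything here ({{WeightedAvgSketch}}, {{Acc}}, the {{merge}} method) is a hypothetical illustration under the assumption that partial accumulators can be combined; it is not taken from the design document.

```java
// Hypothetical sketch only: a two-column aggregate whose partial results can be merged.
final class WeightedAvgSketch {

    /** Accumulator holding the running sums needed for a weighted average. */
    static final class Acc {
        long weightedSum;
        long weightSum;
    }

    static Acc createAccumulator() {
        return new Acc();
    }

    /** Takes two input columns (value and weight) per record. */
    static void accumulate(Acc acc, long value, long weight) {
        acc.weightedSum += value * weight;
        acc.weightSum += weight;
    }

    /** Global step: fold a locally pre-aggregated accumulator into the target. */
    static void merge(Acc target, Acc other) {
        target.weightedSum += other.weightedSum;
        target.weightSum += other.weightSum;
    }

    /** Returns null when no weight has been accumulated. */
    static Double getValue(Acc acc) {
        if (acc.weightSum == 0) {
            return null;
        }
        return (double) acc.weightedSum / acc.weightSum;
    }

    public static void main(String[] args) {
        // Two local accumulators, as if built on different partitions.
        Acc local1 = createAccumulator();
        accumulate(local1, 10, 1);
        accumulate(local1, 20, 3);

        Acc local2 = createAccumulator();
        accumulate(local2, 30, 1);

        // Global aggregation only merges the partial results.
        Acc global = createAccumulator();
        merge(global, local1);
        merge(global, local2);
        System.out.println(getValue(global));
    }
}
```

With this shape, the global step only sees one accumulator per partition rather than every input record, which is where the performance gain of a local/global plan would be expected to come from.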



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
