[ 
https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652113#comment-15652113
 ] 

Fabian Hueske commented on FLINK-3613:
--------------------------------------

Hi [~anmu], 
this issue proposes to add more built-in aggregation functions to the DataSet 
API. 
Since parts of the Table API are built on the DataSet API, such a feature could 
in principle also be used to implement, for instance, stddev for batch tables.

However, this would only help for batch tables, so we would still need a 
separate implementation for streaming tables. Also, there are quite a few 
challenges when implementing these aggregation functions for the DataSet API. 
I think Stephan had a good point when he asked whether these advanced 
functions would be better suited for the Table API, which is what FLINK-4604 
is all about.

So I would rather close this issue in favor of FLINK-4604.

> Add standard deviation, mean, variance to list of Aggregations
> --------------------------------------------------------------
>
>                 Key: FLINK-3613
>                 URL: https://issues.apache.org/jira/browse/FLINK-3613
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Todd Lisonbee
>            Priority: Minor
>         Attachments: DataSet-Aggregation-Design-March2016-v1.txt
>
>
> Implement standard deviation, mean, variance for 
> org.apache.flink.api.java.aggregation.Aggregations
> Ideally implementation should be single pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et 
> al, International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces 
> the numerical errors that occur when adding a sequence of finite precision 
> floating point numbers. Numerical errors arise due to truncation and 
> rounding. These errors can lead to numerical instability when calculating 
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm
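
For illustration, the "single pass and numerically stable" requirement in the 
description above could be met roughly as follows. This is a minimal sketch in 
Python, not Flink code and not necessarily the method of the cited paper: it 
uses Welford's online update plus the pairwise merge step usually attributed 
to Chan et al., so that partial results from parallel partitions can be 
combined. The class name Moments and its methods are hypothetical, and the 
variance shown is the population variance.

```python
import math
from dataclasses import dataclass


@dataclass
class Moments:
    """Hypothetical single-pass accumulator for count, mean and variance."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        # Welford's update: avoids subtracting two large, nearly equal sums.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Moments") -> "Moments":
        # Combine two partial aggregates (e.g. from two partitions).
        if other.count == 0:
            return Moments(self.count, self.mean, self.m2)
        if self.count == 0:
            return Moments(other.count, other.mean, other.m2)
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Moments(n, mean, m2)

    def variance(self) -> float:
        # Population variance; divide by (count - 1) for the sample variance.
        return self.m2 / self.count if self.count else float("nan")

    def stddev(self) -> float:
        return math.sqrt(self.variance())


def aggregate(values) -> Moments:
    """Fold a sequence of numbers into one Moments accumulator."""
    acc = Moments()
    for v in values:
        acc.add(v)
    return acc
```

A combinable aggregate of this shape would be computed per partition with 
aggregate(...) and the partial results folded together with merge(...), which 
is the property a parallel implementation would need.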



