[jira] [Commented] (FLINK-3613) Add standard deviation to list of Aggregations

Todd Lisonbee (JIRA) Mon, 14 Mar 2016 09:03:05 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193522#comment-15193522
 ]


Todd Lisonbee commented on FLINK-3613:
--------------------------------------

Hello, I'm new to Apache Flink and would like to contribute some code.  A 
standard deviation aggregation seemed like an easy place to start.

I did a quick search and didn't see anyone already working on this.

A teammate of mine implemented something similar to what I believe is needed, 
for Apache Spark, here:
https://github.com/trustedanalytics/atk/blob/master/engine-plugins/frame-plugins/src/main/scala/org/trustedanalytics/atk/engine/frame/plugins/groupby/aggregators/VarianceAggregator.scala

I plan to write a fresh implementation for Flink, unless someone stops me.

Thanks!


> Add standard deviation to list of Aggregations
> ----------------------------------------------
>
>                 Key: FLINK-3613
>                 URL: https://issues.apache.org/jira/browse/FLINK-3613
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Todd Lisonbee
>            Priority: Minor
>
> Implement standard deviation for 
> org.apache.flink.api.java.aggregation.Aggregations.
> Ideally, the implementation should be single-pass and numerically stable.
> References:
> "Scalable and Numerically Stable Descriptive Statistics in SystemML", Tian et 
> al., International Conference on Data Engineering 2012
> http://dl.acm.org/citation.cfm?id=2310392
> "The Kahan summation algorithm (also known as compensated summation) reduces 
> the numerical errors that occur when adding a sequence of finite precision 
> floating point numbers. Numerical errors arise due to truncation and 
> rounding. These errors can lead to numerical instability when calculating 
> variance."
> https://en.wikipedia.org/wiki/Kahan_summation_algorithm
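To illustrate what "single pass and numerically stable" could look like, here is a minimal sketch in plain Java. It is an assumption-laden sketch, not the proposed patch: the names VarianceSketch, RunningVariance and kahanSum are hypothetical and are not part of org.apache.flink.api.java.aggregation. For the running moments it swaps in Welford's online update, plus the pairwise combination formula of Chan et al. for merging partial results from parallel partitions, rather than reproducing the exact scheme from the SystemML paper. Kahan summation is shown separately as a compensated plain sum.

```java
// Hypothetical sketch only: none of these names exist in Flink's
// org.apache.flink.api.java.aggregation package.
final class VarianceSketch {

    /**
     * Single-pass, mergeable mean/variance accumulator.
     * add() uses Welford's online update; merge() uses the pairwise
     * combination formula of Chan et al., so partial results from
     * parallel partitions can be combined without a second pass.
     */
    static final class RunningVariance {
        private long count;  // number of values folded in so far
        private double mean; // running mean
        private double m2;   // running sum of squared deviations from the mean

        void add(double x) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean); // second factor uses the updated mean
        }

        void merge(RunningVariance other) {
            if (other.count == 0) {
                return;
            }
            if (count == 0) {
                count = other.count;
                mean = other.mean;
                m2 = other.m2;
                return;
            }
            long n = count + other.count;
            double delta = other.mean - mean;
            mean += delta * other.count / n;
            m2 += other.m2 + delta * delta * ((double) count * other.count / n);
            count = n;
        }

        long count() { return count; }

        double mean() { return mean; }

        /** Population variance (divides by n); NaN for an empty input. */
        double populationVariance() { return count > 0 ? m2 / count : Double.NaN; }

        /** Sample variance (divides by n - 1); NaN for fewer than two values. */
        double sampleVariance() { return count > 1 ? m2 / (count - 1) : Double.NaN; }

        double populationStdDev() { return Math.sqrt(populationVariance()); }

        double sampleStdDev() { return Math.sqrt(sampleVariance()); }
    }

    /** Kahan (compensated) summation of an array of doubles. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double c = 0.0; // running compensation for lost low-order bits
        for (double v : values) {
            double y = v - c;   // apply the correction to the next term
            double t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y;  // recover what was lost (negated)
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        RunningVariance left = new RunningVariance();
        RunningVariance right = new RunningVariance();
        for (int i = 0; i < data.length; i++) {
            // Simulate two parallel partitions, then combine their partials.
            (i < data.length / 2 ? left : right).add(data[i]);
        }
        left.merge(right);
        System.out.println("population stddev = " + left.populationStdDev());
        System.out.println("sample stddev     = " + left.sampleStdDev());
        System.out.println("Kahan sum         = " + kahanSum(data));
    }
}
```

Welford's update avoids the cancellation that the textbook formula (sum of squares divided by n, minus the squared mean) suffers when values share a large offset, and the merge step is what a combinable, parallel aggregation would need. Whether the aggregation should report population or sample standard deviation is left open here, so the sketch exposes both.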



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
