[ 
https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Gevay updated FLINK-2142:
-------------------------------
    Description: 
The goal of this project is to implement basic statistics of data streams and 
windows (like average, median, variance, correlation, etc.) in a 
computationally efficient manner. This involves designing custom PreReducers.

The exact calculation of some statistics (eg. frequencies, or the number of 
distinct elements) would require memory proportional to the number of elements 
in the input (the window or the entire stream). However, there are efficient 
algorithms and data structures using less memory for calculating the same 
statistics only approximately, with user-specified error bounds.

  was:
The goal of this project is to implement basic statistics of data streams and 
windows (like average, median, variance, correlation, etc.) in a 
computationally efficient manner. This involves designing custom preaggregators.

The exact calculation of some statistics (eg. frequencies, or the number of 
distinct elements) would require memory proportional to the number of elements 
in the input (the window or the entire stream). However, there are efficient 
algorithms and data structures using less memory for calculating the same 
statistics only approximately, with user-specified error bounds.


> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-2142
>                 URL: https://issues.apache.org/jira/browse/FLINK-2142
>             Project: Flink
>          Issue Type: New Feature
>          Components: Streaming
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
>              Labels: gsoc2015, statistics, streaming
>
> The goal of this project is to implement basic statistics of data streams and 
> windows (like average, median, variance, correlation, etc.) in a 
> computationally efficient manner. This involves designing custom PreReducers.
> The exact calculation of some statistics (eg. frequencies, or the number of 
> distinct elements) would require memory proportional to the number of 
> elements in the input (the window or the entire stream). However, there are 
> efficient algorithms and data structures using less memory for calculating 
> the same statistics only approximately, with user-specified error bounds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to