[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284435#comment-15284435 ]

Gabor Gevay commented on FLINK-2142:

This proposal was based on the old (pre-0.10) windowing API. I'm now taking it apart by converting sub-tasks to stand-alone issues (FLINK-2148, FLINK-2147) and/or modifying or closing those sub-tasks that don't make sense in the current streaming API. I will add the label `approximate` to those issues that are about approximate calculations.

Note: The main reason I abandoned this project last summer is that the streaming API was changing a lot at that time, so it seemed better to postpone these things.

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: Streaming
> Reporter: Gabor Gevay
> Assignee: Gabor Gevay
> Priority: Minor
> Labels: gsoc2015, statistics, streaming
>
> The goal of this project is to implement basic statistics of data streams and
> windows (like average, median, variance, correlation, etc.) in a
> computationally efficient manner. This involves designing custom PreReducers.
> The exact calculation of some statistics (e.g. frequencies, or the number of
> distinct elements) would require memory proportional to the number of
> elements in the input (the window or the entire stream). However, there are
> efficient algorithms and data structures using less memory for calculating
> the same statistics only approximately, with user-specified error bounds.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
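The trade-off the issue description mentions (exact frequencies need memory proportional to the input, while an approximate structure can stay small under a user-specified error bound) can be illustrated with a Count-Min sketch. This is only a minimal sketch in Python, not code from the Flink project or from this proposal: the class name `CountMinSketch`, the parameters `epsilon` and `delta`, and the use of salted BLAKE2b hashes as a stand-in for a pairwise-independent hash family are all assumptions made for illustration.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counter whose memory does not grow with the stream."""

    def __init__(self, epsilon: float, delta: float):
        # Standard sizing: an estimate exceeds the true count by more than
        # epsilon * total with probability at most delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # number of items seen so far

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row (assumption: this approximates the
        # pairwise-independent hash functions the analysis requires).
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum over the rows
        # never underestimates the true frequency.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Usage: count words in a small stream with a 1% error bound.
sketch = CountMinSketch(epsilon=0.01, delta=0.01)
for word in ["flink"] * 50 + ["storm"] * 5 + ["spark"] * 20:
    sketch.add(word)
print(sketch.width, sketch.depth, sketch.estimate("flink"))
```

The table size depends only on `epsilon` and `delta`, not on the number of elements in the window or stream, which is the property the description is after.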
[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15284474#comment-15284474 ]

Gabor Gevay commented on FLINK-2142:

(I've also broken off FLINK-2144.)
[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570785#comment-14570785 ]

Márton Balassi commented on FLINK-2142:

Thanks for adding the tickets to track your progress [~ggevay].
[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389797#comment-17389797 ]

Gábor Gévay commented on FLINK-2142:

There is a recent paper on this in the meantime: http://www.vldb.org/pvldb/vol14/p1818-poepsel-lemaitre.pdf

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream
> Reporter: Gábor Gévay
> Assignee: Gábor Gévay
> Priority: Not a Priority
> Labels: gsoc2015, stale-assigned, statistics, streaming

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321764#comment-17321764 ]

Flink Jira Bot commented on FLINK-2142:

This issue and all of its Sub-Tasks have not been updated for 180 days. So, it has been labeled "stale-minor". If you are still affected by this bug or are still interested in this issue, please give an update and remove the label. In 7 days the issue will be closed automatically.

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream
> Reporter: Gábor Gévay
> Assignee: Gábor Gévay
> Priority: Minor
> Labels: gsoc2015, stale-minor, statistics, streaming
[jira] [Commented] (FLINK-2142) GSoC project: Exact and Approximate Statistics for Data Streams and Windows
[ https://issues.apache.org/jira/browse/FLINK-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17329573#comment-17329573 ]

Flink Jira Bot commented on FLINK-2142:

This issue is assigned but has not received an update in 7 days so it has been labeled "stale-assigned". If you are still working on the issue, please give an update and remove the label. If you are no longer working on the issue, please unassign so someone else may work on it. In 7 days the issue will be automatically unassigned.

> GSoC project: Exact and Approximate Statistics for Data Streams and Windows
> ---
>
> Key: FLINK-2142
> URL: https://issues.apache.org/jira/browse/FLINK-2142
> Project: Flink
> Issue Type: New Feature
> Components: API / DataStream
> Reporter: Gábor Gévay
> Assignee: Gábor Gévay
> Priority: Minor
> Labels: gsoc2015, stale-assigned, stale-minor, statistics, streaming