[
https://issues.apache.org/jira/browse/FLINK-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14269546#comment-14269546
]
Paris Carbone edited comment on FLINK-1284 at 1/8/15 4:57 PM:
--------------------------------------------------------------
Hi Cao. That is right, 'sample' will be a function under WindowedDataStream.
The window operator takes a policy function as a parameter (there are helpers
for the policies, check the examples) that defines which tuples are included in
each window. It is always followed by an 'every' operator, which also takes a
policy as a parameter, this one defining the tuples that signify the next
window. You can think of each window as a buffer on which the next operation
(e.g. sample) will run. You can read more in the streaming guide under docs.
If you want to give it a try, I can assign it to you.
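
To make the "window as a buffer" picture concrete, here is a minimal sketch of a count-based window. It is a simplified model, not the Flink streaming API: the class CountWindowBuffer and its parameters windowSize (standing in for the 'window' policy) and slideSize (standing in for the 'every' policy) are hypothetical names chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model, NOT Flink code: a count-based window buffer.
final class CountWindowBuffer<T> {
    private final int windowSize;             // stands in for the 'window' policy
    private final int slideSize;              // stands in for the 'every' policy
    private final Deque<T> buffer = new ArrayDeque<>();
    private final Consumer<List<T>> onWindow; // next operation, e.g. a sampler
    private int sinceLastEmit = 0;

    CountWindowBuffer(int windowSize, int slideSize, Consumer<List<T>> onWindow) {
        if (windowSize <= 0 || slideSize <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.windowSize = windowSize;
        this.slideSize = slideSize;
        this.onWindow = onWindow;
    }

    void add(T tuple) {
        // Keep only the most recent windowSize tuples in the buffer.
        buffer.addLast(tuple);
        if (buffer.size() > windowSize) {
            buffer.removeFirst();
        }
        // After every slideSize tuples, hand a copy of the buffer downstream.
        sinceLastEmit++;
        if (sinceLastEmit == slideSize) {
            sinceLastEmit = 0;
            onWindow.accept(new ArrayList<>(buffer));
        }
    }
}
```

With windowSize 3 and slideSize 2, feeding the tuples 1 to 6 hands the buffers [1, 2], [2, 3, 4] and [4, 5, 6] to the next operation. A 'sample' function would run on each of those buffers.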
> Uniform random sampling operator over windows
> ---------------------------------------------
>
> Key: FLINK-1284
> URL: https://issues.apache.org/jira/browse/FLINK-1284
> Project: Flink
> Issue Type: New Feature
> Components: Streaming
> Reporter: Paris Carbone
> Priority: Minor
>
> It would be useful for several use cases to have a built-in uniform random
> sampling operator in the streaming API that can operate on windows. This can
> be used for example for online machine learning operations, evaluating
> heuristics or continuous visualisation of representative values.
> The operator could be given a field and a number of random samples needed,
> following a window statement as such:
> mystream.window(..).sample(fieldID,#samples)
> Given that pre-aggregation is enabled, this could perhaps be implemented as a
> binary reduce operator or a combinable groupreduce that pre-aggregates the
> empiricals of that field.
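
One way the proposal above could be approached is sketched below, under stated assumptions. The issue does not prescribe an algorithm, so the technique here is swapped in for illustration: reservoir sampling (Algorithm R) draws a uniform sample from one window buffer, and a merge step combines two partial samples in the spirit of a binary reduce. The class WindowSample and its methods are hypothetical names, not part of Flink, and the merge assumes both partial samples were drawn with the same k over disjoint sets of tuples.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch only: hypothetical names, not Flink code.
final class WindowSample<T> {
    final List<T> items; // uniform sample without replacement
    final int seen;      // number of tuples the sample was drawn from

    WindowSample(List<T> items, int seen) {
        this.items = items;
        this.seen = seen;
    }

    /** Algorithm R: uniform sample of up to k elements from one window buffer. */
    static <T> WindowSample<T> of(List<T> window, int k, Random rng) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<T> reservoir = new ArrayList<>(k);
        for (int i = 0; i < window.size(); i++) {
            if (i < k) {
                reservoir.add(window.get(i));       // fill the reservoir first
            } else {
                int j = rng.nextInt(i + 1);         // uniform in [0, i]
                if (j < k) {
                    reservoir.set(j, window.get(i)); // keep with probability k/(i+1)
                }
            }
        }
        return new WindowSample<>(reservoir, window.size());
    }

    /** Binary reduce: merge two partial samples taken over disjoint tuples. */
    static <T> WindowSample<T> merge(WindowSample<T> a, WindowSample<T> b, int k, Random rng) {
        List<T> left = new ArrayList<>(a.items);
        List<T> right = new ArrayList<>(b.items);
        Collections.shuffle(left, rng);
        Collections.shuffle(right, rng);
        int remA = a.seen;
        int remB = b.seen;
        List<T> out = new ArrayList<>(k);
        while (out.size() < k && remA + remB > 0) {
            // Take from side a with probability remA / (remA + remB).
            if (rng.nextInt(remA + remB) < remA) {
                out.add(left.remove(left.size() - 1));
                remA--;
            } else {
                out.add(right.remove(right.size() - 1));
                remB--;
            }
        }
        return new WindowSample<>(out, a.seen + b.seen);
    }
}
```

Carrying the seen count alongside each sample is what lets the merge weight the two sides correctly, so partial samples can be pre-aggregated and still yield a uniform sample of the combined tuples.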