[ 
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708889#comment-14708889
 ] 

ASF GitHub Bot commented on FLINK-1901:
---------------------------------------

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/949#issuecomment-134071161
  
    @sachingoel0101, you're right. The problem is that Flink does not give you 
a guarantee in which order the elements will arrive. But this problem won't be 
fixed by setting the seed for all sampling operators to the same value. There 
always might be an operator, e.g. rebalance, which will completely randomize 
your element order.


> Create sample operator for Dataset
> ----------------------------------
>
>                 Key: FLINK-1901
>                 URL: https://issues.apache.org/jira/browse/FLINK-1901
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Theodore Vasiloudis
>            Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of 
> other machine learning algorithms we need to have a way to take a random 
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset, 
> choose the relative or exact size of the sample, set a seed for 
> reproducibility, and support sampling within iterations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to