[
https://issues.apache.org/jira/browse/FLINK-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708701#comment-14708701
]
ASF GitHub Bot commented on FLINK-1901:
---------------------------------------
Github user chiwanpark commented on the pull request:
https://github.com/apache/flink/pull/949#issuecomment-133999705
@ChengXiangLi I know that it is hard to verify a random sampler
implementation. But we need to fix these failing tests because they make it
difficult to verify other pull requests. Some tests in other pull requests fail
in the K-S test and in the sampling-with-fraction test. There is a [JIRA
issue](https://issues.apache.org/jira/browse/FLINK-2564) covering this.
I'm testing with an increased number of samples and a larger source size. If I
get a notable result, I'll post it.
> Create sample operator for Dataset
> ----------------------------------
>
> Key: FLINK-1901
> URL: https://issues.apache.org/jira/browse/FLINK-1901
> Project: Flink
> Issue Type: Improvement
> Components: Core
> Reporter: Theodore Vasiloudis
> Assignee: Chengxiang Li
>
> In order to be able to implement Stochastic Gradient Descent and a number of
> other machine learning algorithms we need to have a way to take a random
> sample from a Dataset.
> We need to be able to sample with or without replacement from the Dataset,
> choose the relative or exact size of the sample, set a seed for
> reproducibility, and support sampling within iterations.
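
For illustration only, the sampling modes requested in the issue description can be sketched in plain Python. This is not Flink's API or the implementation from the pull request; the function names (`sample_fraction`, `sample_fixed_size`, `sample_with_replacement`) are hypothetical, and the sketch assumes the data fits in a single process. It shows relative-size sampling (Bernoulli), exact-size sampling without replacement (reservoir sampling), exact-size sampling with replacement, and a seed for reproducibility.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_fraction(data: Iterable[T], fraction: float,
                    seed: Optional[int] = None) -> List[T]:
    """Bernoulli sampling without replacement (hypothetical helper).

    Each element is kept independently with probability `fraction`,
    so the sample size is only approximately fraction * n.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    return [x for x in data if rng.random() < fraction]


def sample_fixed_size(data: Iterable[T], k: int,
                      seed: Optional[int] = None) -> List[T]:
    """Reservoir sampling (Algorithm R) without replacement (hypothetical helper).

    Returns exactly min(k, n) elements in a single pass over the data.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, x in enumerate(data):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(x)
        else:
            # Replace a slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = x
    return reservoir


def sample_with_replacement(data: Iterable[T], k: int,
                            seed: Optional[int] = None) -> List[T]:
    """Draw k elements uniformly with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    items = list(data)  # assumption: the data fits in memory
    if not items:
        return []
    return [rng.choice(items) for _ in range(k)]
```

Using the same seed twice yields the same sample, which is what makes a sampler test repeatable. The Bernoulli variant returns a sample whose size varies around `fraction * n`, which is one reason a test on the sample size needs a tolerance rather than an exact match.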
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)