[ https://issues.apache.org/jira/browse/FLINK-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633118#comment-14633118 ]
Maximilian Alber commented on FLINK-2312:
-----------------------------------------

I agree too.

Something else: how do you ensure the ratios? As far as I can see, they are only approximately ensured, and only when you have a large number of samples.

On Mon, Jul 20, 2015 at 9:26 AM, ASF GitHub Bot (JIRA) <j...@apache.org>

> Random Splits
> -------------
>
>                 Key: FLINK-2312
>                 URL: https://issues.apache.org/jira/browse/FLINK-2312
>             Project: Flink
>          Issue Type: Wish
>          Components: Machine Learning Library
>            Reporter: Maximilian Alber
>            Assignee: pietro pinoli
>            Priority: Minor
>
> In machine learning applications it is common to split data sets into, e.g., training and testing sets.
> To the best of my knowledge, there is at the moment no nice way in Flink to split a data set randomly into several partitions according to some ratio.
> The desired semantics would be the same as those of Spark's RDD randomSplit.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
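
To make the question about ratios concrete, here is a minimal sketch in plain Python (not Flink code, and not the implementation proposed in the pull request). The function names `bernoulli_split` and `exact_split` are hypothetical, chosen only for illustration. The first assigns each element independently, so the split sizes match the ratios only in expectation. The second shuffles everything and cuts at fixed positions, so the sizes match the ratios up to rounding.

```python
import random


def bernoulli_split(items, weights, seed=None):
    """Assign each item independently to a split, with probability
    proportional to its weight. Split sizes are only approximately
    proportional to the weights."""
    rng = random.Random(seed)
    total = sum(weights)
    # Cumulative upper bounds of each split's interval within [0, 1].
    bounds, acc = [], 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    bounds[-1] = 1.0  # guard against floating-point rounding
    splits = [[] for _ in weights]
    for item in items:
        u = rng.random()  # uniform in [0, 1)
        for i, upper in enumerate(bounds):
            if u < upper:
                splits[i].append(item)
                break
    return splits


def exact_split(items, weights, seed=None):
    """Shuffle all items, then cut at rounded cumulative positions.
    Split sizes match the weights up to rounding."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    total = sum(weights)
    n = len(shuffled)
    splits, start, acc = [], 0, 0.0
    for w in weights:
        acc += w
        end = round(n * acc / total)
        splits.append(shuffled[start:end])
        start = end
    return splits


if __name__ == "__main__":
    data = range(10)
    # Sizes vary from seed to seed for the per-element version.
    print([len(s) for s in bernoulli_split(data, [0.7, 0.3], seed=1)])
    # Sizes are fixed by the weights for the shuffle-and-cut version.
    print([len(s) for s in exact_split(data, [0.7, 0.3], seed=1)])
```

The per-element version needs only one pass and no knowledge of the total count, which is why it is attractive for a distributed data set. The shuffle-and-cut version needs the total count and a global ordering, which would presumably be more expensive in a distributed setting.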