In R, it's easy to split a data set into training, cross-validation, and test sets. Is there something like this in spark.ml? I am using Python as of now.
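For concreteness, here is a minimal sketch of what I imagine the PySpark equivalent looks like, based on my reading of the DataFrame API docs. The file path, app name, and split weights are placeholders, and I have not verified this end to end:

```python
from pyspark.sql import SparkSession

# Assumes a Spark 2.x-style session; the app name and path are placeholders.
spark = SparkSession.builder.appName("split-demo").getOrCreate()
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# randomSplit partitions the rows without replacement; the weights
# are normalized internally, so they need not sum to 1.
trainDF, cvDF, testDF = df.randomSplit([0.6, 0.2, 0.2], seed=42)
```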
My real problem is that I want to randomly select a relatively small data set to do some initial data exploration. It is not clear to me how, using Spark, I could create a random sample from a large data set. I would prefer to sample without replacement. I have not tried SparkR yet, and I assume I would not be able to use the caret package with Spark ML.

Kind regards,
Andy

For reference, this is how I do the split in R with caret:

```{R}
library(caret)  # createDataPartition comes from the caret package

inTrain <- createDataPartition(y = csv$classe, p = 0.7, list = FALSE)
trainSetDF <- csv[inTrain, ]
testSetDF <- csv[-inTrain, ]
```
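And for the sampling part, this is the kind of call I am hoping exists, again only a sketch based on the DataFrame API docs and reusing the `df` from the sketch above; the 1% fraction is arbitrary:

```python
# Draw roughly 1% of the rows without replacement for exploration.
# fraction is a per-row inclusion probability, not an exact sample size.
smallDF = df.sample(withReplacement=False, fraction=0.01, seed=42)

# If the result is small enough, pull it to the driver for local analysis
# (requires pandas on the driver).
localDF = smallDF.toPandas()
```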