In R, it's easy to split a data set into training, cross-validation, and test sets. Is there something like this in spark.ml? I am using Python for now.

My real problem is that I want to randomly select a relatively small data set to do some initial data exploration. It's not clear to me how, using Spark, I could create a random sample from a large data set. I would prefer to sample without replacement.
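
To make it concrete, this is roughly what I was hoping to write with the PySpark DataFrame API (the file path, fraction, and seed are just placeholders, and I have not verified this end to end):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("exploration-sample").getOrCreate()

# Load the full data set (path is a placeholder)
df = spark.read.csv("hdfs:///data/big_dataset.csv", header=True, inferSchema=True)

# Pull roughly 1% of the rows, without replacement, for initial exploration
small = df.sample(withReplacement=False, fraction=0.01, seed=42)

# Small enough to pull back to the driver and poke at locally
small_pdf = small.toPandas()
```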

I have not tried SparkR yet. I assume I would not be able to use the caret package with Spark ML.

Kind regards

Andy

```{R}
library(caret)

# 70/30 train/test split, stratified on the outcome variable
inTrain <- createDataPartition(y = csv$classe, p = 0.7, list = FALSE)
trainSetDF <- csv[inTrain, ]
testSetDF  <- csv[-inTrain, ]
```
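
For comparison, this is roughly what I imagine the PySpark equivalent would look like, assuming randomSplit behaves the way I think it does (the path and split proportions are placeholders, and unlike createDataPartition it would not stratify on a label column):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("train-cv-test-split").getOrCreate()
df = spark.read.csv("hdfs:///data/big_dataset.csv", header=True, inferSchema=True)

# Rough analogue of the caret split above: 60/20/20 train / cross-validation / test
train_df, cv_df, test_df = df.randomSplit([0.6, 0.2, 0.2], seed=42)
```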


