GitHub user mengxr commented on the pull request:

https://github.com/apache/incubator-spark/pull/572#issuecomment-34668194

@holdenk, the PartitionwiseSampledRDD was designed with this use case in mind. Both the folded RDD and its complement can be represented by a PartitionwiseSampledRDD with BernoulliSamplers. Do you mind modifying your code to use it? Also, cross-validation is a machine-learning-specific operation, so spark.rdd.RDD may not be a good place for it.
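To illustrate the idea of representing a fold and its complement with the same sampler, here is a minimal sketch in plain Python. It does not use Spark and is not the PartitionwiseSampledRDD or BernoulliSampler API; the names `bernoulli_range_sample` and `k_fold`, the per-partition seeding scheme, and the `[lb, ub)` acceptance range are all assumptions made for illustration. Each partition replays the same seeded random sequence, so an element lands in the validation fold when its draw falls in the range and in the training set otherwise.

```python
import random
from typing import List, Sequence, Tuple


def bernoulli_range_sample(partition: Sequence, lb: float, ub: float,
                           seed: int, complement: bool = False) -> List:
    """Keep items whose random draw falls in [lb, ub), or outside it if complement.

    Hypothetical helper: the same seed replays the same draws, so a sample
    and its complement are disjoint and together cover the partition.
    """
    rng = random.Random(seed)
    kept = []
    for item in partition:
        x = rng.random()            # uniform draw in [0.0, 1.0)
        in_range = lb <= x < ub
        if in_range != complement:  # keep in-range items, or the rest if complement
            kept.append(item)
    return kept


def k_fold(partitions: Sequence[Sequence], num_folds: int,
           seed: int) -> List[Tuple[List[List], List[List]]]:
    """Return (training, validation) pairs, one per fold, sampled per partition."""
    folds = []
    for fold in range(num_folds):
        lb = fold / num_folds
        ub = (fold + 1) / num_folds
        # Assumed seeding scheme: offset the seed by the partition index.
        validation = [bernoulli_range_sample(p, lb, ub, seed + i)
                      for i, p in enumerate(partitions)]
        training = [bernoulli_range_sample(p, lb, ub, seed + i, complement=True)
                    for i, p in enumerate(partitions)]
        folds.append((training, validation))
    return folds
```

Because every fold reuses the same draws and only shifts the acceptance range, each element appears in exactly one validation fold, and no shuffle or extra pass over the data is needed to build the complement.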