Vincent created SPARK-17055:
-------------------------------

Summary: add labelKFold to CrossValidator
Key: SPARK-17055
URL: https://issues.apache.org/jira/browse/SPARK-17055
Project: Spark
Issue Type: New Feature
Components: MLlib
Affects Versions: 2.0.0
Reporter: Vincent
Priority: Minor
The current CrossValidator only supports k-fold, which randomly divides all the samples into k groups. However, when data is gathered from different subjects and we want to avoid over-fitting, we want to hold out all samples with certain labels from the training data and put them into the validation fold. That is, we want to ensure that the same label never appears in both the testing and training sets. Mainstream packages such as scikit-learn (Sklearn) already support this cross-validation method.
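To illustrate the requested behaviour, here is a minimal sketch in plain Python (not Spark's Scala/Java API, and not an existing CrossValidator option). The function name `label_k_fold` is hypothetical. It assigns each distinct label to exactly one fold, using a greedy strategy similar in spirit to scikit-learn's LabelKFold: the largest label groups are placed first, each into the fold that currently holds the fewest samples. No label can therefore appear in both the training and test indices of the same split.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def label_k_fold(labels: Sequence[Hashable], k: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: return k (train_indices, test_indices) pairs.

    Every distinct label is assigned to exactly one test fold, so a label
    never appears in both the training and the test side of a split.
    """
    counts = Counter(labels)
    if len(counts) < k:
        raise ValueError("need at least k distinct labels")

    # Greedy balancing: visit labels from largest to smallest group and put
    # each one into the fold that currently holds the fewest samples.
    fold_sizes = [0] * k
    fold_of_label = {}
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        target = min(range(k), key=lambda f: fold_sizes[f])
        fold_of_label[label] = target
        fold_sizes[target] += n

    # Fold f's test set is every sample whose label was assigned to f;
    # all remaining samples form its training set.
    splits = []
    for f in range(k):
        test = [i for i, lab in enumerate(labels) if fold_of_label[lab] == f]
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != f]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    subjects = ["a", "a", "b", "b", "b", "c", "d"]
    for train, test in label_k_fold(subjects, 2):
        print("train:", train, "test:", test)
```

With the sample `subjects` list, all three samples of subject `b` land together in one test fold and never leak into that split's training set. Plain random k-fold gives no such guarantee. The sketch requires at least `k` distinct labels, because each fold needs at least one label to hold out.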