[ https://issues.apache.org/jira/browse/SPARK-17055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625108#comment-15625108 ]

Vincent commented on SPARK-17055:
---------------------------------

[~sowen] Okay. Hmm, I guess we have some misunderstanding here.
[~remi.delas...@gmail.com] reviewed the code and gave some feedback that we
should be prepared to accommodate other folding methods, if any are added. But
my opinion was that we should align with the current design in MLlib, because:
1. we don't have that many folding methods in MLlib so far; 2. changing the API
as Remi proposed would affect current cross-validation usage, so I think it'd
be better to get the nod from the committers first.
So, I suggest we reopen this ticket, because this folding method is useful in
practice and, from what I have learned, some people in the community actually
need/use this PR when they use Spark ML. :)

> add labelKFold to CrossValidator
> --------------------------------
>
>                 Key: SPARK-17055
>                 URL: https://issues.apache.org/jira/browse/SPARK-17055
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Vincent
>            Priority: Minor
>
> Current CrossValidator only supports k-fold, which randomly divides all the 
> samples into k groups. But when data is gathered from different subjects and 
> we want to avoid over-fitting, we want to hold out samples with certain 
> labels from the training data and put them into the validation fold, i.e. we 
> want to ensure that the same label does not appear in both the training and 
> test sets.
> Mainstream packages like scikit-learn already support such a cross-validation 
> method. 
> (http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LabelKFold.html#sklearn.cross_validation.LabelKFold)
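The folding scheme described above can be sketched in a few lines. This is a
minimal Python illustration of the idea (not the Spark PR's code); the function
name `label_kfold` and the greedy size-balancing heuristic are assumptions made
here for clarity:

```python
from collections import defaultdict

def label_kfold(labels, k=3):
    """Assign each sample a fold index such that all samples sharing a
    label land in the same fold, so no label can appear in both the
    training and validation splits. Sketch only; names are hypothetical."""
    # Count samples per distinct label.
    counts = defaultdict(int)
    for lbl in labels:
        counts[lbl] += 1
    # Greedily assign labels (largest first) to the currently
    # smallest fold, to keep fold sizes roughly balanced.
    fold_sizes = [0] * k
    fold_of_label = {}
    for lbl, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        f = fold_sizes.index(min(fold_sizes))
        fold_of_label[lbl] = f
        fold_sizes[f] += n
    return [fold_of_label[lbl] for lbl in labels]

# Samples with the same label always receive the same fold index.
folds = label_kfold(["a", "a", "b", "b", "b", "c", "c", "d"], k=3)
```

scikit-learn's LabelKFold (later renamed GroupKFold) uses essentially this
balancing strategy; the ticket asks for the equivalent behaviour in Spark's
CrossValidator.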


