[ https://issues.apache.org/jira/browse/SPARK-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478517#comment-15478517 ]

Xin Ren commented on SPARK-17476:
---------------------------------

Hi, I can try to work on this one. Thanks :)

> Proper handling for unseen labels in logistic regression training.
> ------------------------------------------------------------------
>
>                 Key: SPARK-17476
>                 URL: https://issues.apache.org/jira/browse/SPARK-17476
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>            Reporter: Seth Hendrickson
>
> Now that logistic regression supports multiclass, it is possible to train on 
> data that has {{K}} classes where one or more of the classes does not appear 
> in the training data. For example,
> {code}
> (0.0, x1)
> (2.0, x2)
> ...
> {code}
> Currently, logistic regression assumes that the outcome classes in the above 
> dataset have three levels: {{0, 1, 2}}. Since label 1 never appears, it 
> should never be predicted. In theory, the coefficients for class 1 should be 
> zero and its intercept should be negative infinity. This can cause problems 
> since we center the intercepts after training by subtracting their mean, and 
> the mean of a set of intercepts that includes negative infinity is itself 
> negative infinity.
> We should discuss whether or not the intercepts actually tend to -infinity in 
> practice, and whether or not we should even include them in training. 
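
To make the centering problem concrete, here is a minimal plain-Python sketch. It is not Spark code. It assumes, as in the example above, that the number of classes is inferred as max label + 1, that "centering" means subtracting the mean intercept, and that with all coefficients zero the margins are just the intercepts. The helper names `unseen_labels`, `softmax`, and `center` are hypothetical, for illustration only.

```python
import math


def unseen_labels(labels):
    """Return the labels in 0..K-1 that never occur, where K = max label + 1.

    Mirrors the issue's example: labels {0.0, 2.0} imply K = 3 classes,
    so label 1 is unseen.
    """
    num_classes = int(max(labels)) + 1
    present = {int(y) for y in labels}
    return [k for k in range(num_classes) if k not in present]


def softmax(margins):
    """Numerically stable softmax; a margin of -inf maps to probability 0.0."""
    top = max(margins)
    exps = [math.exp(z - top) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]


def center(intercepts):
    """Subtract the mean intercept (assumed meaning of 'centering' here).

    Softmax is invariant to adding the same constant to every margin, so
    centering does not change predictions when all values are finite.
    """
    mean = sum(intercepts) / len(intercepts)
    return [b - mean for b in intercepts]


# With all coefficients zero, the margins are just the intercepts.
labels = [0.0, 2.0, 0.0, 2.0]
print(unseen_labels(labels))        # [1]

ideal = [0.5, float("-inf"), -0.5]  # class 1 intercept at -infinity
print(softmax(ideal)[1])            # 0.0: class 1 is never predicted
print(center(ideal))                # [inf, nan, inf]: centering breaks

finite = [0.5, -1e6, -0.5]          # large but finite negative intercept
print(center(finite))               # seen-class intercepts become roughly +333333
```

With a true negative-infinity intercept, centering turns every intercept into NaN or infinity. With a large but finite negative intercept (what an iterative optimizer would more plausibly stop at), centering leaves the predicted probabilities unchanged, because softmax is shift-invariant, but it pushes the intercepts of the seen classes to large positive values.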



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
