[jira] [Commented] (SPARK-16235) "evaluateEachIteration" is returning wrong results when calculated for classification model.

Sean Owen (JIRA) Wed, 29 Jun 2016 04:02:01 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355020#comment-15355020
 ]


Sean Owen commented on SPARK-16235:
-----------------------------------

I take it back -- MSE for probabilities is just the Brier score, eh? 
https://en.wikipedia.org/wiki/Brier_score At least, it has some value. I agree 
it is directionally right, and large when it should be large, etc. I'm not 
sure, though, what loss function it would naturally go with. By contrast, we 
use log-loss with logistic regression because it's exactly what the training 
objective is trying to minimize.
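
For concreteness, a minimal sketch of the two metrics in plain Scala (no Spark 
dependency). The object and method names are made up for illustration, and it 
assumes 0/1 labels with predictions already expressed as probabilities:

```scala
object BrierSketch {
  // Brier score: mean squared difference between the predicted
  // probability and the 0/1 label.
  def brier(labels: Seq[Double], probs: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.size == probs.size,
      "inputs must be non-empty and aligned")
    labels.zip(probs).map { case (y, p) => (y - p) * (y - p) }.sum / labels.size
  }

  // Log-loss for comparison; probabilities are clipped to avoid log(0).
  def logLoss(labels: Seq[Double], probs: Seq[Double], eps: Double = 1e-15): Double = {
    require(labels.nonEmpty && labels.size == probs.size,
      "inputs must be non-empty and aligned")
    labels.zip(probs).map { case (y, p) =>
      val q = math.min(math.max(p, eps), 1.0 - eps)
      -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
    }.sum / labels.size
  }
}
```

Both are small for confident correct predictions and large for confident wrong 
ones, which is the "directionally right" part; they differ in how hard they 
punish the confident mistakes.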

Anyway, here we should either enforce that MSE / MAE _can't_ be used with 
classification, or else accommodate it. CC [~sethah] who added that 
transformation. It seems like we'd have to push that transformation down into 
the loss function in order to resolve this? Or was that not possible? It would 
probably be too hacky to special-case this by dividing the MSE result by 4 or 
something to compensate.
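
To make the factor of 4 concrete, here is a rough sketch in plain Scala. It 
assumes the model score s relates to a probability p as s = 2p - 1, which is 
an assumption for illustration and not necessarily what the tree ensemble 
actually outputs; the names are hypothetical:

```scala
object RemapSketch {
  // Plain mean squared error over aligned label/prediction pairs.
  def mse(labels: Seq[Double], preds: Seq[Double]): Double = {
    require(labels.nonEmpty && labels.size == preds.size,
      "inputs must be non-empty and aligned")
    labels.zip(preds).map { case (y, f) => (y - f) * (y - f) }.sum / labels.size
  }

  def main(args: Array[String]): Unit = {
    val labels01 = Seq(0.0, 1.0, 1.0, 0.0)   // labels in {0, 1}
    val probs    = Seq(0.25, 0.75, 0.5, 0.0) // predictions as probabilities

    // Same data after the remapping: labels in {-1, 1}, scores in [-1, 1].
    val labelsPm1 = labels01.map(y => 2 * y - 1)
    val scores    = probs.map(p => 2 * p - 1)

    // ((2y - 1) - (2p - 1))^2 = 4 * (y - p)^2, so the second MSE is
    // four times the first under the assumption above.
    println(s"MSE on [0,1] scale:  ${mse(labels01, probs)}")
    println(s"MSE on [-1,1] scale: ${mse(labelsPm1, scores)}")
  }
}
```

The same algebra gives a factor of 2 for MAE, so a single hard-coded correction 
would not even cover both metrics.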

> "evaluateEachIteration" is returning wrong results when calculated for 
> classification model.
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16235
>                 URL: https://issues.apache.org/jira/browse/SPARK-16235
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.6.1, 1.6.2, 2.0.0
>            Reporter: Mahmoud Rawas
>
> Basically, within the mentioned function there is code to map the actual 
> value, which is supposed to be in the range of [0,1], into the range of 
> [-1,1], in order to make it compatible with the predicted value produced by 
> a classification model. 
> {code}
> val remappedData = algo match {
>   case Classification =>
>     data.map(x => new LabeledPoint((x.label * 2) - 1, x.features))
>   case _ => data
> }
> {code}
> The problem with this approach is that it calculates an incorrect error; 
> for example, the MSE will be 4 times larger than the expected MSE. 
> Instead, we should map the predicted value into a probability value in [0,1].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
