[ 
https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131868#comment-16131868
 ] 

Siddharth Murching commented on SPARK-21770:
--------------------------------------------

Good question:

* Predictions on all-zero input don't change (they remain 0 for 
RandomForestClassifier and DecisionTreeClassifier, which are the only models 
that call normalizeToProbabilitiesInPlace())
* This proposal seeks to make predicted probabilities more interpretable when 
raw model output is all-zero
* Regardless, it currently seems impossible for 
normalizeToProbabilitiesInPlace() to ever be called on all-zero input, since 
that would mean a DecisionTree leaf node had a class count array (raw output) 
of all zeros.
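
As a rough illustration of the proposed behavior (a plain-Python sketch, not 
Spark's Scala implementation; the function name normalize_to_probabilities is 
hypothetical and it returns a new list instead of normalizing in place), 
all-zero raw output would map to uniform 1/n probabilities:

```python
def normalize_to_probabilities(raw):
    """Turn non-negative raw scores (e.g. class counts) into probabilities.

    Hypothetical helper for illustration only. An all-zero input is mapped
    to a uniform vector of 1/n entries, which is the behavior proposed in
    the issue description.
    """
    n = len(raw)
    if n == 0:
        # Nothing to normalize; this edge case is an assumption of the sketch.
        return []
    total = sum(raw)
    if total == 0:
        # All-zero raw output: every class is treated as equally likely.
        return [1.0 / n] * n
    # Usual case: divide each entry by the sum so the entries add up to 1.
    return [value / total for value in raw]


print(normalize_to_probabilities([0.0, 0.0, 0.0, 0.0]))
print(normalize_to_probabilities([3.0, 1.0]))
```

The first call is the all-zero case discussed above; the second shows that 
ordinary class-count vectors are still divided by their sum.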

Specifically, both DecisionTreeClassifier and RandomForestClassifier inherit 
Classifier's [implementation of 
raw2prediction()|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala#L221],
 which just takes an argmax ([preferring earlier maximal 
entries|https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L176])
 over the model's output vector. A raw model output of all-equal entries would 
result in a prediction of 0 either way.
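
To make the tie-breaking concrete, here is a minimal plain-Python sketch of an 
argmax that prefers earlier maximal entries. It is not the linked Scala code; 
the name argmax_first and the -1 return value for empty input are assumptions 
of this sketch.

```python
def argmax_first(values):
    """Return the index of the largest entry, preferring the earliest on ties.

    Hypothetical helper for illustration only. Returning -1 for an empty
    input is an assumption of this sketch.
    """
    if not values:
        return -1
    best = 0
    for i in range(1, len(values)):
        # Strict comparison: a later entry that merely ties does not win.
        if values[i] > values[best]:
            best = i
    return best


# All-equal vectors give index 0, whether they are all zeros or uniform 1/n.
print(argmax_first([0.0, 0.0, 0.0, 0.0]))
print(argmax_first([0.25, 0.25, 0.25, 0.25]))
```

Because the comparison is strict, any all-equal vector yields index 0, so the 
predicted label would be the same before and after the proposed normalization 
change.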


> ProbabilisticClassificationModel: Improve normalization of all-zero raw 
> predictions
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21770
>                 URL: https://issues.apache.org/jira/browse/SPARK-21770
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Siddharth Murching
>            Priority: Minor
>
> Given an n-element raw prediction vector of all zeros, 
> ProbabilisticClassificationModel.normalizeToProbabilitiesInPlace() should 
> output a probability vector of all-equal 1/n entries.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
