[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131868#comment-16131868 ]
Siddharth Murching commented on SPARK-21770:
--------------------------------------------

Good question:

* Predictions on all-zero input don't change (they remain 0 for RandomForestClassifier and DecisionTreeClassifier, which are the only models that call normalizeToProbabilitiesInPlace()).
* This proposal seeks to make predicted probabilities more interpretable when the raw model output is all-zero.
* Regardless, it currently seems impossible for normalizeToProbabilitiesInPlace() to ever be called on all-zero input, since that would mean a DecisionTree leaf node had a class count array (raw output) of all zeros.

Specifically, both DecisionTreeClassifier and RandomForestClassifier inherit Classifier's [implementation of raw2prediction()|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/Classifier.scala#L221], which just takes an argmax ([preferring earlier maximal entries|https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L176]) over the model's output vector. A raw model output of all-equal entries would result in a prediction of 0 either way.

> ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-21770
>                 URL: https://issues.apache.org/jira/browse/SPARK-21770
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Siddharth Murching
>            Priority: Minor
>
> Given an n-element raw prediction vector of all zeros, ProbabilisticClassificationModel.normalizeToProbabilitiesInPlace() should output a probability vector of all-equal 1/n entries.
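
To make the argument in the comment concrete, here is a minimal plain-Python sketch. It is not Spark's Scala code: the helper names normalize_to_probabilities and argmax_first are hypothetical, and the sketch assumes raw entries are non-negative class counts, so a zero sum can only come from an all-zero vector. It models the proposed 1/n normalization together with an argmax that prefers the earliest maximal entry.

```python
from typing import List


def normalize_to_probabilities(raw: List[float]) -> List[float]:
    """Scale raw so it sums to 1; an all-zero vector maps to uniform 1/n.

    Assumption: entries are non-negative (class counts), so sum == 0
    implies every entry is zero.
    """
    n = len(raw)
    if n == 0:
        raise ValueError("raw prediction vector must be non-empty")
    total = sum(raw)
    if total == 0.0:
        # Behaviour proposed in SPARK-21770: equal probability per class.
        return [1.0 / n] * n
    return [v / total for v in raw]


def argmax_first(values: List[float]) -> int:
    """Index of the largest entry, preferring the earliest one on ties."""
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:  # strict '>' keeps the earlier index
            best = i
    return best


raw = [0.0, 0.0, 0.0, 0.0]
probs = normalize_to_probabilities(raw)
label_from_raw = argmax_first(raw)
label_from_probs = argmax_first(probs)
```

Under these assumptions, the argmax over the all-zero raw vector and over the uniform probability vector both pick index 0, which matches the point that the predicted label does not change and only the reported probabilities do.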