[ https://issues.apache.org/jira/browse/SPARK-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-17057:
------------------------------------

    Assignee: Apache Spark

> ProbabilisticClassifierModels' prediction more reasonable with multi zero thresholds
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-17057
>                 URL: https://issues.apache.org/jira/browse/SPARK-17057
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>            Reporter: zhengruifeng
>            Assignee: Apache Spark
>
> {code}
> import org.apache.spark.ml.classification.RandomForestClassifier
>
> val path = "./data/mllib/sample_multiclass_classification_data.txt"
> val data = spark.read.format("libsvm").load(path)
> // `rf` is not defined in the original report; a default RandomForestClassifier is assumed here.
> val rf = new RandomForestClassifier()
> val rfm = rf.fit(data)
> scala> rfm.setThresholds(Array(0.0,0.0,0.0))
> res4: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_cbe640b0eccc) with 20 trees
> scala> rfm.transform(data).show(5)
> +-----+--------------------+--------------+-------------+----------+
> |label|            features| rawPrediction|  probability|prediction|
> +-----+--------------------+--------------+-------------+----------+
> |  1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  1.0|(4,[0,1,2,3],[-0....|[0.0,20.0,0.0]|[0.0,1.0,0.0]|       0.0|
> |  0.0|(4,[0,1,2,3],[0.1...|[20.0,0.0,0.0]|[1.0,0.0,0.0]|       0.0|
> +-----+--------------------+--------------+-------------+----------+
> only showing top 5 rows
> {code}
> If multiple thresholds are set to zero, the prediction of
> {{ProbabilisticClassificationModel}} is the first index whose corresponding
> threshold is 0. In the output above, every row is predicted as class 0.0,
> even where {{probability}} puts all of its mass on class 1.
> In this case, it would be more reasonable to use the index with the maximum
> {{probability}} among the indices with a zero threshold as the {{prediction}}.
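
To make the difference concrete, the two rules described in the issue can be sketched in plain Scala. This is only an illustration, not Spark's implementation: the object name ZeroThresholdSketch and the functions firstZeroRule and maxProbabilityRule are hypothetical, and the fallback for non-zero thresholds assumes the documented "largest p/t wins" rule.

```scala
object ZeroThresholdSketch {
  // Index of the first maximum element.
  def argmax(xs: Array[Double]): Int = xs.indexOf(xs.max)

  // Assumed fallback when no threshold is zero: pick the largest p / t.
  private def scaledArgmax(probability: Array[Double], thresholds: Array[Double]): Int =
    argmax(probability.zip(thresholds).map { case (p, t) => p / t })

  // Behaviour reported in the issue: the first index whose threshold is 0
  // is returned, regardless of the class probabilities.
  def firstZeroRule(probability: Array[Double], thresholds: Array[Double]): Int = {
    val firstZero = thresholds.indexOf(0.0)
    if (firstZero >= 0) firstZero else scaledArgmax(probability, thresholds)
  }

  // Behaviour proposed in the issue: among the indices whose threshold is 0,
  // return the one with the highest probability.
  def maxProbabilityRule(probability: Array[Double], thresholds: Array[Double]): Int = {
    val zeroIdx = thresholds.indices.filter(i => thresholds(i) == 0.0)
    if (zeroIdx.nonEmpty) zeroIdx.maxBy(i => probability(i))
    else scaledArgmax(probability, thresholds)
  }
}
```

With the first row of the output above (probability [0.0,1.0,0.0], thresholds all 0.0), firstZeroRule returns index 0, which matches the reported prediction of 0.0, while maxProbabilityRule returns index 1.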