[ https://issues.apache.org/jira/browse/SPARK-21806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-21806:
------------------------------------

    Assignee: Apache Spark

> BinaryClassificationMetrics pr(): first point (0.0, 1.0) is misleading
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21806
>                 URL: https://issues.apache.org/jira/browse/SPARK-21806
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.2.0
>            Reporter: Marc Kaminski
>            Assignee: Apache Spark
>            Priority: Minor
>         Attachments: PRROC_example.jpeg
>
>
> I would like to refer to a [discussion in scikit-learn|https://github.com/scikit-learn/scikit-learn/issues/4223],
> as this behavior is probably based on the scikit-learn implementation.
> Summary:
> Currently, the y-axis intercept of the precision-recall curve is set to
> (0.0, 1.0). This behavior is not ideal in certain edge cases (see example
> below) and can also affect cross-validation when the optimization metric
> is set to "areaUnderPR".
> Please consider [blucena's post|https://github.com/scikit-learn/scikit-learn/issues/4223#issuecomment-215273613]
> for possible alternatives.
> Edge case example:
> Consider a bad classifier that assigns a high probability to all samples.
> A possible output might look like this:
> ||Real label || Score ||
> |1.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 1.0 |
> |0.0 | 0.95 |
> |0.0 | 0.95 |
> |1.0 | 1.0 |
> This results in the following PR points (first line set by default):
> ||Threshold || Recall || Precision ||
> |1.0 | 0.0 | 1.0 |
> |0.95 | 1.0 | 0.2 |
> |0.0 | 1.0 | 0.16 |
> The auPRC would be around 0.6. Classifiers with a more differentiated
> probability assignment will be wrongly assumed to perform worse with
> regard to this auPRC.
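The arithmetic behind the "around 0.6" figure can be checked with a small standalone sketch. It does not call Spark; it assumes one PR point per distinct score (a sample counts as predicted positive when its score is at or above the threshold) and trapezoidal integration for the area. The function names and the `default_first` switch are hypothetical, introduced only for illustration. Thresholds here are taken from the scores actually present in the sample, so the threshold labels may not line up one-to-one with the table in the issue, although the recall and precision values do.

```python
# (real label, score) pairs copied from the edge-case table in the issue.
samples = [(1.0, 1.0)] + [(0.0, 1.0)] * 8 + [(0.0, 0.95)] * 2 + [(1.0, 1.0)]


def pr_points(samples, default_first=True):
    """Return (recall, precision) points in descending threshold order.

    With default_first=True the curve starts at the fixed point (0.0, 1.0),
    which is the behavior the issue questions. With default_first=False the
    first point instead reuses the precision of the highest threshold; this
    is only one conceivable alternative, assumed here for comparison.
    """
    total_pos = sum(1 for label, _ in samples if label == 1.0)
    points = []
    for threshold in sorted({score for _, score in samples}, reverse=True):
        predicted = [label for label, score in samples if score >= threshold]
        true_pos = sum(1 for label in predicted if label == 1.0)
        points.append((true_pos / total_pos, true_pos / len(predicted)))
    first_precision = 1.0 if default_first else points[0][1]
    return [(0.0, first_precision)] + points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


with_default = pr_points(samples)
without_default = pr_points(samples, default_first=False)

print(with_default)
print(round(trapezoid_area(with_default), 3))
print(round(trapezoid_area(without_default), 3))
```

Under these assumptions, almost all of the area comes from the segment between the fixed first point and the first real threshold, which is why a classifier that scores nearly every sample as 1.0 still ends up with a sizeable auPRC.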
--
This message was sent by Atlassian JIRA (v6.4.14#64029)