[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zak Patterson updated SPARK-18844:
----------------------------------
    Description: 
BinaryClassificationMetrics only implements precision (positive predictive 
value) and recall (true positive rate) by threshold. It should implement a 
more comprehensive set of metrics.

Moreover, the instance variables that store the computed counts are private, 
and there are no accessors for them. Anyone wanting to add this functionality 
would therefore have to duplicate the calculation, which is not trivial:

https://github.com/apache/spark/blob/v2.0.2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala#L144
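For illustration, the per-threshold counting held in those private fields can be sketched with plain Scala collections in place of an RDD. This is a minimal sketch, not Spark's code: `Confusion`, `CumulativeCounts` and `confusionsByThreshold` are hypothetical names. It assumes labels are 0.0/1.0 and that a score at or above a threshold counts as a positive prediction.

```scala
// Hypothetical stand-in for a per-threshold confusion matrix.
final case class Confusion(tp: Long, fp: Long, tn: Long, fn: Long)

object CumulativeCounts {
  /** scoreAndLabels: (score, label) pairs, label 1.0 = positive, 0.0 = negative.
    * Returns (threshold, confusion) pairs in descending threshold order, treating
    * an example as predicted positive when its score >= threshold. */
  def confusionsByThreshold(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Confusion)] = {
    // Count positives and negatives that share each distinct score.
    val perScore: Seq[(Double, (Long, Long))] =
      scoreAndLabels
        .groupBy(_._1)
        .map { case (score, pairs) =>
          val pos = pairs.count(_._2 > 0.5).toLong
          (score, (pos, pairs.size.toLong - pos))
        }
        .toSeq
        .sortBy(-_._1)
    val totalPos = perScore.map(_._2._1).sum
    val totalNeg = perScore.map(_._2._2).sum
    // Running sums: everything scored at or above the current score is predicted positive.
    val cumulative = perScore.scanLeft((Double.NaN, 0L, 0L)) {
      case ((_, cp, cn), (score, (p, n))) => (score, cp + p, cn + n)
    }.tail
    cumulative.map { case (t, tp, fp) =>
      (t, Confusion(tp, fp, totalNeg - fp, totalPos - tp))
    }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((0.9, 1.0), (0.8, 0.0), (0.8, 1.0), (0.3, 0.0), (0.1, 1.0))
    confusionsByThreshold(data).foreach(println)
  }
}
```

Each distinct score becomes a threshold. Sorting in descending order and taking running sums yields TP and FP, and subtracting those from the class totals yields FN and TN.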

Currently Implemented Metrics
---
* Precision (PPV): `precisionByThreshold`
* Recall (Sensitivity, true positive rate): `recallByThreshold`

Desired additional metrics
---
* False omission rate: `forByThreshold`
* False discovery rate: `fdrByThreshold`
* Negative predictive value: `npvByThreshold`
* False negative rate: `fnrByThreshold`
* True negative rate (Specificity): `specificityByThreshold`
* False positive rate: `fprByThreshold`
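Every metric listed above is a ratio of counts from the same confusion matrix, and each "false" rate is the complement of another one: FOR = 1 - NPV, FDR = 1 - PPV, FNR = 1 - TPR, FPR = 1 - TNR. The sketch below spells out those ratios. `ExtraMetrics` and its method names are hypothetical (`for` is a Scala keyword, hence `falseOmissionRate`). Returning NaN for a zero denominator is only this sketch's convention and may differ from what Spark's existing computers do.

```scala
// Hypothetical stand-in for one confusion matrix.
final case class Confusion(tp: Long, fp: Long, tn: Long, fn: Long)

object ExtraMetrics {
  // NaN when the denominator is zero (a convention chosen for this sketch only).
  private def ratio(num: Long, den: Long): Double =
    if (den == 0L) Double.NaN else num.toDouble / den

  def precision(c: Confusion): Double         = ratio(c.tp, c.tp + c.fp) // PPV
  def recall(c: Confusion): Double            = ratio(c.tp, c.tp + c.fn) // TPR, sensitivity
  def npv(c: Confusion): Double               = ratio(c.tn, c.tn + c.fn) // negative predictive value
  def falseOmissionRate(c: Confusion): Double = ratio(c.fn, c.fn + c.tn) // 1 - NPV
  def fdr(c: Confusion): Double               = ratio(c.fp, c.fp + c.tp) // 1 - PPV
  def fnr(c: Confusion): Double               = ratio(c.fn, c.fn + c.tp) // 1 - TPR
  def specificity(c: Confusion): Double       = ratio(c.tn, c.tn + c.fp) // TNR
  def fpr(c: Confusion): Double               = ratio(c.fp, c.fp + c.tn) // 1 - TNR

  def main(args: Array[String]): Unit = {
    val c = Confusion(tp = 2, fp = 1, tn = 1, fn = 1)
    println(f"for=${falseOmissionRate(c)}%.3f fdr=${fdr(c)}%.3f npv=${npv(c)}%.3f")
    println(f"fnr=${fnr(c)}%.3f tnr=${specificity(c)}%.3f fpr=${fpr(c)}%.3f")
  }
}
```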



Alternatives
---

The `createCurve` method is private. If it were made public, along with the 
`BinaryClassificationMetricComputer` trait, users could easily define new 
computers for whatever metrics they need.
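A rough sketch of what this alternative would enable follows. Because the real types are private, `Confusion`, `MetricComputer` and `CurveSketch.createCurve` are hypothetical stand-ins, modeled loosely on a computer that maps one confusion matrix to a `Double`. The curve is built from a local `Seq` rather than an RDD.

```scala
// Hypothetical stand-ins for Spark's private confusion-matrix and computer types.
final case class Confusion(tp: Long, fp: Long, tn: Long, fn: Long)

trait MetricComputer extends Serializable {
  def apply(c: Confusion): Double
}

// A user-defined computer: the kind of extension a public trait would allow.
object FalsePositiveRate extends MetricComputer {
  override def apply(c: Confusion): Double = {
    val negatives = c.fp + c.tn
    // NaN when there are no negatives (a convention chosen for this sketch only).
    if (negatives == 0L) Double.NaN else c.fp.toDouble / negatives
  }
}

object CurveSketch {
  // Hypothetical analogue of a public createCurve(y): map every
  // (threshold, confusion) pair to (threshold, y(confusion)).
  def createCurve(confusions: Seq[(Double, Confusion)], y: MetricComputer): Seq[(Double, Double)] =
    confusions.map { case (t, c) => (t, y(c)) }

  def main(args: Array[String]): Unit = {
    val confusions = Seq(
      (0.9, Confusion(1, 0, 2, 2)),
      (0.8, Confusion(2, 1, 1, 1)),
      (0.1, Confusion(3, 2, 0, 0)))
    createCurve(confusions, FalsePositiveRate).foreach(println)
  }
}
```

With this design, each new metric is a small object, and the shared per-threshold counts are computed once and reused by every computer.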

  was:
BinaryClassificationMetrics only implements Precision (positive predictive 
value) and recall (true positive rate). It should implement more comprehensive 
metrics.

Moreover, the instance variables storing computed counts are marked private, 
and there are no accessors for them. So if one desired to add this 
functionality, one would have to duplicate this calculation, which is not 
trivial:

https://github.com/apache/spark/blob/v2.0.2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala#L144

Currently Implemented Metrics
---
* Precision (PPV): `precisionByThreshold`
* Recall (Sensitivity, true positive rate): `recallByThreshold`

Desired additional metrics
---
* False omission rate: `fprByThreshold`
* False discovery rate: `fdrByThreshold`
* Negative predictive value: `npvByThreshold`
* False negative rate: `fnrByThreshold`
* True negative rate (Specificity): `specificityByThreshold`
* False positive rate: `fprByThreshold`



Alternatives
---

The `createCurve` method is marked private. If it were marked public, and the 
trait BinaryClassificationMetricComputer were also marked public, then it would 
be easy to define new computers to get whatever the user wanted.


> Add more binary classification metrics to BinaryClassificationMetrics
> ---------------------------------------------------------------------
>
>                 Key: SPARK-18844
>                 URL: https://issues.apache.org/jira/browse/SPARK-18844
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.0.2
>            Reporter: Zak Patterson
>            Priority: Minor
>              Labels: evaluation
>             Fix For: 2.0.2
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
