[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2018-03-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382719#comment-16382719
 ] 

Apache Spark commented on SPARK-18844:
--

User 'sandecho' has created a pull request for this issue:
https://github.com/apache/spark/pull/20709

> Add more binary classification metrics to BinaryClassificationMetrics
> ---------------------------------------------------------------------
>
>                 Key: SPARK-18844
>                 URL: https://issues.apache.org/jira/browse/SPARK-18844
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.0.2
>            Reporter: Zak Patterson
>            Priority: Minor
>              Labels: evaluation
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> BinaryClassificationMetrics only implements precision (positive predictive
> value) and recall (true positive rate). It should implement more
> comprehensive metrics.
> Moreover, the instance variables storing the computed counts are marked
> private, and there are no accessors for them. Anyone wanting to add this
> functionality would therefore have to duplicate the calculation, which is
> not trivial:
> https://github.com/apache/spark/blob/v2.0.2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala#L144
> Currently Implemented Metrics
> ---
> * Precision (PPV): `precisionByThreshold`
> * Recall (Sensitivity, true positive rate): `recallByThreshold`
> Desired additional metrics
> ---
> * False omission rate: `forByThreshold`
> * False discovery rate: `fdrByThreshold`
> * Negative predictive value: `npvByThreshold`
> * False negative rate: `fnrByThreshold`
> * True negative rate (Specificity): `specificityByThreshold`
> * False positive rate: `fprByThreshold`
> Alternatives
> ---
> The `createCurve` method is marked private. If it were marked public, and the 
> trait BinaryClassificationMetricComputer were also marked public, then it 
> would be easy to define new computers to get whatever the user wanted.
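For readers unfamiliar with the rates listed in the issue, the following is a minimal sketch in plain Python of how each one follows from the four confusion-matrix counts at a single threshold. It is not Spark code and does not reflect how the linked pull requests implement anything. The `Confusion` class, the `confusion_at` and `metrics` helpers, the rule that a score at or above the threshold counts as a positive prediction, and the choice to return 0.0 for an empty denominator are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts at one threshold (hypothetical helper, not a Spark class)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def confusion_at(scores_and_labels: Iterable[Tuple[float, int]],
                 threshold: float) -> Confusion:
    """Count outcomes, assuming score >= threshold means "predicted positive"."""
    tp = fp = tn = fn = 0
    for score, label in scores_and_labels:
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


def _ratio(numerator: int, denominator: int) -> float:
    # Returning 0.0 for an empty denominator is an assumed convention.
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    """Compute the two existing and six requested rates from one set of counts."""
    return {
        "precision": _ratio(c.tp, c.tp + c.fp),    # PPV
        "recall": _ratio(c.tp, c.tp + c.fn),       # sensitivity, TPR
        "for": _ratio(c.fn, c.fn + c.tn),          # false omission rate
        "fdr": _ratio(c.fp, c.fp + c.tp),          # false discovery rate
        "npv": _ratio(c.tn, c.tn + c.fn),          # negative predictive value
        "fnr": _ratio(c.fn, c.fn + c.tp),          # false negative rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "fpr": _ratio(c.fp, c.fp + c.tn),          # false positive rate
    }


if __name__ == "__main__":
    data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.4, 1), (0.2, 0), (0.1, 0)]
    for name, value in metrics(confusion_at(data, 0.5)).items():
        print(f"{name}: {value:.3f}")
```

All eight rates come from the same four counts, which is why the issue asks either for more built-in metrics or for access to the counts themselves.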



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2018-02-14 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364175#comment-16364175
 ] 

Apache Spark commented on SPARK-18844:
--

User 'sandecho' has created a pull request for this issue:
https://github.com/apache/spark/pull/20609




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2018-02-08 Thread Sandeep Kumar Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357281#comment-16357281
 ] 

Sandeep Kumar Choudhary commented on SPARK-18844:
-

I have submitted the patch. It is now okay to test.




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2018-02-08 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357256#comment-16357256
 ] 

Apache Spark commented on SPARK-18844:
--

User 'sandecho' has created a pull request for this issue:
https://github.com/apache/spark/pull/20549




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2018-01-03 Thread Sandeep Kumar Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310776#comment-16310776
 ] 

Sandeep Kumar Choudhary commented on SPARK-18844:
-

I am working on this JIRA.




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2017-12-24 Thread Sandeep Kumar Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303067#comment-16303067
 ] 

Sandeep Kumar Choudhary commented on SPARK-18844:
-

How can I get this task assigned to me? Could you please help?




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2017-12-24 Thread Sandeep Kumar Choudhary (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16303066#comment-16303066
 ] 

Sandeep Kumar Choudhary commented on SPARK-18844:
-

I want to work on this JIRA. I have figured out how to do it.




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2016-12-15 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15752864#comment-15752864
 ] 

Joseph K. Bradley commented on SPARK-18844:
---

Note: Please don't set the Target Version or Fix Version.  Committers use those 
to track releases.  Thanks!




[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2016-12-13 Thread Zak Patterson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746293#comment-15746293
 ] 

Zak Patterson commented on SPARK-18844:
---

I'm not very familiar with the Python API, but it seems to me that the two 
methods available in Scala (precision and recall) are not available in Python? 
https://github.com/apache/spark/blob/v2.1.0-rc2/python/pyspark/mllib/evaluation.py#L29

> Add more binary classification metrics to BinaryClassificationMetrics
> ---------------------------------------------------------------------
>
>                 Key: SPARK-18844
>                 URL: https://issues.apache.org/jira/browse/SPARK-18844
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.0.2
>            Reporter: Zak Patterson
>            Priority: Minor
>              Labels: evaluation
>             Fix For: 2.0.2
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h



[jira] [Commented] (SPARK-18844) Add more binary classification metrics to BinaryClassificationMetrics

2016-12-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746212#comment-15746212
 ] 

Sean Owen commented on SPARK-18844:
---

Yeah, I think we discussed something like this before, and the drawback was 
just filling up the API with variations that are mostly not used. False 
positive rate and specificity might see some use. This would have to go in 
MulticlassMetrics too, and in the Python API of both, for completeness.

Still, that doesn't mean it's not doable. I tend to agree that, if anything, 
it makes sense to expose the 'computer' API, but then it's not clear how to 
translate that to multiclass and Python.
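To make the 'computer' idea concrete, here is a rough sketch in plain Python of what a pluggable metric computer could look like. It is an analogy only, not Spark's API: `Confusion`, `MetricComputer`, `create_curve`, and the two example computers are hypothetical names, and the per-threshold counts are assumed to be already available as a list of (threshold, counts) pairs.

```python
from typing import Callable, List, NamedTuple, Tuple


class Confusion(NamedTuple):
    """Confusion-matrix counts at one threshold (hypothetical helper)."""
    tp: int
    fp: int
    tn: int
    fn: int


# A "computer" is any function from confusion counts to a single number.
MetricComputer = Callable[[Confusion], float]


def specificity(c: Confusion) -> float:
    """True negative rate; 0.0 when there are no negatives (assumed convention)."""
    negatives = c.tn + c.fp
    return c.tn / negatives if negatives else 0.0


def false_positive_rate(c: Confusion) -> float:
    """False positive rate; 0.0 when there are no negatives (assumed convention)."""
    negatives = c.tn + c.fp
    return c.fp / negatives if negatives else 0.0


def create_curve(confusions: List[Tuple[float, Confusion]],
                 computer: MetricComputer) -> List[Tuple[float, float]]:
    """Apply one computer to every (threshold, counts) pair."""
    return [(threshold, computer(c)) for threshold, c in confusions]


if __name__ == "__main__":
    per_threshold = [
        (0.8, Confusion(tp=1, fp=0, tn=3, fn=2)),
        (0.5, Confusion(tp=2, fp=1, tn=2, fn=1)),
    ]
    print(create_curve(per_threshold, specificity))
    print(create_curve(per_threshold, false_positive_rate))
```

With this shape, a new metric is just another function passed to `create_curve`, so the API surface stays small. How such an extension point would carry over to multiclass metrics remains the open question raised above.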
