[jira] [Commented] (SPARK-24431) wrong areaUnderPR calculation in BinaryClassificationEvaluator

Xinyong Tian (JIRA) Wed, 06 Jun 2018 20:48:08 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504202#comment-16504202
 ]


Xinyong Tian commented on SPARK-24431:
--------------------------------------

I also feel it is reasonable to set the first point to (0, p). In fact, as
long as it is not (0, 1), aucPR will be small enough for a model that predicts
the same p for all examples, so cross-validation will not select such a model.
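
To make the comparison concrete, here is a minimal sketch in plain Python. It
is not Spark's implementation: it assumes the area is computed by trapezoidal
integration over the two-point curve of a constant-prediction model, and the
function names and the 0.02 event rate are made up for illustration.

```python
def trapezoid_area(points):
    """Area under a piecewise-linear curve.

    `points` is a list of (recall, precision) pairs sorted by recall.
    """
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


def constant_model_au_pr(event_rate, first_precision):
    """Area under PR for a model that predicts the same score for every row.

    Such a model yields a single threshold, so the curve is only the
    anchor point at recall = 0 plus the point (1, event_rate).
    """
    return trapezoid_area([(0.0, first_precision), (1.0, event_rate)])


p = 0.02  # hypothetical event rate (baseline frequency)

anchored_at_one = constant_model_au_pr(p, 1.0)   # first point (0, 1)
anchored_at_p = constant_model_au_pr(p, p)       # first point (0, p)
anchored_at_zero = constant_model_au_pr(p, 0.0)  # first point (0, 0)

print(anchored_at_one, anchored_at_p, anchored_at_zero)
```

Under these assumptions the (0, 1) anchor gives (1 + p) / 2, which is about
0.5 for a rare event, while (0, p) gives p and (0, 0) gives p / 2. Either of
the last two keeps the constant model from looking better than a real one.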

> wrong areaUnderPR calculation in BinaryClassificationEvaluator 
> ---------------------------------------------------------------
>
>                 Key: SPARK-24431
>                 URL: https://issues.apache.org/jira/browse/SPARK-24431
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Xinyong Tian
>            Priority: Major
>
> My problem: I am using CrossValidator(estimator=LogisticRegression(...), ..., 
>  evaluator=BinaryClassificationEvaluator(metricName='areaUnderPR')) to 
> select the best model. When the regParam in logistic regression is very high, 
> no variable is selected (no model), i.e. every row's prediction is the same, 
> e.g. equal to the event rate (baseline frequency). But at this point 
> BinaryClassificationEvaluator reports the highest areaUnderPR. As a result, 
> the best model selected is a no-model. 
> The reason is as follows. With no model, the precision-recall curve has 
> only two points: at recall = 0, precision should be set to zero, while the 
> software sets it to 1; at recall = 1, precision is the event rate. As a 
> result, the areaUnderPR will be close to 0.5 (my event rate is very low), 
> which is the maximum.
> The solution is to set precision = 0 when recall = 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

