[ https://issues.apache.org/jira/browse/SPARK-24431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502794#comment-16502794 ]

Xinyong Tian commented on SPARK-24431:
--------------------------------------

I read more about the first point of the PR curve:
https://classeval.wordpress.com/introduction/introduction-to-the-precision-recall-plot/

In the above example, when the predicted probability for every row is set to 0.01, only one point on the PR curve is defined, i.e. recall=1, precision=0.01. According to the website, the first point of the PR curve should be found by extending a horizontal line from the 2nd point (here the only point, (1, 0.01)), which gives (0, 0.01). In this way, the no-model areaUnderPR = 0.01, instead of about 0.5.

> wrong areaUnderPR calculation in BinaryClassificationEvaluator
> ---------------------------------------------------------------
>
>                 Key: SPARK-24431
>                 URL: https://issues.apache.org/jira/browse/SPARK-24431
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Xinyong Tian
>            Priority: Major
>
> My problem: I am using CrossValidator(estimator=LogisticRegression(...), ...,
> evaluator=BinaryClassificationEvaluator(metricName='areaUnderPR')) to
> select the best model. When the regParam in logistic regression is very high, no
> variable is selected (no model), i.e. every row's prediction is the same, e.g.
> equal to the event rate (baseline frequency). But at this point,
> BinaryClassificationEvaluator reports the highest areaUnderPR. As a result, the
> best model selected is a no model.
> The reason is the following. With no model, the precision-recall curve has
> only two points: at recall=0, precision should be set to zero, while the
> software sets it to 1; at recall=1, precision is the event rate. As a result,
> the areaUnderPR will be close to 0.5 (my event rate is very low), which is
> the maximum.
> The solution is to set precision=0 when recall=0.
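
To make the arithmetic above concrete, here is a minimal sketch in plain Python (not Spark code). It assumes the area is computed with the trapezoidal rule over (recall, precision) points; `area_under_pr` is a hypothetical helper written only for this illustration, and the event rate of 0.01 is the value used in the comment. It compares three choices for the point at recall=0: precision=1 (the behavior the issue describes for 2.2.0), a horizontal extension from the first real point, and precision=0 (the fix proposed in the issue description).

```python
def area_under_pr(points):
    """Trapezoidal area under a curve given as (recall, precision) pairs.

    Hypothetical helper for illustration; not part of Spark's API.
    """
    pts = sorted(points)  # order by recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


event_rate = 0.01               # assumed baseline frequency, as in the comment
only_point = (1.0, event_rate)  # the single defined point: recall=1, precision=0.01

# First point set to (0, 1), as the issue describes: area is about 0.505
auc_first_point_one = area_under_pr([(0.0, 1.0), only_point])

# First point extended horizontally from the only point: area is about 0.01
auc_horizontal = area_under_pr([(0.0, event_rate), only_point])

# First point set to (0, 0), as the issue description proposes: area is about 0.005
auc_first_point_zero = area_under_pr([(0.0, 0.0), only_point])
```

Under these assumptions, only the (0, 1) convention lets a constant predictor score near 0.5; either alternative keeps its area at or below the event rate.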
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)