[ https://issues.apache.org/jira/browse/SPARK-30210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17257898#comment-17257898 ]
Nicholas Brett Marcott commented on SPARK-30210:
------------------------------------------------

[~hyukjin.kwon], I have two questions on this one:
# Do you see anything wrong with how I tried to reproduce?
# How long should we wait for the reporter before closing the ticket?

> Give more informative error for BinaryClassificationEvaluator when data with
> only one label is provided
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30210
>                 URL: https://issues.apache.org/jira/browse/SPARK-30210
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.1.0
>         Environment: Pyspark on Databricks
>            Reporter: Paul Anzel
>            Priority: Minor
>
> Hi all,
> When I was trying to do some machine learning work with pyspark, I ran into a
> confusing error message:
> {{# Model and train/test set generated...}}
> {{evaluator = BinaryClassificationEvaluator(labelCol=label,
> metricName='areaUnderROC')}}
> {{prediction = model.transform(test_data)}}
> {{auc = evaluator.evaluate(prediction)}}
> {{org.apache.spark.SparkException: Job aborted due to stage failure: Task 37
> in stage 21.0 failed 4 times, most recent failure: Lost task 37.3 in stage
> 21.0 (TID 2811, 10.139.65.48, executor 16):
> java.lang.ArrayIndexOutOfBoundsException}}
> After some investigation, I found that the data I was trying to predict on
> had only one label represented, rather than both positive and negative
> labels. That is easy enough to fix, but I would like to ask whether we could
> replace this error with one that explicitly points out the issue.
> Would it be acceptable to add an up-front check that ensures all labels are
> represented? Alternatively, could we change the docs for
> BinaryClassificationEvaluator to explain what this error means?
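A rough sketch of the up-front label check proposed in the ticket might look like the following. The ROC curve plots TPR = TP/P against FPR = FP/N, so it is undefined when the positive count P or the negative count N is zero, and a check can say so directly instead of failing deep inside a task. The helper name `check_binary_labels` and its messages are hypothetical (this is not a pyspark.ml API), and it assumes labels are encoded as 0.0 and 1.0.

```python
from typing import Iterable


def check_binary_labels(labels: Iterable[float], label_col: str = "label") -> None:
    """Hypothetical helper: raise a descriptive ValueError unless both classes appear.

    `labels` is any iterable of label values, e.g. the distinct values of the
    label column collected from a DataFrame. Assumes 0.0/1.0 label encoding.
    """
    present = {float(v) for v in labels}
    if not present:
        raise ValueError(f"Column '{label_col}' is empty; cannot compute areaUnderROC.")

    # Reject anything that is not a binary label.
    unexpected = present - {0.0, 1.0}
    if unexpected:
        raise ValueError(
            f"Column '{label_col}' contains non-binary labels: {sorted(unexpected)}"
        )

    # ROC needs at least one positive and one negative example.
    missing = {0.0, 1.0} - present
    if missing:
        raise ValueError(
            f"Column '{label_col}' contains only label(s) {sorted(present)}; "
            f"areaUnderROC needs both positive (1.0) and negative (0.0) examples, "
            f"but {sorted(missing)} is missing."
        )
```

With a PySpark DataFrame, the distinct labels could be gathered with `[row[0] for row in prediction.select(label).distinct().collect()]` and passed to `check_binary_labels(labels, label)` before calling `evaluator.evaluate(prediction)`. Collecting only the distinct values keeps the driver-side data small.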
--
This message was sent by Atlassian Jira (v8.3.4#803005)