Can you please suggest how I can use BinaryClassificationEvaluator? I tried:

scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

scala>  val evaluator = new BinaryClassificationEvaluator()
evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator = binEval_0d57372b7579

Try 1:

scala> evaluator.evaluate(testScoreAndLabel.rdd)
<console>:105: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Double, Double)]
 required: org.apache.spark.sql.Dataset[_]
       evaluator.evaluate(testScoreAndLabel.rdd)

Try 2:

scala> evaluator.evaluate(testScoreAndLabel)
java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)

Try 3:

scala> evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given input columns: [_1, _2];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)

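For reference, here is one way this should work (a sketch, assuming Spark 2.0.x, and that `testResults` and `testScoreAndLabel` are as shown below). The evaluator looks up columns by name, so you can either point it at the original prediction DataFrame's column names, or rename the extracted pair columns to the defaults it expects:

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Option 1: evaluate the original predictions DataFrame and tell the
// evaluator which columns to read (defaults are "rawPrediction"/"label").
// rawPredictionCol accepts either a double or a vector column, so the
// probability vector works here.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("Label")
  .setRawPredictionCol("ModelProbability")
val auROC = evaluator.evaluate(testResults)

// Option 2: rename the (_1, _2) columns of the extracted Dataset to the
// evaluator's default column names before evaluating.
val scored = testScoreAndLabel.toDF("rawPrediction", "label")
val auROC2 = new BinaryClassificationEvaluator().evaluate(scored)
```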

On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
> doubles from the test score and label DF.
>
> But you may prefer to just use spark.ml evaluators, which work with
> DataFrames. Try BinaryClassificationEvaluator.
>
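The map Nick describes can be sketched as follows (assuming the column names from the thread, and that `ModelProbability` is an `org.apache.spark.ml.linalg.Vector`):

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

// DataFrame.rdd yields RDD[Row]; extract the (score, label) doubles
// explicitly before handing them to the RDD-based metrics class.
val scoreAndLabel = testResults
  .select("ModelProbability", "Label")
  .rdd
  .map { case Row(p: Vector, l: Double) => (p(1), l) }

val metrics = new BinaryClassificationMetrics(scoreAndLabel)
val auROC = metrics.areaUnderROC()
```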
> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma <bhaara...@gmail.com> wrote:
>
>> I am getting scala.MatchError in the code below. I'm not able to see why
>> this would be happening. I am using Spark 2.0.1
>>
>> scala> testResults.columns
>> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective, 
>> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score, 
>> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label, 
>> rawPrediction, ModelProbability, ModelPrediction)
>>
>> scala> testResults.select("Label","ModelProbability").take(1)
>> res542: Array[org.apache.spark.sql.Row] = 
>> Array([0.0,[0.737304818744076,0.262695181255924]])
>>
>> scala> val testScoreAndLabel = testResults.
>>      | select("Label","ModelProbability").
>>      | map { case Row(l:Double, p:Vector) => (p(1), l) }
>> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: 
>> double, _2: double]
>>
>> scala> testScoreAndLabel
>> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: 
>> double]
>>
>> scala> testScoreAndLabel.columns
>> res540: Array[String] = Array(_1, _2)
>>
>> scala> val testMetrics = new 
>> BinaryClassificationMetrics(testScoreAndLabel.rdd)
>> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics = 
>> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>>
>> The code below gives the error
>>
>> val auROC = testMetrics.areaUnderROC() //this line gives the error
>>
>> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]] 
>> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
>>
>>
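The scala.MatchError above is most likely an import mismatch: in Spark 2.0 the probability column of a spark.ml model holds an org.apache.spark.ml.linalg.Vector, so a pattern written against the old org.apache.spark.mllib.linalg.Vector never matches the Row's contents. A sketch of the fix:

```scala
// import org.apache.spark.mllib.linalg.Vector  // old mllib type -> MatchError
import org.apache.spark.ml.linalg.Vector        // type actually stored in the DataFrame
import org.apache.spark.sql.Row

val testScoreAndLabel = testResults
  .select("Label", "ModelProbability")
  .rdd
  .map { case Row(l: Double, p: Vector) => (p(1), l) }
```

Note the error surfaces lazily at `areaUnderROC()` rather than when the map is defined, because the map only runs once an action forces evaluation.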