Can you please suggest how I can use BinaryClassificationEvaluator? I tried:
scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

scala> val evaluator = new BinaryClassificationEvaluator()
evaluator: org.apache.spark.ml.evaluation.BinaryClassificationEvaluator = binEval_0d57372b7579

Try 1:

scala> evaluator.evaluate(testScoreAndLabel.rdd)
<console>:105: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Double, Double)]
 required: org.apache.spark.sql.Dataset[_]
       evaluator.evaluate(testScoreAndLabel.rdd)

Try 2:

scala> evaluator.evaluate(testScoreAndLabel)
java.lang.IllegalArgumentException: Field "rawPrediction" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:228)

Try 3:

scala> evaluator.evaluate(testScoreAndLabel.select("Label","ModelProbability"))
org.apache.spark.sql.AnalysisException: cannot resolve '`Label`' given input columns: [_1, _2];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)

On Mon, Nov 14, 2016 at 1:44 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> DataFrame.rdd returns an RDD[Row]. You'll need to use map to extract the
> doubles from the test score and label DF.
>
> But you may prefer to just use spark.ml evaluators, which work with
> DataFrames. Try BinaryClassificationEvaluator.
>
> On Mon, 14 Nov 2016 at 19:30, Bhaarat Sharma <bhaara...@gmail.com> wrote:
>
>> I am getting scala.MatchError in the code below. I'm not able to see why
>> this would be happening.
>> I am using Spark 2.0.1
>>
>> scala> testResults.columns
>> res538: Array[String] = Array(TopicVector, subject_id, hadm_id, isElective,
>> isNewborn, isUrgent, isEmergency, isMale, isFemale, oasis_score,
>> sapsii_score, sofa_score, age, hosp_death, test, ModelFeatures, Label,
>> rawPrediction, ModelProbability, ModelPrediction)
>>
>> scala> testResults.select("Label","ModelProbability").take(1)
>> res542: Array[org.apache.spark.sql.Row] =
>> Array([0.0,[0.737304818744076,0.262695181255924]])
>>
>> scala> val testScoreAndLabel = testResults.
>>      |   select("Label","ModelProbability").
>>      |   map { case Row(l: Double, p: Vector) => (p(1), l) }
>> testScoreAndLabel: org.apache.spark.sql.Dataset[(Double, Double)] =
>> [_1: double, _2: double]
>>
>> scala> testScoreAndLabel
>> res539: org.apache.spark.sql.Dataset[(Double, Double)] = [_1: double, _2: double]
>>
>> scala> testScoreAndLabel.columns
>> res540: Array[String] = Array(_1, _2)
>>
>> scala> val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel.rdd)
>> testMetrics: org.apache.spark.mllib.evaluation.BinaryClassificationMetrics =
>> org.apache.spark.mllib.evaluation.BinaryClassificationMetrics@36e780d1
>>
>> The code below gives the error:
>>
>> val auROC = testMetrics.areaUnderROC() //this line gives the error
>>
>> Caused by: scala.MatchError: [0.0,[0.7316583497453766,0.2683416502546234]]
>> (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
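For Try 2 and Try 3 above: BinaryClassificationEvaluator looks for columns named "rawPrediction" and "label" by default, which is why it fails on a DataFrame whose columns are named "Label" and "ModelProbability" (or "_1"/"_2" after the map). Rather than renaming columns, the evaluator can be pointed at the existing ones. A minimal sketch, assuming the testResults schema quoted in the thread (untested here, since it needs the original data):

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Point the evaluator at the columns that actually exist in testResults,
// instead of relying on the defaults ("rawPrediction" / "label").
// rawPredictionCol may be either a raw-prediction vector or a probability
// vector; both give the same ranking, so AUC is unaffected.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("Label")
  .setRawPredictionCol("ModelProbability")
  .setMetricName("areaUnderROC")

// Evaluate directly on the DataFrame -- no .rdd, .select, or map needed.
val auROC = evaluator.evaluate(testResults)
```

This sidesteps the type-mismatch error from Try 1 as well, since evaluate takes a Dataset, not an RDD.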
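As for the original scala.MatchError: the "GenericRowWithSchema" in the message suggests the pattern `Row(l: Double, p: Vector)` is not matching the row, and a common cause in Spark 2.x is importing org.apache.spark.mllib.linalg.Vector when spark.ml pipelines actually produce org.apache.spark.ml.linalg.Vector. A sketch of the fix, going through RDD[Row] so no Dataset encoder is needed (again assuming the testResults schema from the thread):

```scala
import org.apache.spark.sql.Row
// In Spark 2.x, spark.ml pipeline output columns hold ml.linalg.Vector,
// not the older mllib.linalg.Vector. Importing the wrong one makes the
// pattern match below fail with scala.MatchError at execution time.
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val testScoreAndLabel = testResults
  .select("Label", "ModelProbability")
  .rdd                                             // RDD[Row]
  .map { case Row(l: Double, p: Vector) => (p(1), l) }  // (score, label)

// RDD[(Double, Double)] is what the mllib metrics class expects.
val testMetrics = new BinaryClassificationMetrics(testScoreAndLabel)
val auROC = testMetrics.areaUnderROC()
```

Note the error only surfaces at areaUnderROC() because the map is lazy; the bad match is in the map itself.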