Re: Get both feature importance and ROC curve from a random forest classifier
Well, that sounds trivial now! Thanks ;-)

(In reply to Yanbo Liang's message of 2016-07-02 10:04 GMT+02:00.)
Re: Get both feature importance and ROC curve from a random forest classifier
Hi Mathieu,

Using the new ml package to train a RandomForestClassificationModel, you can get the feature importance. You can then convert the prediction result to an RDD and feed it into BinaryClassificationMetrics to get the ROC curve. You can refer to the following code snippet:

    import org.apache.spark.ml.classification.RandomForestClassifier
    // Spark 2.x; on 1.6 use org.apache.spark.mllib.linalg.Vector instead
    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.sql.Row

    val rf = new RandomForestClassifier()
    val model = rf.fit(trainingData)

    val predictions = model.transform(testData)

    // Build (score, label) pairs; for a vector-valued raw prediction,
    // index 1 is the score of the positive class.
    val scoreAndLabels =
      predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
        case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
        case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
      }
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    metrics.roc()

Thanks
Yanbo

(In reply to matd's message of 2016-06-15 7:13 GMT-07:00.)
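For completeness, here is a minimal sketch that returns both pieces asked about in this thread from one trained model. It assumes Spark 2.x (for the org.apache.spark.ml.linalg.Vector import), a DataFrame with a Double "label" column and a vector "features" column, and a session such as spark-shell where a top-level def is allowed. The helper name rocAndImportances and its numBins parameter are made up for illustration; it uses the probability column as the score rather than the raw prediction, which is a choice of this sketch and not something the snippet above prescribes.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper (not part of Spark): trains a forest and returns
// the ROC points together with the per-feature importances.
def rocAndImportances(
    trainingData: DataFrame,
    testData: DataFrame,
    numBins: Int = 0 // 0 means no down-sampling of the curve
): (Array[(Double, Double)], Array[Double]) = {
  val rf = new RandomForestClassifier()
    .setLabelCol("label")       // assumed column name
    .setFeaturesCol("features") // assumed column name
  val model = rf.fit(trainingData)

  // Score each test row with the probability of the positive class
  // (index 1 of the probability vector).
  val scoreAndLabels = model.transform(testData)
    .select(model.getProbabilityCol, model.getLabelCol)
    .rdd
    .map { case Row(probability: Vector, label: Double) => (probability(1), label) }

  val metrics = new BinaryClassificationMetrics(scoreAndLabels, numBins)

  // roc() yields (false positive rate, true positive rate) pairs;
  // featureImportances has one entry per input feature.
  (metrics.roc().collect(), model.featureImportances.toArray)
}
```

Calling val (roc, importances) = rocAndImportances(trainingData, testData) gives the curve as an array that can be plotted or exported. On a large test set, a positive numBins (for example 100) keeps the collected curve small.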
Get both feature importance and ROC curve from a random forest classifier
Hi ml folks!

I'm using a Random Forest for binary classification. I'm interested in getting both the ROC *curve* and the feature importance from the trained model.

If I'm not missing something obvious, the ROC curve is only available in the old mllib world, via BinaryClassificationMetrics. In the new ml package, only areaUnderROC and areaUnderPR are available, through BinaryClassificationEvaluator.

The feature importance is only available in the ml package, through RandomForestClassificationModel.

Any idea how to get both?

Mathieu

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Get-both-feature-importance-and-ROC-curve-from-a-random-forest-classifier-tp27175.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.