Re: Get both feature importance and ROC curve from a random forest classifier

2016-07-06 Thread Mathieu D
Well, that sounds trivial now!
Thanks ;-)

In reply to Yanbo Liang (2016-07-02 10:04 GMT+02:00).


Re: Get both feature importance and ROC curve from a random forest classifier

2016-07-02 Thread Yanbo Liang
Hi Mathieu,

If you train a RandomForestClassificationModel with the new ml package, you
can get the feature importances from its featureImportances member. Then you
can convert the prediction result to an RDD and feed it into
BinaryClassificationMetrics (from the old mllib package) to get the ROC curve.
You can refer to the following code snippet:

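import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.linalg.Vector // Spark 2.x; on 1.x use org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

// trainingData / testData: DataFrames with a Double "label" column and a "features" vector column
val rf = new RandomForestClassifier()
val model = rf.fit(trainingData)

// Feature importances come straight from the ml model (one weight per feature)
val importances = model.featureImportances

val predictions = model.transform(testData)

// Use the raw prediction for the positive class (index 1) as the score
val scoreAndLabels =
  predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
    case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
  }

// The mllib metrics class gives the full curve as (FPR, TPR) points
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
metrics.roc()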
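If it helps to see what metrics.roc() returns, here is a small plain-Scala sketch that needs no Spark. The object RocSketch and its method are hypothetical, not part of Spark. It builds the same kind of curve from (score, label) pairs: one (false positive rate, true positive rate) point per distinct score threshold, starting at (0, 0) and ending at (1, 1). It assumes labels are 0.0 or 1.0 and that both classes are present, and it only illustrates the idea; Spark's own output can differ in details (for example when numBins is set, or in how the end points are emitted).

```scala
// Hypothetical helper illustrating how an ROC curve is built from (score, label) pairs.
// Assumption: labels are 0.0 (negative) or 1.0 (positive), both classes present.
object RocSketch {
  /** Returns (falsePositiveRate, truePositiveRate) points from (0, 0) to (1, 1). */
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val totalPos = scoreAndLabels.count(_._2 == 1.0)
    val totalNeg = scoreAndLabels.size - totalPos
    require(totalPos > 0 && totalNeg > 0, "need at least one positive and one negative label")

    // One threshold per distinct score, from highest to lowest; everything with
    // score >= threshold is predicted positive. Tied scores move together.
    val groups = scoreAndLabels.groupBy(_._1).toSeq.sortBy(-_._1)

    // Cumulative (truePositives, falsePositives) as the threshold is lowered past
    // each distinct score; the scanLeft seed gives the (0, 0) starting point.
    val counts = groups.scanLeft((0, 0)) { case ((tp, fp), (_, group)) =>
      val pos = group.count(_._2 == 1.0)
      (tp + pos, fp + group.size - pos)
    }

    // Convert counts to rates; the last point always uses every example, so it is (1, 1).
    counts.map { case (tp, fp) => (fp.toDouble / totalNeg, tp.toDouble / totalPos) }
  }
}
```

With Spark, the corresponding points would come from metrics.roc().collect() in the snippet above.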


Thanks
Yanbo

In reply to matd (2016-06-15 7:13 GMT-07:00).


Get both feature importance and ROC curve from a random forest classifier

2016-06-15 Thread matd
Hi ML folks!

I'm using a random forest for binary classification.
I'm interested in getting both the ROC *curve* and the feature importance
from the trained model.

Unless I'm missing something obvious, the ROC curve is only available in the
old mllib world, via BinaryClassificationMetrics. In the new ml package,
only the areaUnderROC and areaUnderPR metrics are available, through
BinaryClassificationEvaluator.
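For reference, areaUnderROC is the area under that same (FPR, TPR) curve. Under the hood, the ml BinaryClassificationEvaluator builds an mllib BinaryClassificationMetrics and asks it for the area, which mllib computes with the trapezoidal rule. A minimal plain-Scala sketch of that trapezoidal step, assuming the points are already sorted by FPR (the object AucSketch is hypothetical, not a Spark API):

```scala
// Hypothetical helper: trapezoidal-rule area under a curve given as (x, y) points.
// Assumption: points are sorted by x, as ROC points (FPR, TPR) normally are.
object AucSketch {
  def trapezoidArea(points: Seq[(Double, Double)]): Double =
    points.zip(points.drop(1)).map { case ((x1, y1), (x2, y2)) =>
      // Area of one trapezoid between consecutive points
      (x2 - x1) * (y1 + y2) / 2.0
    }.sum
}
```

A diagonal curve from (0, 0) to (1, 1) gives 0.5, the value of a random classifier.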

Feature importances are only available in the ml package, through
RandomForestClassificationModel.
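On the ml side, RandomForestClassificationModel.featureImportances returns a Vector with one weight per feature index, so pairing the weights with feature names is left to you. A small plain-Scala sketch of that step; ImportanceSketch and the feature names are hypothetical (in practice the names might be the input columns of whatever assembled the features vector), and in Spark the importances array would come from model.featureImportances.toArray:

```scala
// Hypothetical helper: pair per-index importances with feature names, most important first.
object ImportanceSketch {
  def rankFeatures(importances: Array[Double], names: Array[String]): Seq[(String, Double)] = {
    require(importances.length == names.length, "need exactly one name per feature")
    // Zip names with weights, then sort by descending importance
    names.toSeq.zip(importances.toSeq).sortBy { case (_, imp) => -imp }
  }
}
```

For example, rankFeatures(Array(0.2, 0.5, 0.3), Array("age", "income", "tenure")) puts "income" first.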

Any idea how to get both?

Mathieu



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Get-both-feature-importance-and-ROC-curve-from-a-random-forest-classifier-tp27175.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org