[R] Random Forests Variable Importance Question

Paul Fisch Mon, 13 Apr 2009 02:06:11 -0700

I am trying to use the random forests package for classification in R.

The Variable Importance Measures listed are:


-mean raw importance score of variable x for class 0

-mean raw importance score of variable x for class 1

-MeanDecreaseAccuracy

-MeanDecreaseGini
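
For reference, the four measures listed above can be pulled out of a
fitted forest roughly as follows. This is only a sketch: it assumes
the randomForest package is installed, and the two-class data set
(built from R's iris data) and the names dat, fit and imp are made up
for illustration.

```r
# Sketch only: assumes the randomForest package is installed.
library(randomForest)

set.seed(1)  # forests are random; fix the seed so runs are repeatable

# Hypothetical two-class data set made from the built-in iris data.
dat <- droplevels(subset(iris, Species != "setosa"))

# importance = TRUE is required for the per-class and
# MeanDecreaseAccuracy columns; without it only MeanDecreaseGini
# is computed.
fit <- randomForest(Species ~ ., data = dat, importance = TRUE)

# One row per variable. Columns: one per class, then
# MeanDecreaseAccuracy and MeanDecreaseGini.
# scale = FALSE asks for the raw scores rather than the default,
# which divides the accuracy-based scores by their standard errors.
imp <- importance(fit, scale = FALSE)
print(imp)

# Sort the variables by MeanDecreaseAccuracy, largest first.
print(imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ])
```

The same package also provides varImpPlot(fit), which draws the two
summary measures as a dot chart.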

Now, I know what these "mean" in the sense that I know their
definitions. What I want to know is how to use them.

What I am trying to figure out is what these values mean in practical
terms only: how accurate they are, what is a good value, what is a bad
value, what are the maximums and minimums, etc.

If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini, does
that mean it is important or unimportant? Also, any information on the
raw scores would be really helpful too. I want to know everything
there is to know about these numbers that is relevant to the
application of them.

I don't really want a technical explanation that uses words like
'error', 'summation', or 'permuted', but rather a simpler
explanation that doesn't involve any discussion of how random forests
work (I have read all about that and didn't find it very helpful).

Like if I wanted someone to explain to me how to use a radio, I
wouldn't expect the explanation to involve how a radio converts radio
waves into sound.

If anyone can help me out at all it would be really great. I have
read many, many lectures on random forests and other data mining
topics, but I have never found simple answers about how to read the
variable importance measures.

Thanks,
Paul Fisch

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
