>> Andy, I'll explain why I am asking. I probably should have
>> done that at the beginning:
>> I am not asking in order to figure out how to do it. I am
>> asking in order to figure out something that was done around
>> November 01, 2008.
>> Back then, a piece of code was run that extracted the importances
>> from a randomForest(..., importance=T, ...) object simply by
>> referring to $importance, and then used the first column.
>> Do you happen to know what they were back then? Standardized or not?
>
> The change coincided with the introduction of the importanceSD component, due
> to the change in how the importance is measured. The "importance" component
> is just mean(d[i]), and importanceSD is sd(d[i])/sqrt(ntree). The
> importance() function by default (scale=TRUE) does the normalization, and
> that's what you should use. Leo found that this normalization greatly
> reduces the "bias" due to the different numbers of possible splits in
> different predictors.
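A minimal base-R sketch of the arithmetic described above, assuming `d` holds one predictor's per-tree increase in MSE when that predictor is permuted. The values are simulated rather than taken from a real forest, and `ntree` is a hypothetical choice:

```r
# Hypothetical per-tree permutation values d[i] for one predictor.
# These are simulated here, not produced by randomForest itself.
set.seed(1)
ntree <- 500
d <- rnorm(ntree, mean = 1.4, sd = 1.8)

raw    <- mean(d)              # corresponds to the "importance" component
se     <- sd(d) / sqrt(ntree)  # corresponds to the "importanceSD" component
scaled <- raw / se             # what importance(..., scale = TRUE) reports

print(c(raw = raw, se = se, scaled = scaled))
```

Because `se` shrinks with `sqrt(ntree)`, the scaled value tends to grow as more trees are grown, even when `raw` stays roughly the same.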
Actually, it looks like if one extracts the importances incorrectly (by
looking only at $importance), one gets unscaled results. I hope it was the
same in 2008.
I've just run an example randomForest for a case with 6 predictors
(importance = T). My randomForest object is "rftest".
Below are some results:
Looking at importances the way it was done in November 2008:
as.data.frame(rftest$importance)[1]
I am getting:
     %IncMSE
v1 1.3900833
v2 1.2219338
v3 0.6337521
v4 1.4101760
v5 1.4474130
v6 0.7583074
Extracting them the way you recommended, but asking for unscaled
results with importance(rftest, scale=F),
I get exactly the same results as above:
     %IncMSE IncNodePurity
v1 1.3900833     147.31267
v2 1.2219338     147.51669
v3 0.6337521      97.11210
v4 1.4101760     149.48934
v5 1.4474130     149.61458
v6 0.7583074      97.74933
Now I extract the scaled importances with importance(rftest, scale=T)
and get:
    %IncMSE IncNodePurity
v1 16.97155     147.31267
v2 17.04288     147.51669
v3 10.19135      97.11210
v4 18.22732     149.48934
v5 18.36879     149.61458
v6 10.46555      97.74933
This is the same as what I get when I extract the components directly,
the way it was done in 2008, and divide one by the other:
as.data.frame(rftest$importance)[1]/as.data.frame(rftest$importanceSD)
Resulting in:
    %IncMSE
v1 16.97155
v2 17.04288
v3 10.19135
v4 18.22732
v5 18.36879
v6 10.46555
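The pattern in the output above (only %IncMSE changes under scale=T, IncNodePurity does not) can be mimicked in a few lines of base R. This is a hedged sketch, not the package's own source: `scale_importance()` is a hypothetical helper, it assumes a regression forest where importanceSD is a vector with one entry per predictor, and the toy numbers are made up for illustration.

```r
# Hypothetical helper mimicking the scale = TRUE behaviour seen above for
# a regression forest. 'imp' is a matrix with columns %IncMSE and
# IncNodePurity; 'imp_sd' is the per-predictor importanceSD vector.
scale_importance <- function(imp, imp_sd) {
  out <- imp
  # Only the permutation measure is divided by its standard error;
  # the node-purity column is returned unchanged.
  out[, "%IncMSE"] <- imp[, "%IncMSE"] / imp_sd
  out
}

# Toy values for illustration only (not from rftest).
imp <- matrix(c(1.2, 0.6, 140, 95), nrow = 2,
              dimnames = list(c("v1", "v2"), c("%IncMSE", "IncNodePurity")))
imp_sd <- c(v1 = 0.08, v2 = 0.06)

res <- scale_importance(imp, imp_sd)
print(res)
```

Called as scale_importance(rftest$importance, rftest$importanceSD), it would perform the same division done by hand above while leaving IncNodePurity untouched.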
Dimitri
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.