[R] question regarding "varImpPlot" results vs. model$importance data in package "randomForest"

Mike Williamson Tue, 13 Jul 2010 16:47:11 -0700

Hi everyone,

    I have another question about the randomForest package:


   - My (presumably incorrect) understanding is that varImpPlot should plot
   the "%IncMSE" and "IncNodePurity" values exactly as they appear in the
   "importance" component of the model results.
      - However, the plot does not, in fact, match the "importance"
      component of the random forest model.

    For example, if you run the example given in ?randomForest, the plot
shows the highest few "%IncMSE" values at around 17 or 18%.  But if you
look at $importance, they are 9.7, 9.4, 7.7, and 7.3.  Perhaps more
importantly, the plot ranks "wt" highest by %IncMSE, then "disp", then
"cyl", then "hp", whereas $importance ranks "wt", then "disp", then "hp",
then "cyl".  The ratios between the values also look somewhat different.
    Here is the code for that example:

library(randomForest)  # the package name is "randomForest"
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                          keep.forest=FALSE, importance=TRUE)
varImpPlot(mtcars.rf)
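One way to check where the two disagree is sketched below. It assumes that varImpPlot() gets its numbers from importance(), whose default scale = TRUE divides the permutation-based %IncMSE by its standard error (stored in $importanceSD), while $importance keeps the unscaled values. The object names raw, scaled, unscaled and by.hand are only illustrative.

```r
## Sketch: compare the values varImpPlot() draws with the raw $importance slot.
## Assumption: varImpPlot() calls importance(x, scale = TRUE) by default, which
## divides %IncMSE by $importanceSD; IncNodePurity is left unscaled.
library(randomForest)

set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1000,
                          keep.forest = FALSE, importance = TRUE)

raw      <- mtcars.rf$importance                 # unscaled, as stored in the model
scaled   <- importance(mtcars.rf)                # default scale = TRUE (what is plotted)
unscaled <- importance(mtcars.rf, scale = FALSE) # expected to equal $importance

## Reproduce the scaling of %IncMSE by hand: mean increase / its standard error
by.hand <- raw[, "%IncMSE"] / mtcars.rf$importanceSD

## Variable rankings by %IncMSE under each version
print(rownames(raw)[order(raw[, "%IncMSE"], decreasing = TRUE)])
print(rownames(scaled)[order(scaled[, "%IncMSE"], decreasing = TRUE)])

## Draw the unscaled values so the plot lines up with $importance
varImpPlot(mtcars.rf, scale = FALSE)
```

If by.hand matches the scaled column, the gap between the plot and $importance would come from this scaling rather than from a different computation, and that scaling can reorder variables whose standard errors differ.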

    I am using R version 2.11.1 and randomForest version 4.5-35.

    I don't really need varImpPlot to work exactly right.  But I am not
sure which is accurate: the varImpPlot or the $importance component.
Which should I trust more, especially when they disagree appreciably?

                                             Thanks!
                                                     Mike



"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleanic war:
The most exciting frontier is charting what's already here."
  -- xkcd


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
