> From: [EMAIL PROTECTED]
>
> Hello,
>
> I'm trying to find the optimal value of the mtry parameter (the
> number of variables tried at each split) for a randomForest
> classification. The classification is binary, and there are 32
> explanatory variables (mostly factors with up to 4 levels each,
> but also some numeric variables) and 575 cases.
>
> I've seen that although there are only 32 explanatory variables,
> the best classification performance is reached when choosing
> mtry=80. How is it possible that more variables can be used than
> there are columns in the data frame?

It's not. The code for randomForest.default() has:

  ## Make sure mtry is in reasonable range.
  mtry <- max(1, min(p, round(mtry)))

so it silently sets mtry to the number of predictors if it's too large.
As an example:

  > library(randomForest)
  randomForest 4.5-12
  Type rfNews() to see new features/changes/bug fixes.
  > iris.rf = randomForest(Species ~ ., iris, mtry=10)
  > iris.rf$mtry
  [1] 4

I should probably add a warning in such cases...

Andy

> thanks for your help
> + kind regards,
>
> Arne

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
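
For the case in the question (32 predictors, mtry=80), the effect of the range check quoted in the reply can be reproduced in base R without fitting a forest. This is only a sketch: clamp_mtry is a hypothetical helper name, not a function in the randomForest package, and it assumes that check is the only adjustment made to mtry.

```r
# Hypothetical helper that mirrors the range check quoted in the reply;
# it is not part of the randomForest package.
clamp_mtry <- function(mtry, p) {
  # p is the number of predictors; mtry is rounded, then forced into [1, p].
  max(1, min(p, round(mtry)))
}

clamp_mtry(80, 32)   # requested 80 with 32 predictors: capped at 32
clamp_mtry(10, 4)    # the iris example from the reply: capped at 4
clamp_mtry(0.2, 32)  # rounds to 0, then raised to 1
```

Under that assumption, any mtry of 32 or more on the poster's data requests the same model as mtry=32, so a difference in performance between, say, mtry=32 and mtry=80 would most likely reflect run-to-run randomness rather than the setting. The value actually used can be read from the $mtry component of the fitted object, as in the iris example.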