> From: Uwe Ligges
> 
> [EMAIL PROTECTED] wrote:
> 
> > Hello,
> > 
> > I'm trying to find the optimal number of variables tried at each split
> > (the mtry parameter) for a randomForest classification. The
> > classification is binary, and there are 32 explanatory variables (mostly
> > factors, each with up to 4 levels, but also some numeric variables) and
> > 575 cases.
> > 
> > I've seen that although there are only 32 explanatory variables, the
> > best classification performance is reached when choosing mtry=80. How
> > is it possible that more variables can be used than there are columns
> > in the data frame?
> 
> If some of the variables are factors, dummy variables are generated, and
> you get a larger number of variables in the later process.

No, unless the OP is using the formula interface with a version of the 
package from two years or so ago.  We got the first formula interface
by copying and modifying the one for svm() in e1071, and forgot that
SVM needs dummy variables to deal with factors, whereas trees do not
(especially given how the underlying RF code handles factors directly).
This was corrected long ago.
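
To illustrate the arithmetic behind the OP's observation (a sketch, not part of the original exchange): assuming the old formula interface dummy-coded factors with treatment contrasts, so that a factor with k levels becomes k - 1 indicator columns and no intercept column is kept (both assumptions about the details), 32 predictors with up to 4 levels each could expand to as many as 32 * 3 = 96 columns, which leaves room for mtry=80. The helper name `expanded_column_count` is hypothetical.

```python
# Hypothetical helper (not part of randomForest): counts the columns a
# design matrix would have after dummy-coding, ASSUMING treatment
# contrasts (a factor with k levels -> k - 1 indicator columns), no
# intercept column, and one column per numeric predictor.
def expanded_column_count(factor_levels, n_numeric):
    """Return the number of columns after dummy-coding the predictors."""
    return sum(k - 1 for k in factor_levels) + n_numeric


# Upper bound for the OP's data: if all 32 predictors were 4-level
# factors, the expanded matrix would have 32 * 3 = 96 columns, so an
# mtry of 80 would not exceed the number of columns. Each numeric
# predictor in place of a 4-level factor lowers the count by 2.
upper_bound = expanded_column_count([4] * 32, n_numeric=0)
print(upper_bound)  # 96
```

With a current version of the package, which passes factors straight to the tree code, the relevant count is just the 32 original columns.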

Cheers,
Andy


 
> Uwe Ligges
> 
> 
> > thanks for your help + kind regards,
> > 
> > Arne
> > 
> > 
> > ______________________________________________ 
> > R-help@stat.math.ethz.ch mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the
> > posting guide! http://www.R-project.org/posting-guide.html
> 
