The function rpart() may well overfit if the complexity parameter (cp) is left at its default. Use the functions printcp() and plotcp() to check how the cross-validation estimate of relative error (xerror) changes with the number of splits (NB that the CP that leads to a further split decreases monotonically as the number of splits increases). The rel error column from printcp() is computed on the same data used to fit the tree, so it can be hopelessly optimistic.
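A minimal sketch of that check, assuming the kyphosis data set that ships with rpart. The seed, the formula, and cp = 0.001 are illustrative choices only, not taken from the original question:

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed for repeatability

# Grow a deliberately large tree by lowering cp below its default of 0.01
# (cp = 0.001 is an arbitrary illustrative value).
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(cp = 0.001, xval = 10))

printcp(fit)  # table with columns CP, nsplit, rel error, xerror, xstd
plotcp(fit)   # cross-validated relative error against cp / tree size

# Prune back to the row of the cp table with the smallest xerror.
cptab  <- fit$cptable
best   <- which.min(cptab[, "xerror"])
pruned <- prune(fit, cp = cptab[best, "CP"])
```

Taking the row with the smallest xerror is only one common rule. xerror changes from one cross-validation run to the next, so it is worth repeating the check with several seeds before settling on a tree size.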
John Maindonald
email: john.maindon...@anu.edu.au
phone : +61 2 (6125)3473
fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.

On 8/04/2014, at 8:00 pm, r-help-requ...@r-project.org wrote:

From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Schillo, Sonja
Sent: Thursday, April 03, 2014 3:58 PM
To: Mitchell Maltenfort
Cc: r-help@r-project.org
Subject: Re: [R] rpart and randomforest results

Hi,

the random forest should do that, you're totally right. As far as I know it does so by randomly selecting the variables considered for a split (but here we set the option for how many variables to consider at each split to the number of variables available so that I thought that the random forest does not have the chance to randomly select the variables). The next thing that randomforest does is bootstrapping. But here again we set the option to the number of cases we have in the data set so that no bootstrapping should be done. We tried to take all the "randomness" from the randomforest away. Is that plausible and does anyone have another idea?

Thanks
Sonja