If you want honest estimates of accuracy, you should repeat the feature
selection within each resample (not on the test set). You will get a
different list of variables each time, but that's the point. Right now you
are not capturing that uncertainty, which is why the OOB and test set
results differ so much.

The list you get in the original training set is still the real list. The
resampling results help you understand how much you might be overfitting
the *variables*.
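
For example, a rough sketch of redoing the selection inside each fold might
look like the following (untested; the simulated data, the 5 folds, and the
"top 50" cutoff are just placeholders, not your settings):

library(randomForest)

set.seed(1)
x <- matrix(rnorm(100 * 500), nrow = 100)        # e.g. 500 SNPs, 100 samples
colnames(x) <- paste0("snp", seq_len(ncol(x)))
y <- factor(sample(c("case", "control"), 100, replace = TRUE))

k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(x)))
err <- numeric(k)

for (i in seq_len(k)) {
  train_x <- x[folds != i, , drop = FALSE]
  train_y <- y[folds != i]

  ## the selection step is repeated inside the resample, using only
  ## this fold's training portion
  screen_fit <- randomForest(train_x, train_y, importance = TRUE)
  imp <- importance(screen_fit, type = 1)          # mean decrease in accuracy
  keep <- rownames(imp)[order(imp[, 1], decreasing = TRUE)][1:50]

  ## refit on the selected variables and assess on the held-out fold
  fit <- randomForest(train_x[, keep, drop = FALSE], train_y)
  pred <- predict(fit, x[folds == i, keep, drop = FALSE])
  err[i] <- mean(pred != y[folds == i])
}

mean(err)   # resampled error that reflects the selection step too

The point is that the held-out samples never influence which variables are
kept, so the resampled error reflects the whole procedure.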

Max

On Feb 22, 2011, at 4:39 PM, ronzhao <yzhaoh...@gmail.com> wrote:

> 
> Thanks, Max.
> 
> Yes, I did some feature selection in the training set. Basically, I
> selected the top 1000 SNPs based on OOB error and grew the forest on the
> training set, then used the test set to validate the forest.
> 
> But if I do the same thing in the test set, the top SNPs would be
> different from those in the training set. That may be difficult to
> interpret.
> 

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
