> From whatever little knowledge I gained last night about Random Forests, each
> tree is trained with a sub-sample of original dataset (usually with
> replacement)?

Yes, that should be correct!

> Now, what I am not able to understand is - if entire dataset is used to train
> each of the trees, then how does the classifier estimate the OOB error? None
> of the entries of the dataset is an oob for any of the trees. (Pardon me if
> all this sounds BS)

If you take a bootstrap sample of size n, where n is the number of samples in your dataset, you get on average about 0.632 * n unique samples in your bootstrap set (the fraction approaches 1 - 1/e as n grows). In other words, roughly 0.368 * n samples are not used for growing the respective tree, and these are the out-of-bag (OOB) samples for that tree.

As far as I understand, the random forest OOB score is then computed from these held-out samples: each training sample is predicted using only the trees that did not see it, and the aggregated predictions are scored against the true labels (correct me if I am wrong!).

Best,
Sebastian

> On Oct 3, 2016, at 2:25 PM, Ibrahim Dalal via scikit-learn
> <scikit-learn@python.org> wrote:
>
> Dear Developers,
>
> From whatever little knowledge I gained last night about Random Forests, each
> tree is trained with a sub-sample of original dataset (usually with
> replacement)?
>
> (Note: Please do correct me if I am not making any sense.)
>
> RandomForestClassifier has an option of 'bootstrap'. The API states the
> following
>
> The sub-sample size is always the same as the original input sample size but
> the samples are drawn with replacement if bootstrap=True (default).
>
> Now, what I am not able to understand is - if entire dataset is used to train
> each of the trees, then how does the classifier estimate the OOB error? None
> of the entries of the dataset is an oob for any of the trees. (Pardon me if
> all this sounds BS)
>
> Help this mere mortal.
>
> Thanks
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
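
P.S. To make the bootstrap argument in the reply above concrete, here is a minimal sketch that draws n indices with replacement and counts how many original indices never show up. It uses only the Python standard library. The helper name `oob_fraction`, the dataset size n = 10,000, and the 20 repetitions are my own choices for illustration, and this is not meant to mirror how scikit-learn draws its samples internally.

```python
import math
import random


def oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and return the fraction of
    original indices that never appear in it (the out-of-bag fraction).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Sample n indices with replacement and keep the distinct ones.
    in_bag = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(in_bag) / n


n = 10_000  # assumed dataset size
fractions = [oob_fraction(n, seed=s) for s in range(20)]
mean_oob = sum(fractions) / len(fractions)

print(f"mean OOB fraction over 20 bootstrap draws: {mean_oob:.3f}")
print(f"limit of (1 - 1/n)^n as n grows, 1/e:      {math.exp(-1):.3f}")
```

So even though each tree gets a sample of the same size as the original dataset, about a third of the rows are left out of any given tree's sample. In scikit-learn itself you would not compute this by hand: passing `oob_score=True` to `RandomForestClassifier` (with `bootstrap=True`) makes the fitted estimator expose the estimate as `oob_score_`.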