Hi,

From the docs (http://scikit-learn.org/stable/auto_examples/ensemble/plot_ensemble_oob.html):
The RandomForestClassifier is trained using bootstrap aggregation, where each new tree is fit from a bootstrap sample of the training observations z_i = (x_i, y_i). The out-of-bag (OOB) error is the average error for each z_i calculated using predictions from the trees that do not contain z_i in their respective bootstrap sample. This allows the RandomForestClassifier to be fit and validated whilst being trained [1].

If you draw samples with replacement, the bootstrap sample has the same size as the training set but is not the same set: some z_i appear several times and others not at all. For a training set of n observations, a given z_i is left out of a bootstrap sample with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368 for large n, so each tree typically never sees about a third of the data. That tree is then used to estimate the OOB error for each z_i it did not see.

I hope this makes it a little clearer.

2016-10-03 19:25 GMT+01:00 Ibrahim Dalal via scikit-learn <scikit-learn@python.org>:
> Dear Developers,
>
> From whatever little knowledge I gained last night about Random Forests,
> each tree is trained with a sub-sample of original dataset (usually with
> replacement)?.
>
> (Note: Please do correct me if I am not making any sense.)
>
> RandomForestClassifier has an option of 'bootstrap'. The API states the
> following
>
>> The sub-sample size is always the same as the original input sample size
>> but the samples are drawn with replacement if bootstrap=True (default).
>
>
> Now, what I am not able to understand is - if entire dataset is used to
> train each of the trees, then how does the classifier estimates the OOB
> error? None of the entries of the dataset is an oob for any of the trees.
> (Pardon me if all this sounds BS)
>
> Help this mere mortal.
>
> Thanks
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>

--
Yours sincerely,
https://www.linkedin.com/in/alexey-dral
Alexey A. Dral
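To make the bootstrap argument concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: it assumes a toy 1-D dataset and uses a hypothetical `fit_stump` threshold rule as a stand-in for a full decision tree, and `bootstrap_indices`, `mean_oob_fraction` and `oob_error` are likewise illustrative names. It shows that a bootstrap sample of size n leaves out roughly (1 - 1/n)^n of the observations, and that each z_i can then be scored only by the trees that never saw it.

```python
import math
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices uniformly with replacement: one bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]


def mean_oob_fraction(n, n_draws=200, seed=0):
    """Average fraction of the n observations left out of a bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        in_bag = set(bootstrap_indices(n, rng))
        total += (n - len(in_bag)) / n
    return total / n_draws


def fit_stump(xs, ys):
    """Hypothetical stand-in for a decision tree: a 1-D threshold rule
    'predict 1 if x > t else 0', with t chosen to minimise training errors."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum(int(x > t) != y for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def oob_error(X, y, n_trees=50, seed=0):
    """Bag n_trees stumps and return (OOB error, number of samples scored).

    Each z_i = (X[i], y[i]) is predicted only by the trees whose bootstrap
    sample did not contain it; the OOB error is the misclassification rate
    of those majority votes."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]  # OOB votes collected per sample
    for _ in range(n_trees):
        idx = bootstrap_indices(n, rng)
        t = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:  # this tree never saw z_i, so it may vote on it
                votes[i][int(X[i] > t)] += 1
    scored = [i for i in range(n) if votes[i]]  # samples OOB for >= 1 tree
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored), len(scored)


if __name__ == "__main__":
    n = 1000
    print("left out per bootstrap sample:", mean_oob_fraction(n))
    print("(1 - 1/n)^n =", (1 - 1 / n) ** n, " 1/e =", math.exp(-1))

    X = [float(i) for i in range(20)]
    y = [int(x >= 10) for x in X]
    err, scored = oob_error(X, y)
    print(f"OOB error {err:.3f} over {scored} samples")
```

In scikit-learn itself you would not do this by hand: passing oob_score=True (with bootstrap=True) to RandomForestClassifier makes the fitted model expose oob_score_, the OOB score (accuracy by default), so the OOB error is 1 - oob_score_.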