> From whatever little knowledge I gained last night about Random Forests, each 
> tree is trained with a sub-sample of the original dataset (usually with 
> replacement)?

Yes, that should be correct!

> Now, what I am not able to understand is: if the entire dataset is used to 
> train each of the trees, then how does the classifier estimate the OOB error? 
> None of the entries of the dataset is OOB for any of the trees. (Pardon me if 
> all this sounds BS)

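If you take an n-size bootstrap sample, where n is the number of samples in 
your dataset, you draw with replacement, so some samples are picked more than 
once and others not at all. Asymptotically, you have about 0.632 * n unique 
samples in your bootstrap set (1 - 1/e ~ 0.632). In other words, about 
0.368 * n samples are not used for growing the respective tree; these are that 
tree's out-of-bag (OOB) samples. As far as I understand, the random forest OOB 
score is then computed by predicting each sample using only the trees for 
which it was out-of-bag and scoring those aggregated predictions, rather than 
by averaging per-tree OOB scores (correct me if I am wrong!).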
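
To make the 0.632 figure concrete, here is a minimal sketch in plain Python 
(standard library only). The function name bootstrap_oob_fraction and the 
choice of n = 10000 are just assumptions for illustration; this is not how 
scikit-learn does its sampling internally, only a simulation of one n-size 
draw with replacement.

```python
import math
import random


def bootstrap_oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n; return (in-bag, OOB) fractions."""
    rng = random.Random(seed)
    # n draws with replacement from the indices 0..n-1
    drawn = [rng.randrange(n) for _ in range(n)]
    in_bag = set(drawn)            # unique indices the tree would see
    oob = set(range(n)) - in_bag   # indices never drawn: OOB for this tree
    return len(in_bag) / n, len(oob) / n


n = 10_000  # hypothetical dataset size
in_bag_frac, oob_frac = bootstrap_oob_fraction(n)

# Expected unique fraction for finite n, and its limit 1 - 1/e as n grows
expected_in_bag = 1 - (1 - 1 / n) ** n
limit_in_bag = 1 - math.exp(-1)

print(f"simulated in-bag fraction: {in_bag_frac:.3f}")
print(f"simulated OOB fraction:    {oob_frac:.3f}")
print(f"expected in-bag fraction:  {expected_in_bag:.3f}")
print(f"limit (1 - 1/e):           {limit_in_bag:.3f}")
```

So even though each bootstrap sample has the same size as the dataset, 
roughly a third of the entries are left out of any given tree. In 
scikit-learn you can ask for the resulting estimate by passing 
oob_score=True to RandomForestClassifier and reading the oob_score_ 
attribute after fitting.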

Best,
Sebastian

> On Oct 3, 2016, at 2:25 PM, Ibrahim Dalal via scikit-learn 
> <scikit-learn@python.org> wrote:
> 
> Dear Developers,
> 
> From whatever little knowledge I gained last night about Random Forests, each 
> tree is trained with a sub-sample of the original dataset (usually with 
> replacement)?
> 
> (Note: Please do correct me if I am not making any sense.)
> 
> RandomForestClassifier has a 'bootstrap' option. The API states the 
> following:
>  
> The sub-sample size is always the same as the original input sample size but 
> the samples are drawn with replacement if bootstrap=True (default).
> 
> Now, what I am not able to understand is: if the entire dataset is used to 
> train each of the trees, then how does the classifier estimate the OOB error? 
> None of the entries of the dataset is OOB for any of the trees. (Pardon me if 
> all this sounds BS)
> 
> Help this mere mortal.
> 
> Thanks
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
