Just one small correction: in #3 it should be squared residuals. Yes, the function returns a vector of r^2 with length=ntree, with the k-th element being the r^2 for the forest consisting of the first k trees.
Cheers,
Andy

From: Dimitri Liakhovitski
>
> I would like to summarize. Would you please confirm that my summary is
> correct? Thank you very much!
>
> Determining R^2 in Random Forests (for a Regression Forest):
>
> 1. For each individual case, record a mean prediction on the dependent
> variable y across all trees for which the case is OOB (Out-of-Bag);
> 2. For each individual case, calculate a residual: residual = observed
> y - mean predicted y (from step 1)
> 3. Calculate mean square residual MSE: MSE = sum of all individual
> residuals (from step 2) / n
> 4. Because MSE/var(y) represents the proportion of y variance that is
> due to error, then R^2 = 1 - MSE/var(y).
>
> If it's correct, my last question would be:
> I am getting as many R^2 as the number of trees because each time the
> residuals are recalculated using all trees built so far, correct?
>
> Thank you very much!
> Dimitri
>
>
> On Mon, Apr 13, 2009 at 6:22 PM, Liaw, Andy
> <andy_l...@merck.com> wrote:
> > Apologies: that should have been sum(residual^2)!
> >
> >> -----Original Message-----
> >> From: Dimitri Liakhovitski [mailto:ld7...@gmail.com]
> >> Sent: Monday, April 13, 2009 4:35 PM
> >> To: Liaw, Andy
> >> Cc: R-Help List
> >> Subject: Re: [R] Random Forests: Question about R^2
> >>
> >> Andy,
> >> thank you very much!
> >> One clarification question:
> >>
> >> If MSE = sum(residuals) / n, then
> >> in the formula (1 - mse / Var(y)) - shouldn't one square mse before
> >> dividing by variance?
> >>
> >> Dimitri
> >>
> >>
> >> On Mon, Apr 13, 2009 at 10:52 AM, Liaw, Andy
> >> <andy_l...@merck.com> wrote:
> >> > MSE is the mean squared residuals. For the training data, the OOB
> >> > estimate is used (i.e., residual = data - OOB prediction, MSE =
> >> > sum(residuals) / n, OOB prediction is the mean of predictions from all
> >> > trees for which the case is OOB). It is _not_ the average OOB MSE of
> >> > trees in the forest.
> >> >
> >> > I hope there's no question about how the pseudo R^2 is computed on a
> >> > test set? If you understand how that's done, I assume the confusion is
> >> > only how the OOB MSE is formed.
> >> >
> >> > Best,
> >> > Andy
> >> >
> >> > From: Dimitri Liakhovitski
> >> >>
> >> >> Dear Random Forests gurus,
> >> >>
> >> >> I have a question about R^2 provided by randomForest (for regression).
> >> >> I don't succeed in finding this information.
> >> >>
> >> >> In the help file for randomForest under "Value" it says:
> >> >>
> >> >> rsq: (regression only) - "pseudo R-squared": 1 - mse / Var(y).
> >> >>
> >> >> Could someone please explain in somewhat more detail how exactly R^2
> >> >> is calculated?
> >> >> Is "mse" mean squared error for prediction?
> >> >> Is "mse" an average of mse's for all trees run on out-of-bag
> >> >> holdout samples?
> >> >> In other words - is this R^2 based on out-of-bag samples?
> >> >>
> >> >> Thank you very much for clarification!
> >> >>
> >> >> --
> >> >> Dimitri Liakhovitski
> >> >> MarketTools, Inc.
> >> >> dimitri.liakhovit...@markettools.com
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> > Notice: This e-mail message, together with any attachments, contains
> >> > information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
> >> > New Jersey, USA 08889), and/or its affiliates (which may be known
> >> > outside the United States as Merck Frosst, Merck Sharp & Dohme or
> >> > MSD and in Japan, as Banyu - direct contact information for affiliates is
> >> > available at http://www.merck.com/contact/contacts.html) that may be
> >> > confidential, proprietary copyrighted and/or legally privileged. It is
> >> > intended solely for the use of the individual or entity named on this
> >> > message. If you are not the intended recipient, and have received this
> >> > message in error, please notify us immediately by reply e-mail and
> >> > then delete it from your system.
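
The four steps in the summary, with the squared-residual correction applied, can be sketched in a few lines. This is only an illustration in plain Python, not the randomForest code: the function name oob_rsq and its inputs (per-tree predictions and per-tree OOB flags) are hypothetical, and two choices are assumptions that the thread does not settle: the variance of y is computed with divisor n, and cases that have not yet been OOB for any tree are left out of the MSE.

```python
def oob_rsq(y, tree_preds, oob):
    """Return [R^2 after 1 tree, ..., R^2 after ntree trees].

    y          : observed responses, length n
    tree_preds : tree_preds[t][i] is tree t's prediction for case i
    oob        : oob[t][i] is True when case i is out-of-bag for tree t
    """
    n = len(y)
    mean_y = sum(y) / n
    # Assumption: variance with divisor n (the package may scale differently).
    var_y = sum((v - mean_y) ** 2 for v in y) / n

    pred_sum = [0.0] * n  # running sum of OOB predictions per case
    n_oob = [0] * n       # number of trees so far for which the case is OOB
    rsq = []
    for preds, mask in zip(tree_preds, oob):
        for i in range(n):
            if mask[i]:
                pred_sum[i] += preds[i]
                n_oob[i] += 1
        # Steps 1-3: mean OOB prediction, residual, mean of squared residuals.
        # Assumption: cases not yet OOB for any tree are skipped.
        sq = [(y[i] - pred_sum[i] / n_oob[i]) ** 2
              for i in range(n) if n_oob[i] > 0]
        if not sq:
            rsq.append(float("nan"))
            continue
        mse = sum(sq) / len(sq)
        rsq.append(1.0 - mse / var_y)  # step 4
    return rsq


# Toy data: 4 cases, 2 trees (values invented for illustration only).
y = [1.0, 2.0, 3.0, 4.0]
tree_preds = [[1.0, 2.0, 3.0, 4.0],
              [2.0, 2.0, 2.0, 2.0]]
oob = [[True, True, False, False],
       [False, True, True, True]]

rsq = oob_rsq(y, tree_preds, oob)
print(rsq)  # one value per tree; the k-th uses the first k trees
```

The result has one entry per tree because the running sums are updated tree by tree, so the k-th entry is based on the forest made of the first k trees. Note that the residuals are taken against the mean OOB prediction per case, so this is not an average of per-tree OOB MSEs.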