I really don't understand what you don't understand. Do you know how a tree forms a prediction? If not, it may be a good idea to learn about that first. The code runs prediction of each case through all trees in the forest and that's how the votes are formed. [For OOB predictions, only predictions from trees for which the case is out-of-bag are counted. That's why you may get odd-ball vote fractions even when you grow 100 trees and expect the votes to be in seq(0, 1, by=0.01).] 100% - 2.34% = 97.66%, not 76.6% (I can only assume you had a typo). Cheers, Andy
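A small base-R sketch of the arithmetic above, using the "healthy" vote fractions quoted further down in this thread. The cap of 100 trees and the tolerance are assumptions made only for illustration; the thread does not say how many trees were actually grown. It looks for the smallest number of out-of-bag trees that would turn each fraction into a whole number of votes, which shows why the fractions need not be multiples of 0.01.

```r
# "healthy" vote fractions as quoted in the thread
healthy <- c(0.85714286, 0.92857143, 0.90000000, 0.92857143, 0.84615385)

# Smallest n (1..100, an assumed cap) for which healthy * n is a whole
# number of votes, up to the rounding of the printed fractions
smallest_n <- sapply(healthy, function(p) {
  n <- 1:100
  n[which(abs(p * n - round(p * n)) < 1e-6)[1]]
})
smallest_n
# 7 14 10 14 13

# Accuracy estimate from the OOB error rate quoted in the thread (2.34%)
oob_error_pct <- 2.34
accuracy_pct <- 100 - oob_error_pct
accuracy_pct
# 97.66
```

The true OOB counts could be any multiple of these values (for example 14 rather than 7 for the first case); the point is only that the denominator is the number of trees for which the case was out-of-bag, not the total number of trees.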
________________________________
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Monday, April 13, 2009 9:44 AM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random forest package

But how does it estimate that voting output? How does it get the 85.7% for all the trees?

Regarding the prediction accuracy: if I have OOB error = 2.34, then the prediction accuracy will be equal to 76.6%, right?

Many thanks,
Chrysanthi.

2009/4/13 Liaw, Andy <andy_l...@merck.com>

RF forms its prediction by voting. Note that each row in the output sums to 1. It says 85.7% of the trees classified the first case as "healthy" and the other 14.3% of the trees as "unhealthy". The majority (in two-class cases like this one) wins, so the prediction is "healthy".

You can take 1 - OOB error rate as the estimate of prediction accuracy (if you have not selected variables, e.g., using variable importance, in building the final RF model).

Andy

________________________________
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Friday, April 10, 2009 10:44 AM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random forest package

Hi,

To be honest, I cannot really understand the meaning of the votes. For example, with five samples and two classes, what do the numbers below mean?

     healthy  unhealthy
1 0.85714286 0.14285714
2 0.92857143 0.07142857
3 0.90000000 0.10000000
4 0.92857143 0.07142857
5 0.84615385 0.15384615

Suppose now that, having the classification, I have an unknown sample. According to the results that I've got, how can I predict which class it belongs to? Do the votes give us that prediction?

Also, the error is reported as the "OOB estimate of error rate", right? For example, if we have an OOB estimate of error rate of 2.34%, can we say that the prediction accuracy is approx. 97.7%? How can we estimate the prediction accuracy?

Thanks a lot,
Chrysanthi.

2009/4/8 Liaw, Andy <andy_l...@merck.com>

I'm not quite sure what you're asking.
RF predicts by classifying the new observation using all trees in the forest and taking a plurality vote. The predict() method for randomForest objects does that for you. The getTree() function shows you what each individual tree is like (not visually, just the underlying representation of the tree).

Andy

________________________________
From: Chrysanthi A. [mailto:chrys...@gmail.com]
Sent: Wednesday, April 08, 2009 2:56 PM
To: Liaw, Andy
Cc: r-help@r-project.org
Subject: Re: [R] help with random forest package

Many thanks for the reply. So, extracting the votes, how can we clarify the classification result? If I want to predict in which class an unknown sample will be included, what is the rule that will give me that?

Thanks a lot,
Chrysanthi.

2009/4/8 Liaw, Andy <andy_l...@merck.com>

The source code of the whole package is available on CRAN. All packages are submitted to CRAN in source form.

There's no "rule" per se that gives the final prediction, as the final prediction is the result of a plurality vote by all trees in the forest. You may want to look at the varUsed() and getTree() functions.

Andy

From: Chrysanthi A.
> Hello,
>
> I am a PhD student in Bioinformatics and I am using the Random Forest
> package in order to classify my data, but I have some questions.
> Is there a function to visualize the trees, so as to get the rules?
> Also, could you please provide me with the code of the "randomForest"
> function, as I would like to see how it works. I was wondering if I can
> get the classification having the most votes over all the trees in the
> forest (the final rules that will give me the final classification).
> Also, is there a possibility to get a vector with the attributes that
> are being selected for each node during the construction of each tree?
> I mean that I would like to know the m<<M variables that are selected
> at each node out of the M input attributes. Are they selected randomly?
> Is there a possibility to select the same variable in subsequent nodes?
>
> Thanks a lot,
>
> Chrysanthi.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates (which may be known outside the United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as Banyu - direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
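For readers who want to try the functions named in this thread (predict(), getTree(), varUsed()), here is a minimal sketch. It assumes the randomForest package is installed and uses the built-in iris data as a stand-in for the poster's data, with five held-out rows playing the role of "unknown" samples; the row indices, seed, and ntree = 100 are arbitrary choices for illustration, not anything taken from the thread.

```r
library(randomForest)  # assumes the randomForest package is installed

set.seed(42)

# Hold out five rows of iris as hypothetical "unknown" samples
test_idx <- c(1, 51, 101, 75, 120)
train    <- iris[-test_idx, ]
unknown  <- iris[test_idx, 1:4]

rf <- randomForest(Species ~ ., data = train, ntree = 100)

# Fraction of trees voting for each class; each row sums to 1
votes <- predict(rf, newdata = unknown, type = "vote")
votes

# Predicted class: the class with the most votes
# (type = "response" is the default)
pred <- predict(rf, newdata = unknown)
pred

# Underlying representation of the first tree, one row per node
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)

# Number of times each predictor was used for a split across the forest
used <- varUsed(rf)
used

# Number of variables tried at each split (the "m" in m << M)
rf$mtry
```

Since every tree votes on a new sample, the denominators of these vote fractions equal the number of trees, unlike the OOB votes stored in the fitted object.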