That was very helpful. Using the predict.all option I got exactly what I need. Is there any way of visualizing the predictions? Is an MDS plot the best way?
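For reference, the randomForest package itself ships an MDS view of the fit: grow the forest with proximity=TRUE and pass the result to MDSplot(), which runs classical scaling on 1 - proximity. A minimal sketch on the iris data (the seed value is arbitrary):

```r
library(randomForest)

## proximity=TRUE makes randomForest() return the N x N proximity
## matrix (the fraction of trees in which two cases land in the
## same terminal node); MDSplot() then plots classical MDS
## coordinates of 1 - proximity, colored by class.
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, proximity = TRUE)
MDSplot(rf, iris$Species, k = 2)
```

Points of the same class that the forest finds similar cluster together, so this is a common way to eyeball how cleanly the classes separate.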
Also, I want to run random forest under three different schemes (e.g., a 70% training / 30% test split, 10-fold cross-validation, etc.). Is the caret package the best way to do that, or does randomForest have an option for it?

Many thanks,
Chrysanthi

2009/4/28 Liaw, Andy <andy_l...@merck.com>

> Let's try an example:
>
> R> iris.1tree <- randomForest(Species ~ ., data=iris, ntree=1)
> R> getTree(iris.1tree, 1)
>   left daughter right daughter split var split point status prediction
> 1             2              3         4        0.80      1          0
> 2             0              0         0        0.00     -1          1
> 3             4              5         4        1.75      1          0
> 4             0              0         0        0.00     -1          2
> 5             6              7         3        4.85      1          0
> 6             8              9         1        6.05      1          0
> 7             0              0         0        0.00     -1          3
> 8             0              0         0        0.00     -1          2
> 9             0              0         0        0.00     -1          3
> R> iris[1,]
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> R> predict(iris.1tree, iris[1,], type="prob")
>   setosa versicolor virginica
> 1      1          0         0
> R> levels(iris$Species)
> [1] "setosa"     "versicolor" "virginica"
>
> The getTree() function showed the first (and only) tree.  To predict the
> first row of iris, we read the tree in the following way.  In the first
> row (the root node), the variable to split on is the 4th, "Petal.Width".
> The split point is 0.8, so data points with Petal.Width < 0.8 go to the
> left and the others go to the right.  Since the "left daughter" is "2",
> we look at the second row of the tree; it is a leaf (i.e., a terminal
> node) since the status is -1.  The prediction is "1", the first level of
> the factor: "setosa".  I don't expect anyone to predict data "manually"
> like this; predict.randomForest() does all of this for you.
>
> As to individual tree predictions, predict.randomForest() has an option
> "predict.all" that you can use.  To get the OOB votes, though, you will
> also need to look at the output of randomForest(..., keep.inbag=TRUE) to
> see which data point is OOB for which tree.
>
> I hope that's clear now.
>
> Cheers,
> Andy
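The hand-walk described above can be written as a short helper. This is only an illustrative sketch (walk_tree is a made-up name, not part of the randomForest package), and it assumes all predictors are numeric, as in iris; categorical splits are encoded differently in the getTree() matrix:

```r
library(randomForest)

## Predict one case by hand from a getTree() matrix: start at row 1,
## compare the split variable against the split point, descend to the
## left/right daughter, and stop at a terminal node (status == -1),
## whose "prediction" column indexes the factor levels.
walk_tree <- function(tree, x, class_levels) {
  node <- 1
  while (tree[node, "status"] != -1) {
    v <- tree[node, "split var"]            # column index of the predictor
    if (x[[v]] < tree[node, "split point"]) {
      node <- tree[node, "left daughter"]
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  class_levels[tree[node, "prediction"]]    # map level index back to label
}

set.seed(1)
iris.1tree <- randomForest(Species ~ ., data = iris, ntree = 1)
tr <- getTree(iris.1tree, 1, labelVar = FALSE)
walk_tree(tr, iris[1, 1:4], levels(iris$Species))
# should give "setosa": row 1 is a well-separated setosa case
```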
> From: Chrysanthi A. [mailto:chrys...@gmail.com]
> Sent: Tuesday, April 28, 2009 8:52 AM
> To: Liaw, Andy
> Cc: r-help@r-project.org
> Subject: Re: [R] help with random forest package
>
> Many thanks for your help.  Sorry for my delayed reply, but I was away.
> Regarding the OOB error, sorry, it was a typo.
>
> As for the voting, I was just wondering whether there is a function that
> will give me the prediction of each case from each tree.  Is there a
> function that produces the rules for each tree?  If I have a new case
> whose class I want to predict, how can I predict it?  Should I look at
> each tree and then take the vote?  Or are there predictive rules that I
> can use?  I cannot make that prediction from the results that the votes
> component gives me...
>
> Also, I was wondering why randomization, combined with aggregating the
> predictions of the trees, significantly improves the overall predictive
> accuracy.
>
> Thanks a lot,
> Chrysanthi
>
> 2009/4/13 Liaw, Andy <andy_l...@merck.com>
>
>> I really don't understand what you don't understand.  Do you know how a
>> tree forms a prediction?  If not, it may be a good idea to learn about
>> that first.  The code runs each case through all the trees in the
>> forest, and that is how the votes are formed.
>>
>> [For OOB predictions, only predictions from trees for which the case is
>> out-of-bag are counted.  That's why you may get odd-ball vote fractions
>> even when you grow 100 trees and expect the votes to be in
>> seq(0, 1, by=0.01).]
>>
>> 100% - 2.34% = 97.66%, not 76.6% (I can only assume you had a typo).
>>
>> Cheers,
>> Andy
>>
>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>> Sent: Monday, April 13, 2009 9:44 AM
>> To: Liaw, Andy
>> Cc: r-help@r-project.org
>> Subject: Re: [R] help with random forest package
>>
>> But how does it estimate that voting output?  How does it get the 85.7%
>> for all the trees?
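The vote fractions discussed above are nothing more than the share of trees predicting each class. A sketch that reproduces them from the per-tree predictions via predict.all (note this tabulates all trees; true OOB vote fractions would additionally need the in-bag matrix from keep.inbag=TRUE, as Andy points out elsewhere in the thread):

```r
library(randomForest)

## With predict.all=TRUE, $individual is a matrix with one column of
## class predictions per tree; tabulating each row and dividing by the
## number of trees gives the vote fractions.
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)
pr <- predict(rf, iris[1:5, ], predict.all = TRUE)
votes <- t(apply(pr$individual, 1, function(p)
  table(factor(p, levels = levels(iris$Species))) / length(p)))
votes  # each row sums to 1; the majority class is the forest's prediction
```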
>> Regarding the prediction accuracy: if I have OOB error = 2.34, then the
>> prediction accuracy will be equal to 76.6%, right?
>>
>> Many thanks,
>> Chrysanthi
>>
>> 2009/4/13 Liaw, Andy <andy_l...@merck.com>
>>
>>> RF forms predictions by voting.  Note that each row in the output sums
>>> to 1.  It says 85.7% of the trees classified the first case as
>>> "healthy" and the other 14.3% of the trees "unhealthy".  The majority
>>> (in two-class cases like this one) wins, so the prediction is
>>> "healthy".
>>>
>>> You can take 1 - OOB error rate as the estimate of prediction accuracy
>>> (if you have not selected variables, e.g., using variable importance,
>>> in building the final RF model).
>>>
>>> Andy
>>>
>>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>>> Sent: Friday, April 10, 2009 10:44 AM
>>> To: Liaw, Andy
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] help with random forest package
>>>
>>> Hi,
>>>
>>> To be honest, I cannot really understand the meaning of the votes.
>>> For example, with five samples and two classes, what do the numbers
>>> below mean?
>>>
>>>      healthy  unhealthy
>>> 1 0.85714286 0.14285714
>>> 2 0.92857143 0.07142857
>>> 3 0.90000000 0.10000000
>>> 4 0.92857143 0.07142857
>>> 5 0.84615385 0.15384615
>>>
>>> Suppose now that, having the classification, I have an unknown sample;
>>> given the results I got, how can I predict which class it belongs to?
>>> Do the votes give us that prediction?
>>>
>>> Also, the error is reported as the "OOB estimate of error rate",
>>> right?  For example, if we have an OOB estimate of error rate of
>>> 2.34%, can we say that the prediction accuracy is approx. 97.7%?  How
>>> can we estimate the prediction accuracy?
>>>
>>> Thanks a lot,
>>> Chrysanthi
>>>
>>> 2009/4/8 Liaw, Andy <andy_l...@merck.com>
>>>
>>>> I'm not quite sure what you're asking.
>>>> RF predicts by running the new observation through all trees in the
>>>> forest and taking a plurality vote.  The predict() method for
>>>> randomForest objects does that for you.  The getTree() function shows
>>>> you what each individual tree is like (not visually, just the
>>>> underlying representation of the tree).
>>>>
>>>> Andy
>>>>
>>>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>>>> Sent: Wednesday, April 08, 2009 2:56 PM
>>>> To: Liaw, Andy
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] help with random forest package
>>>>
>>>> Many thanks for the reply.
>>>>
>>>> So, having extracted the votes, how can we determine the
>>>> classification result?  If I want to predict which class an unknown
>>>> sample falls into, what rule will give me that?
>>>>
>>>> Thanks a lot,
>>>> Chrysanthi
>>>>
>>>> 2009/4/8 Liaw, Andy <andy_l...@merck.com>
>>>>
>>>>> The source code of the whole package is available on CRAN.  All
>>>>> packages submitted to CRAN are in source form.
>>>>>
>>>>> There is no "rule" per se that gives the final prediction; the final
>>>>> prediction is the result of a plurality vote by all the trees in the
>>>>> forest.
>>>>>
>>>>> You may want to look at the varUsed() and getTree() functions.
>>>>>
>>>>> Andy
>>>>>
>>>>> From: Chrysanthi A.
>>>>> > Hello,
>>>>> >
>>>>> > I am a PhD student in bioinformatics and I am using the
>>>>> > randomForest package to classify my data, but I have some
>>>>> > questions.  Is there a function to visualize the trees, so as to
>>>>> > get the rules?  Also, could you please point me to the code of the
>>>>> > "randomForest" function, as I would like to see how it works.  I
>>>>> > was wondering if I can get the classification having the most
>>>>> > votes over all the trees in the forest (the final rules that will
>>>>> > give me the final classification).
>>>>> > Also, is there a possibility to get a vector with the attributes
>>>>> > that are selected for each node during the construction of each
>>>>> > tree?  I mean, I would like to know the m << M variables that are
>>>>> > selected at each node out of the M input attributes.  Are they
>>>>> > selected randomly?  Is it possible for the same variable to be
>>>>> > selected in subsequent nodes?
>>>>> >
>>>>> > Thanks a lot,
>>>>> > Chrysanthi
>>>>> >
>>>>> > ______________________________________________
>>>>> > R-help@r-project.org mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> > PLEASE do read the posting guide
>>>>> > http://www.R-project.org/posting-guide.html
>>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> Notice: This e-mail message, together with any attachments, contains
>>>>> information of Merck & Co., Inc. (One Merck Drive, Whitehouse
>>>>> Station, New Jersey, USA 08889), and/or its affiliates (which may be
>>>>> known outside the United States as Merck Frosst, Merck Sharp & Dohme
>>>>> or MSD and in Japan, as Banyu - direct contact information for
>>>>> affiliates is available at
>>>>> http://www.merck.com/contact/contacts.html) that may be
>>>>> confidential, proprietary copyrighted and/or legally privileged.  It
>>>>> is intended solely for the use of the individual or entity named on
>>>>> this message.  If you are not the intended recipient, and have
>>>>> received this message in error, please notify us immediately by
>>>>> reply e-mail and then delete it from your system.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.