Re: [R] help with random forest package

Liaw, Andy Mon, 13 Apr 2009 07:48:39 -0700

I really don't understand what you don't understand.  Do you know how a
tree forms a prediction?  If not, it may be a good idea to learn about
that first.  The code runs each case down every tree in the forest; each
tree's predicted class counts as one vote, and that's how the votes are formed.
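
To make the voting concrete, here is a minimal base-R sketch that takes the vote fractions quoted further down this thread and turns them into predicted classes. The object names (votes, pred) are only illustrative and are not part of the randomForest package.

```r
# Vote fractions for the five cases quoted later in this thread.
votes <- matrix(
  c(0.85714286, 0.14285714,
    0.92857143, 0.07142857,
    0.90000000, 0.10000000,
    0.92857143, 0.07142857,
    0.84615385, 0.15384615),
  ncol = 2, byrow = TRUE,
  dimnames = list(as.character(1:5), c("healthy", "unhealthy"))
)

# Each row is the share of trees voting for each class, so it sums to 1.
rowSums(votes)

# The predicted class is the column with the largest share of votes.
pred <- colnames(votes)[max.col(votes, ties.method = "first")]
pred
```

With these numbers every case gets more than half of its votes for "healthy", so all five are predicted "healthy".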
 
[For OOB predictions, only predictions from trees for which the case is
out-of-bag are counted.  That's why you may get odd-ball vote fractions
even when you grow 100 trees and expect the votes to be in seq(0, 1,
by=0.01).]
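
A small base-R simulation may help show why the OOB denominators differ from case to case. It only mimics the bootstrap sampling step, not the package internals, and n and ntree are made-up values. The fractions quoted further down the thread are consistent with small denominators such as 6/7, 13/14, 9/10 and 11/13.

```r
# Simulate which cases are out-of-bag (OOB) for each tree.
set.seed(1)
n     <- 5    # number of cases (hypothetical)
ntree <- 100  # number of trees (hypothetical)

# For each tree, draw a bootstrap sample of the case indices and flag
# the cases that were NOT drawn; those are out-of-bag for that tree.
oob <- replicate(ntree, !(seq_len(n) %in% sample(n, n, replace = TRUE)))

# Number of trees for which each case is OOB. This count, not ntree,
# is the denominator of that case's OOB vote fraction.
oob_times <- rowSums(oob)
oob_times
```

In a fitted randomForest object the corresponding counts are stored in the oob.times component.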
 
100% - 2.34% = 97.66%, not 76.6% (I can only assume you had a typo).
 
Cheers,
Andy



________________________________

        From: Chrysanthi A. [mailto:chrys...@gmail.com] 
        Sent: Monday, April 13, 2009 9:44 AM
        To: Liaw, Andy
        Cc: r-help@r-project.org
        Subject: Re: [R] help with random forest package
        
        

        But how does it estimate that voting output? How does it get the
85.7% for all the trees? 

        Regarding the prediction accuracy. If I have OOB error = 2.34,
then the prediction accuracy will be equal to 76.6%, right? 

        Many thanks,

        Chrysanthi.


        2009/4/13 Liaw, Andy <andy_l...@merck.com>
        

                RF forms prediction by voting.  Note that each row in
the output sums to 1.  It says 85.7% of the trees classified the first
case as "healthy" and the other 14.3% of the trees "unhealthy".  The
majority (in two-class cases like this one) wins, so the prediction is
"healthy".
                 
                You can take 1 - OOB error rate as the estimate of
prediction accuracy (if you have not selected variables, e.g., using
variable importance, in building the final RF model).
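
As a rough sketch of that calculation, assuming the randomForest package is installed and using the built-in iris data purely as a stand-in for your own:

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# err.rate has one row per tree; the last row holds the OOB error rate
# of the full forest (the figure shown when the object is printed).
oob_err <- rf$err.rate[rf$ntree, "OOB"]

# Estimated prediction accuracy, valid as long as no variable selection
# was done on the same data before fitting this forest.
oob_acc <- 1 - oob_err
oob_acc
```

With an OOB error rate of 2.34% the same arithmetic gives 1 - 0.0234 = 0.9766, i.e. about 97.7%.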
                 
                Andy


________________________________

                        
                        From: Chrysanthi A. [mailto:chrys...@gmail.com]
                        Sent: Friday, April 10, 2009 10:44 AM
                        To: Liaw, Andy
                        Cc: r-help@r-project.org
                        Subject: Re: [R] help with random forest package
                        



                        Hi,
                        
                        To be honest, I cannot really understand the meaning
                        of the votes. For example, with five samples and two
                        classes, what do the numbers below mean?
                              healthy  unhealthy
                        1  0.85714286 0.14285714
                        2  0.92857143 0.07142857
                        3  0.90000000 0.10000000
                        4  0.92857143 0.07142857
                        5  0.84615385 0.15384615
                        
                        Suppose now that, having the classification, I have
                        an unknown sample. According to the results that I've
                        got, how can I predict which class it belongs to? Do
                        the votes give us that prediction?
                        
                        Also,  the error is reported on the "OOB
estimate of  error rate", right? For example, if we have OOB estimate of
error rate:2.34%, we can say that the prediction accuracy is approx.
97.7%? How can we estimate the prediction accuracy? 


                        Thanks a lot,
                        
                        Chrysanthi.
                        
                        
                        
                        2009/4/8 Liaw, Andy <andy_l...@merck.com>
                        

                                I'm not quite sure what you're asking.
                                RF predicts by classifying the new observation
                                using all trees in the forest and taking a
                                plurality vote.  The predict() method for
                                randomForest objects does that for you.  The
                                getTree() function shows you what each
                                individual tree is like (not visually, just the
                                underlying representation of the tree).
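
A minimal sketch of both calls, assuming the randomForest package is installed. The iris data and the train/new split are only for illustration.

```r
library(randomForest)

set.seed(42)
train_idx <- sample(nrow(iris), 100)
rf <- randomForest(Species ~ ., data = iris[train_idx, ], ntree = 100)

# Rows held back from fitting, playing the role of "unknown" samples.
new_obs <- iris[-train_idx, ]

# Predicted class for each new observation (the class with the most votes).
pred_class <- predict(rf, newdata = new_obs)

# Fraction of trees voting for each class, one row per new observation.
pred_votes <- predict(rf, newdata = new_obs, type = "vote")
head(pred_votes)

# Underlying representation of the first tree: one row per node, with the
# daughter nodes, split variable, split point, status and prediction.
head(getTree(rf, k = 1, labelVar = TRUE))
```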
                                 
                                Andy


________________________________

                                From: Chrysanthi A. [mailto:chrys...@gmail.com]
                                Sent: Wednesday, April 08, 2009 2:56 PM
                                To: Liaw, Andy
                                Cc: r-help@r-project.org
                                Subject: Re: [R] help with random forest package
                                
                                
                                Many thanks for the reply.
                                
                                So, having extracted the votes, how do we
                                determine the classification result? If I want
                                to predict which class an unknown sample
                                belongs to, what is the rule that will give me
                                that?
                                
                                Thanks a lot,
                                
                                Chrysanthi.
                                
                                
                                
                                
                                2009/4/8 Liaw, Andy <andy_l...@merck.com>
                                

                                The source code of the whole package is available on CRAN.  All packages
                                are submitted to CRAN in source form.

                                There's no "rule" per se that gives the final prediction, as the final
                                prediction is the result of a plurality vote by all trees in the forest.
                                
                                You may want to look at the varUsed()
and getTree() functions.
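
A short sketch of varUsed(), assuming the randomForest package is installed and using iris only as example data. As I understand it, this reports the variables actually used for splits, not the candidate variables tried at each node.

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# How many times each predictor was used to split a node, summed over
# all trees in the forest.
used <- varUsed(rf, by.tree = FALSE, count = TRUE)

# Label the counts with the predictor names, taken here from the row
# names of the importance matrix (assumed to be in the same order).
names(used) <- rownames(rf$importance)
used
```

A count larger than the number of trees means the variable was used for more than one split in at least one tree. For the split variable at every node of a single tree, see the "split var" column of getTree().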
                                
                                Andy
                                
                                From:  Chrysanthi A.

                                > Hello,
                                >
                                > I am a PhD student in Bioinformatics and I am using the Random Forest
                                > package in order to classify my data, but I have some questions.
                                > Is there a function to visualize the trees, so as to get the rules?
                                > Also, could you please provide me with the code of the "randomForest"
                                > function, as I would like to see how it works. I was wondering if I
                                > can get the classification having the most votes over all the trees
                                > in the forest (the final rules that will give me the final
                                > classification).
                                > Also, is there a possibility to get a vector with the attributes that
                                > are being selected for each node during the construction of each tree?
                                > I mean that I would like to know the m<<M variables that are selected
                                > at each node out of the M input attributes. Are they selected
                                > randomly? Is there a possibility to select the same variable in
                                > subsequent nodes?
                                >
                                > Thanks a lot,
                                >
                                > Chrysanthi.
                                >
                                
                                > ______________________________________________
                                > R-help@r-project.org mailing list
                                > https://stat.ethz.ch/mailman/listinfo/r-help
                                > PLEASE do read the posting guide
                                > http://www.R-project.org/posting-guide.html
                                > and provide commented, minimal, self-contained, reproducible code.
                                >
                                Notice:  This e-mail message, together with any attachments, contains
                                information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station,
                                New Jersey, USA 08889), and/or its affiliates (which may be known
                                outside the United States as Merck Frosst, Merck Sharp & Dohme or
                                MSD and in Japan, as Banyu - direct contact information for affiliates is
                                available at http://www.merck.com/contact/contacts.html) that may be
                                confidential, proprietary copyrighted and/or legally privileged. It is
                                intended solely for the use of the individual or entity named on this
                                message. If you are not the intended recipient, and have received this
                                message in error, please notify us immediately by reply e-mail and
                                then delete it from your system.
                                
                                






