That was very helpful. Using the predict.all option I got exactly what I need. Is there any way of visualizing the predictions? Is an MDS plot the best way?
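For reference, the randomForest package itself ships an MDS view of the fit: grow the forest with proximity=TRUE and pass the result to MDSplot(), which runs classical scaling on 1 - proximity. A minimal sketch on the iris data (the seed value is arbitrary):

```r
library(randomForest)

## proximity=TRUE makes randomForest() return the N x N proximity
## matrix (the fraction of trees in which two cases land in the
## same terminal node); MDSplot() then plots classical MDS
## coordinates of 1 - proximity, colored by class.
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, proximity = TRUE)
MDSplot(rf, iris$Species, k = 2)
```

Points of the same class that the forest finds similar cluster together, so this is a common way to eyeball how cleanly the classes separate.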
Also, I want to run random forest under three different schemes (e.g., a 70% training / 30% test split, 10-fold cross-validation, etc.). Is the caret package the best way to do that, or does randomForest have an option for it?

Many thanks,
Chrysanthi

2009/4/28 Liaw, Andy <andy_l...@merck.com>

> Let's try an example:
>
> R> iris.1tree <- randomForest(Species ~ ., data=iris, ntree=1)
> R> getTree(iris.1tree, 1)
>   left daughter right daughter split var split point status prediction
> 1             2              3         4        0.80      1          0
> 2             0              0         0        0.00     -1          1
> 3             4              5         4        1.75      1          0
> 4             0              0         0        0.00     -1          2
> 5             6              7         3        4.85      1          0
> 6             8              9         1        6.05      1          0
> 7             0              0         0        0.00     -1          3
> 8             0              0         0        0.00     -1          2
> 9             0              0         0        0.00     -1          3
> R> iris[1,]
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> R> predict(iris.1tree, iris[1,], type="prob")
>   setosa versicolor virginica
> 1      1          0         0
> R> levels(iris$Species)
> [1] "setosa"     "versicolor" "virginica"
>
> The getTree() function showed the first (and only) tree.  To predict the
> first row of iris, we read the tree in the following way.  In the first
> row (the root node), the variable to split on is the 4th, "Petal.Width".
> The split point is 0.8, so data points with Petal.Width < 0.8 go to the
> left and the others go to the right.  Since the "left daughter" is "2",
> we look at the second row of the tree; it is a leaf (i.e., a terminal
> node) since the status is -1.  The prediction is "1", the first level of
> the factor: "setosa".  I don't expect anyone to predict data "manually"
> like this; predict.randomForest() does all of this for you.
>
> As to individual tree predictions, predict.randomForest() has an option
> "predict.all" that you can use.  To get the OOB votes, though, you will
> also need to look at the output of randomForest(..., keep.inbag=TRUE) to
> see which data point is OOB for which tree.
>
> I hope that's clear now.
>
> Cheers,
> Andy
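The hand-walk described above can be written as a short helper. This is only an illustrative sketch (walk_tree is a made-up name, not part of the randomForest package), and it assumes all predictors are numeric, as in iris; categorical splits are encoded differently in the getTree() matrix:

```r
library(randomForest)

## Predict one case by hand from a getTree() matrix: start at row 1,
## compare the split variable against the split point, descend to the
## left/right daughter, and stop at a terminal node (status == -1),
## whose "prediction" column indexes the factor levels.
walk_tree <- function(tree, x, class_levels) {
  node <- 1
  while (tree[node, "status"] != -1) {
    v <- tree[node, "split var"]            # column index of the predictor
    if (x[[v]] < tree[node, "split point"]) {
      node <- tree[node, "left daughter"]
    } else {
      node <- tree[node, "right daughter"]
    }
  }
  class_levels[tree[node, "prediction"]]    # map level index back to label
}

set.seed(1)
iris.1tree <- randomForest(Species ~ ., data = iris, ntree = 1)
tr <- getTree(iris.1tree, 1, labelVar = FALSE)
walk_tree(tr, iris[1, 1:4], levels(iris$Species))
# should give "setosa": row 1 is a well-separated setosa case
```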
> From: Chrysanthi A. [mailto:chrys...@gmail.com]
> Sent: Tuesday, April 28, 2009 8:52 AM
> To: Liaw, Andy
> Cc: r-help@r-project.org
> Subject: Re: [R] help with random forest package
>
> Many thanks for your help.  Sorry for my delayed reply, but I was away.
> Regarding the OOB error, sorry, it was a typo.
>
> As for the voting, I was just wondering whether there is a function that
> will give me the prediction of each case from each tree.  Is there a
> function that produces the rules for each tree?  If I have a new case
> whose class I want to predict, how can I predict it?  Should I look at
> each tree and then take the vote?  Or are there predictive rules that I
> can use?  I cannot make that prediction from the results that the votes
> component gives me...
>
> Also, I was wondering why randomization, combined with aggregating the
> predictions of the trees, significantly improves the overall predictive
> accuracy.
>
> Thanks a lot,
> Chrysanthi
>
> 2009/4/13 Liaw, Andy <andy_l...@merck.com>
>
>> I really don't understand what you don't understand.  Do you know how a
>> tree forms a prediction?  If not, it may be a good idea to learn about
>> that first.  The code runs each case through all the trees in the
>> forest, and that is how the votes are formed.
>>
>> [For OOB predictions, only predictions from trees for which the case is
>> out-of-bag are counted.  That's why you may get odd-ball vote fractions
>> even when you grow 100 trees and expect the votes to be in
>> seq(0, 1, by=0.01).]
>>
>> 100% - 2.34% = 97.66%, not 76.6% (I can only assume you had a typo).
>>
>> Cheers,
>> Andy
>>
>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>> Sent: Monday, April 13, 2009 9:44 AM
>> To: Liaw, Andy
>> Cc: r-help@r-project.org
>> Subject: Re: [R] help with random forest package
>>
>> But how does it estimate that voting output?  How does it get the 85.7%
>> for all the trees?
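The vote fractions discussed above are nothing more than the share of trees predicting each class. A sketch that reproduces them from the per-tree predictions via predict.all (note this tabulates all trees; true OOB vote fractions would additionally need the in-bag matrix from keep.inbag=TRUE, as Andy points out elsewhere in the thread):

```r
library(randomForest)

## With predict.all=TRUE, $individual is a matrix with one column of
## class predictions per tree; tabulating each row and dividing by the
## number of trees gives the vote fractions.
set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 100)
pr <- predict(rf, iris[1:5, ], predict.all = TRUE)
votes <- t(apply(pr$individual, 1, function(p)
  table(factor(p, levels = levels(iris$Species))) / length(p)))
votes  # each row sums to 1; the majority class is the forest's prediction
```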
>> Regarding the prediction accuracy: if I have OOB error = 2.34, then the
>> prediction accuracy will be equal to 76.6%, right?
>>
>> Many thanks,
>> Chrysanthi
>>
>> 2009/4/13 Liaw, Andy <andy_l...@merck.com>
>>
>>> RF forms predictions by voting.  Note that each row in the output sums
>>> to 1.  It says 85.7% of the trees classified the first case as
>>> "healthy" and the other 14.3% of the trees "unhealthy".  The majority
>>> (in two-class cases like this one) wins, so the prediction is
>>> "healthy".
>>>
>>> You can take 1 - OOB error rate as the estimate of prediction accuracy
>>> (if you have not selected variables, e.g., using variable importance,
>>> in building the final RF model).
>>>
>>> Andy
>>>
>>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>>> Sent: Friday, April 10, 2009 10:44 AM
>>> To: Liaw, Andy
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] help with random forest package
>>>
>>> Hi,
>>>
>>> To be honest, I cannot really understand the meaning of the votes.
>>> For example, with five samples and two classes, what do the numbers
>>> below mean?
>>>
>>>      healthy  unhealthy
>>> 1 0.85714286 0.14285714
>>> 2 0.92857143 0.07142857
>>> 3 0.90000000 0.10000000
>>> 4 0.92857143 0.07142857
>>> 5 0.84615385 0.15384615
>>>
>>> Suppose now that, having the classification, I have an unknown sample;
>>> given the results I got, how can I predict which class it belongs to?
>>> Do the votes give us that prediction?
>>>
>>> Also, the error is reported as the "OOB estimate of error rate",
>>> right?  For example, if we have an OOB estimate of error rate of
>>> 2.34%, can we say that the prediction accuracy is approx. 97.7%?  How
>>> can we estimate the prediction accuracy?
>>>
>>> Thanks a lot,
>>> Chrysanthi
>>>
>>> 2009/4/8 Liaw, Andy <andy_l...@merck.com>
>>>
>>>> I'm not quite sure what you're asking.
>>>> RF predicts by running the new observation through all trees in the
>>>> forest and taking a plurality vote.  The predict() method for
>>>> randomForest objects does that for you.  The getTree() function shows
>>>> you what each individual tree is like (not visually, just the
>>>> underlying representation of the tree).
>>>>
>>>> Andy
>>>>
>>>> From: Chrysanthi A. [mailto:chrys...@gmail.com]
>>>> Sent: Wednesday, April 08, 2009 2:56 PM
>>>> To: Liaw, Andy
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] help with random forest package
>>>>
>>>> Many thanks for the reply.
>>>>
>>>> So, having extracted the votes, how can we determine the
>>>> classification result?  If I want to predict which class an unknown
>>>> sample falls into, what rule will give me that?
>>>>
>>>> Thanks a lot,
>>>> Chrysanthi
>>>>
>>>> 2009/4/8 Liaw, Andy <andy_l...@merck.com>
>>>>
>>>>> The source code of the whole package is available on CRAN.  All
>>>>> packages submitted to CRAN are in source form.
>>>>>
>>>>> There is no "rule" per se that gives the final prediction; the final
>>>>> prediction is the result of a plurality vote by all the trees in the
>>>>> forest.
>>>>>
>>>>> You may want to look at the varUsed() and getTree() functions.
>>>>>
>>>>> Andy
>>>>>
>>>>> From: Chrysanthi A.
>>>>> > Hello,
>>>>> >
>>>>> > I am a PhD student in bioinformatics and I am using the
>>>>> > randomForest package to classify my data, but I have some
>>>>> > questions.  Is there a function to visualize the trees, so as to
>>>>> > get the rules?  Also, could you please point me to the code of the
>>>>> > "randomForest" function, as I would like to see how it works.  I
>>>>> > was wondering if I can get the classification having the most
>>>>> > votes over all the trees in the forest (the final rules that will
>>>>> > give me the final classification).
>>>>> > Also, is there a possibility to get a vector with the attributes
>>>>> > that are selected for each node during the construction of each
>>>>> > tree?  I mean, I would like to know the m << M variables that are
>>>>> > selected at each node out of the M input attributes.  Are they
>>>>> > selected randomly?  Is it possible for the same variable to be
>>>>> > selected in subsequent nodes?
>>>>> >
>>>>> > Thanks a lot,
>>>>> > Chrysanthi
>>>>> >
>>>>> > ______________________________________________
>>>>> > R-help@r-project.org mailing list
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> > PLEASE do read the posting guide
>>>>> > http://www.R-project.org/posting-guide.html
>>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>> Notice: This e-mail message, together with any attachments, contains
>>>>> information of Merck & Co., Inc. (One Merck Drive, Whitehouse
>>>>> Station, New Jersey, USA 08889), and/or its affiliates (which may be
>>>>> known outside the United States as Merck Frosst, Merck Sharp & Dohme
>>>>> or MSD and in Japan, as Banyu - direct contact information for
>>>>> affiliates is available at
>>>>> http://www.merck.com/contact/contacts.html) that may be
>>>>> confidential, proprietary copyrighted and/or legally privileged.  It
>>>>> is intended solely for the use of the individual or entity named on
>>>>> this message.  If you are not the intended recipient, and have
>>>>> received this message in error, please notify us immediately by
>>>>> reply e-mail and then delete it from your system.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.