Hello,

I have written a user-defined split function for the rpart package, and now I
want to prune the fitted tree with a function of my own.
To do so, I first want to grow a large tree with rpart and then apply that
function to prune it.
The problem is that growing a large tree with the user-defined split
function, and therefore setting the complexity parameter to 0 (cp=0), gives
me a smaller tree than when I set the complexity parameter to 0.01 (the
default value).
My question is: which node values does rpart use to 'prune' the tree? I
first thought it was only the 'deviance' value, which is an output of the
evaluation function, but I am no longer sure about that.
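
For reference, this is the behaviour I expected, sketched with one of rpart's built-in methods rather than my own split function. It is only a minimal illustration: it uses the kyphosis data that ships with rpart (not my data), the helper n_leaves() is a name I made up here, and minsplit=5 and xval=0 are arbitrary choices to keep the example small and deterministic.

```r
library(rpart)

# Hypothetical helper: count terminal nodes of a fitted rpart tree.
# Terminal nodes are marked "<leaf>" in the 'var' column of fit$frame.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

# Shared settings; only cp differs between the two fits below.
ctrl_big     <- rpart.control(cp = 0,    minsplit = 5, xval = 0)
ctrl_default <- rpart.control(cp = 0.01, minsplit = 5, xval = 0)

# Grow a deliberately large tree: with cp = 0 no split is rejected
# on complexity grounds.
big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = ctrl_big)

# Same model with the default complexity parameter.
fit_default <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                     method = "class", control = ctrl_default)

# Prune the large tree back using rpart's own cost-complexity pruning.
pruned <- prune(big, cp = 0.01)

# With a built-in method, the cp = 0 tree is at least as large as the others.
cat("leaves (cp = 0):     ", n_leaves(big), "\n")
cat("leaves (cp = 0.01):  ", n_leaves(fit_default), "\n")
cat("leaves (pruned 0.01):", n_leaves(pruned), "\n")
```

With the built-in method the cp=0 tree is never smaller than the cp=0.01 tree, which is the opposite of what I see with my own split function below.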

Example output of two trees (same data and split function, different cp):
load('alist.R') # user defined split function

> fit1 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6, datTrain,
+               control=list(cp=0.01), method=alist)

n= 3042

node), split, n, deviance, yval
      * denotes terminal node
1) root 3042 3043.5170 2
   2) x3=2,3 1036 1231.5710 1
     4) x2=2,3,4,5 556  704.0214 1
       8) x6< 23.5 118  126.3924 1 *
       9) x6>=23.5 438  541.8522 1
        18) x1=1 164  214.2196 1 *
        19) x1=0 274  295.5250 2
          38) x5< 30.5 81  102.2434 1 *
          39) x5>=30.5 193  161.5116 3 *
     5) x2=0,1 480  454.6036 3 *
   3) x3=0,1 2006 1698.9710 3
     6) x6< 23.5 596  660.9713 2 *
     7) x6>=23.5 1410  978.9448 3
      14) x5< 19.5 323  342.9626 2 *
      15) x5>=19.5 1087  567.2176 4
        30) x1=0 633  200.8669 5 *
        31) x1=1 454  329.6705 3
          62) x2=0,1,3 254  109.3010 4 *
          63) x2=2,4,5 200  186.8194 3 *

> fit1$cptable
              CP nsplit rel error
1 0.03712005      0 1.0000000
2 0.02396757      1 0.9628800
3 0.02099862      2 0.9389124
4 0.01205192      4 0.8969151
5 0.01175506      5 0.8848632
6 0.01102346      6 0.8731082
7 0.01054950      7 0.8620847
8 0.01043856      8 0.8515352
9 0.01000000      9 0.8410966


> fit2 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6, datTrain,
+               control=list(cp=0), method=alist)

n= 3042

node), split, n, deviance, yval
      * denotes terminal node
1) root 3042 3.043517e+03 2
   2) x3=2,3 1036 1.231571e+03 1
     4) x2=2,3,4,5 556 7.040214e+02 1
       8) x6< 23.5 118 1.263924e+02 1
        16) x5< 42.5 73 5.778729e+01 1
          32) x1=1 31 4.888716e-10 1 *
          33) x1=0 42 4.611536e+01 1 *
        17) x5>=42.5 45 5.607449e+01 1 *
       9) x6>=23.5 438 5.418522e+02 1 *
     5) x2=0,1 480 4.546036e+02 3 *
   3) x3=0,1 2006 1.698971e+03 3 *

> fit2$cptable
                CP nsplit rel error
1 0.037120046      0 1.0000000
2 0.023967574      1 0.9628800
3 0.011755057      2 0.9389124
4 0.004117156      3 0.9271573
5 0.003835016      4 0.9230402
6 0.000000000      5 0.9192052


Thank you
Peter Mayer
