[R] bug calculating ROC with caret and earth?

Andrew Ziem Fri, 04 Nov 2011 16:14:04 -0700

Does caret have a bug calculating ROC with earth?  When I use caret with earth
on any of my data sets, the ROC value caret reports is identical for every
value of the tuning parameter.  One explanation would be that earth is finding
the same model each time (for example, because nprune is set higher than the
number of terms earth can select).  However, if that were true, sensitivity
and specificity would also be constant, and they are not.  I also verified
that nprune is not too high.
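The reasoning above can be illustrated with a small base-R sketch.  It is not
caret's implementation; auc_rank and sens_spec are hypothetical helper names,
and the scores are made-up toy values.  The point it tries to show is that the
area under the ROC curve depends only on how the predicted scores rank the two
classes, while sensitivity and specificity depend on the classes obtained at a
cutoff, so two different models would normally change both.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a random positive
# gets a higher score than a random negative (ties count as 1/2).
auc_rank <- function(score, is_pos) {
  r <- rank(score)                       # average ranks handle ties
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Sensitivity and specificity at a fixed cutoff on the score.
sens_spec <- function(score, is_pos, cutoff = 0.5) {
  pred_pos <- score >= cutoff
  c(Sens = mean(pred_pos[is_pos]), Spec = mean(!pred_pos[!is_pos]))
}

# Toy data (assumed, not from the Titanic example): 3 positives, 3 negatives.
is_pos  <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
score_a <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)   # one positive ranked below a negative
score_b <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)   # perfect ranking

auc_rank(score_a, is_pos)    # 8/9
auc_rank(score_b, is_pos)    # 1
sens_spec(score_a, is_pos)   # Sens 2/3, Spec 2/3
sens_spec(score_b, is_pos)   # Sens 1,   Spec 2/3
```

Under this view, a ROC value that stays fixed while Sens and Spec move would
be consistent with the same set of scores being used for the ROC at every
nprune while the class predictions do change, though the sketch alone cannot
confirm that this is what happens inside caret.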


I am attaching sample output from R 2.14.0 on Windows 7 64-bit with earth
3.2-1 and caret 5.07-001.

I don't have this problem with caret and ctree.



Andrew

R version 2.14.0 (2011-10-31)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # install and load packages, as needed
> for (pkg in c('caret','earth','mlbench', 'e1071')) {
+         if (!require(pkg, character.only=T)) {install.packages(pkg)}
+         require(pkg, character.only=T)
+ }
Loading required package: caret
Loading required package: lattice
Loading required package: reshape
Loading required package: plyr

Attaching package: reshape

The following object(s) are masked from package:plyr:

    rename, round_any

Loading required package: cluster
Loading required package: foreach
Loading required package: iterators
Loading required package: codetools
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: earth
Loading required package: leaps
Loading required package: plotmo
Loading required package: plotrix
Loading required package: mlbench
Loading required package: e1071
Loading required package: class

Attaching package: class

The following object(s) are masked from package:reshape:

    condense

> 
> # system information
> installed.packages()[c('earth','caret'),'Version']
     earth      caret 
   "3.2-1" "5.07-001" 
> 
> 
> # prepare data
> data(etitanic)
> mydata <- etitanic
> mydata$survived <- as.factor(ifelse(etitanic$survived==1, 'T', 'F'))
> summary(mydata)
 pclass    survived     sex           age              sibsp            parch
 1st:284   F:619    female:388   Min.   : 0.1667   Min.   :0.0000   Min.   :0.0000
 2nd:261   T:427    male  :658   1st Qu.:21.0000   1st Qu.:0.0000   1st Qu.:0.0000
 3rd:501                         Median :28.0000   Median :0.0000   Median :0.0000
                                 Mean   :29.8811   Mean   :0.5029   Mean   :0.4207
                                 3rd Qu.:39.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
                                 Max.   :80.0000   Max.   :8.0000   Max.   :6.0000
> 
> # show natural maximum pruning is 9
> fit <- earth(survived ~ ., data=mydata)
> summary(fit, style="max")
Call: earth(formula=survived~., data=mydata)

T =
  1.094732
  -   0.2113713 * max(0, pclass2nd -         0) 
  -   0.3413489 * max(0, pclass3rd -         0) 
  -   0.4851343 * max(0,   sexmale -         0) 
  - 0.004222467 * max(0,       age -        10) 
  +  0.02569032 * max(0,        10 -       age) 
  -  0.09699376 * max(0,     sibsp -         1) 
  -  0.06266133 * max(0,     parch -         1) 
  -  0.09015484 * max(0,         1 -     parch) 

Selected 9 of 10 terms, and 6 of 6 predictors 
Importance: sexmale, pclass3rd, age, pclass2nd, sibsp, parch
Number of terms at each degree of interaction: 1 8 (additive model)
GCV 0.1519922    RSS 153.8581    GRSq 0.3720351    RSq 0.3911174
> 
> # custom metric
> twoClassSummaryPlus <- function (data,
+                         lev = NULL,
+                         model = NULL)
+ 
+ {
+   out1 <- twoClassSummary(data, lev, model)
+   out2 <- defaultSummary(data, lev, model)
+   #browser() # debug
+   #print(out1)
+   #print(dim(data))
+   c(out1, out2)
+ }
> 
> 
> # tune
> train_earth <- function(nprune)
+ {
+ # prepare tuning parameters
+ grid <- expand.grid(.degree=c(1), .nprune=nprune)
+ 
+ trControl<- trainControl(summaryFunction = twoClassSummaryPlus,
+         classProbs = T,
+   verboseIter=T)
+ 
+ # tune
+ mydata.best <- train(survived ~ .,
+ data = mydata,
+ method = "earth",
+ trControl = trControl,
+ metric="Sens",
+ tuneGrid=grid)
+ 
+ # show tuned
+ print(mydata.best)
+ }
> 
> train_earth(c(1:9)) # ROC is constant 
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Fitting: degree=1, nprune=9
Aggregating results
Selecting tuning parameters
Fitting model on full training set
1046 samples
   6 predictors
   2 classes: 'F', 'T' 

No pre-processing
Resampling: Bootstrap (25 reps) 

Summary of sample sizes: 1046, 1046, 1046, 1046, 1046, 1046, ... 

Resampling results across tuning parameters:

  nprune  ROC    Sens   Spec   Accuracy  Kappa  ROC SD  Sens SD  Spec SD  Accuracy SD  Kappa SD
  1       0.843  1      0      0.588     0      0.0154  0        0        0.0239       0
  2       0.843  0.845  0.684  0.779     0.537  0.0154  0.0209   0.0318   0.0191       0.0393
  3       0.843  0.845  0.685  0.779     0.537  0.0154  0.0217   0.0326   0.0191       0.0392
  4       0.843  0.846  0.694  0.784     0.547  0.0154  0.0232   0.0343   0.0203       0.0412
  5       0.843  0.842  0.714  0.789     0.561  0.0154  0.0236   0.0344   0.0184       0.037
  6       0.843  0.848  0.718  0.794     0.57   0.0154  0.0222   0.0349   0.0182       0.0367
  7       0.843  0.84   0.727  0.793     0.57   0.0154  0.0279   0.0357   0.0163       0.0324
  8       0.843  0.84   0.723  0.792     0.567  0.0154  0.0276   0.0375   0.0161       0.0317
  9       0.843  0.84   0.721  0.791     0.565  0.0154  0.026    0.0389   0.0161       0.0322

Tuning parameter 'degree' was held constant at a value of 1
Sens was used to select the optimal model using  the largest value.
The final values used for the model were degree = 1 and nprune = 1. 
There were 15 warnings (use warnings() to see them)
>
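
One way to narrow this down, sketched here under assumptions rather than taken
from the session above, is to compute an AUC for each nprune directly from
earth, bypassing caret.  auc_rank is a hypothetical helper; the sketch ranks
the plain (linear) earth predictions on the training data, with no resampling
and no GLM step, so its numbers are not expected to match caret's bootstrap
estimates.  It only checks whether the AUC moves at all as nprune changes.

```r
# Rank-based AUC (ties count as 1/2); hypothetical helper, base R only.
auc_rank <- function(score, is_pos) {
  r <- rank(score)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Only run the earth part if the package is installed.
if (requireNamespace("earth", quietly = TRUE)) {
  library(earth)
  data(etitanic)

  # Training-set AUC for each nprune, degree fixed at 1 as in the post.
  auc_by_nprune <- sapply(1:9, function(k) {
    fit <- earth(survived ~ ., data = etitanic, degree = 1, nprune = k)
    p <- as.numeric(predict(fit))          # one score per passenger
    auc_rank(p, etitanic$survived == 1)
  })
  names(auc_by_nprune) <- 1:9
  print(round(auc_by_nprune, 3))
}
```

A model with nprune = 1 is intercept-only, so every row gets the same score
and a rank-based AUC is exactly 0.5.  The nprune = 1 row in the caret output
shows Sens = 1 and Spec = 0, which fits an intercept-only model, yet its ROC
is 0.843.  If the values printed by this sketch vary with nprune, that would
suggest the constant ROC comes from how caret scores the resampled
probabilities rather than from earth itself.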

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
