Re: [R] R help-classification accuracy of DFA and RF using caret

David Winsemius Wed, 06 Nov 2013 13:59:24 -0800

On Nov 6, 2013, at 10:07 AM, Henderson, Robin Michelle wrote:

> Hi,
> 
> I am a graduate student applying published R scripts to compare the 
> classification accuracy of 2 predictive models, one built using discriminant 
> function analysis and one using random forests (webpage link for these 
> scripts is provided below).  The purpose of these models is to predict the 
> biotic integrity of streams.  Specifically, I am trying to compare the 
> classification accuracy (i.e., prediction of group membership) of both the DFA 
> and RF models using k-fold cross-validation for the following metrics: AUC 
> ROC, percent correctly classified, specificity, sensitivity, and Kappa.


Sensitivity, "accuracy" (= percent correct), and specificity are only defined 
once you establish a particular decision threshold. There is no single 
"sensitivity" or "specificity" that accrues to a classification model as a 
whole. AUC is an effort at presenting such an overall value, but it has 
deficiencies and is insensitive to statistically significant differences 
between models.
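
A minimal sketch in base R of how these quantities move with the cutoff. The 
simulated scores, the three cutoffs, and the helper name rates_at are all made 
up for illustration; nothing here comes from the poster's data.

```r
# Simulated scores and true classes (illustrative assumptions only).
set.seed(1)
truth <- rep(c(1, 0), each = 50)                      # 1 = positive class
score <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))  # higher score suggests class 1

# Sensitivity, specificity and accuracy at a single cutoff.
rates_at <- function(cutoff) {
  pred <- as.integer(score >= cutoff)
  c(cutoff      = cutoff,
    sensitivity = sum(pred == 1 & truth == 1) / sum(truth == 1),
    specificity = sum(pred == 0 & truth == 0) / sum(truth == 0),
    accuracy    = mean(pred == truth))
}

# One row per cutoff: the same scores give different rates at each threshold.
cutoffs <- c(0, 0.5, 1)
rates <- t(sapply(cutoffs, rates_at))
print(round(rates, 2))
```

Raising the cutoff can only lower sensitivity and raise specificity, so a 
single reported pair always reflects a chosen threshold rather than the model 
alone.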

> I would also like to obtain the F statistic, Wilks lambda, MSE or RMSE for 
> the random forest models as the script does not contain code to get this data.

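I doubt very much that is by accident or oversight on the part of the 
randomForest developers. The F statistic and Wilks' lambda belong to 
parametric discriminant analysis, and MSE or RMSE to regression; none of them 
is a natural summary of a classification forest.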

>  I think I need to use the caret package to obtain the classification 
> accuracy, but I keep getting error messages when I apply the train function 
> to my data.  As I am relatively new to R and my thesis committee is unable to 
> help as they are also unfamiliar with R, I thought it best to ask for help.

I think you need to add a statistician to your committee. The difficulties you 
are facing (of which you appear to be unaware) are not just related to being 
new to R.


>  Would someone be willing to help me?
> 
> 
> Thanks,
> Robin
> 
> http://www.epa.gov/wed/pages/models/rivpacs/rivpacs.htm
> 
> 
>> TrainDataDFAgrps2 <-predcal
>> TrainClassesDFAgrps2 <-grp.2;
>> DFAgrps2Fit1 <- train(TrainDataDFAgrps2, TrainClassesDFAgrps2,
> +  method = "lda",
> + tuneLength = 10,
> + trControl = trainControl(method = "cv"));
> Error in train.default(TrainDataDFAgrps2, TrainClassesDFAgrps2, method = "lda",  :
>  wrong model type for regression

That error says that train() is treating your problem as regression, while 
"lda" is a classification-only method. In caret, train() picks the model type 
from the outcome: a numeric vector such as your grp.2 (the codes 1 and 2) is 
read as a continuous response, whereas a categorical outcome has to be an R 
factor. Converting grp.2 with factor() should get you past this message. I 
think this is further evidence of the need for competent statistical 
consultation. You would be advised to study further in Venables and Ripley's 
MASS (4th ed.) or in Hastie, Tibshirani, and Friedman's ESL (2nd ed.).
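
A hedged sketch of the factor conversion with caret. train() decides between 
regression and classification from the class of the outcome, so numeric 1/2 
codes are read as regression and a factor as classification. The data below 
are simulated stand-ins for predcal and grp.2, and the labels "grp1" and "grp2" 
are arbitrary. The clip of predcal also shows a Reference_Test column that 
looks constant; it would presumably need to be dropped before fitting, but 
that is a guess from the clip alone.

```r
# Hypothetical stand-ins for predcal and grp.2; not the poster's data.
set.seed(42)
n <- 109
predictors <- data.frame(ELEV_m    = rnorm(n, 600, 100),
                         Precip_mm = rnorm(n, 1500, 300))
grp <- sample(1:2, n, replace = TRUE)   # numeric codes, like grp.2

# Convert the numeric codes to a factor so train() treats this as classification.
# Labels that are valid R names are needed if class probabilities are requested.
classes <- factor(grp, levels = c(1, 2), labels = c("grp1", "grp2"))

# Fit only if caret and MASS (which supplies lda) are installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(caret)
  ctrl <- trainControl(method = "cv", number = 10)   # 10-fold cross-validation
  fit  <- train(x = predictors, y = classes, method = "lda", trControl = ctrl)
  print(fit$results)   # cross-validated Accuracy and Kappa
}
```

Accuracy and Kappa are what train() reports by default for a factor outcome. 
Cross-validated ROC AUC, sensitivity and specificity would additionally need 
classProbs = TRUE and summaryFunction = twoClassSummary in trainControl().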

This link, found with a simple Google search, suggests that the author of the 
cited code is at an academic institution only one state away from you: 
fw.oregonstate.edu/system/files/Van%20Sickle%20CV%20consult.pdf. He may be 
willing to offer assistance.

-- 
David.

> 
>> RFgrps2Fit1 <- train(TrainDataRFgrps2, TrainClassesRFgrps2,
> +  method = "rf",
> + tuneLength = 10,
> + trControl = trainControl(method = "cv"));
> There were 50 or more warnings (use warnings() to see the first 50)
> 
> Clip of predcal (same length as grp.2, but too much data to display all):
>> predcal
>      Reference_Test HUC12_AREA_HA_log10 ELEV_m M_Slp_sqt Precip_mm Temp_CX10
> 2370              R                 3.7  588.0       2.2      1751       148
> 559               R                 4.0  643.1       1.8      1674       141
> 2062              R                 4.0  643.1       1.8      1674       141
> 2467              R                 4.0  643.1       1.8      1674       141
> 1176              R                 3.9  694.3       2.4      1534       131
> 1840              R                 3.9  694.3       2.4      1534       131
> 2052              R                 3.9  694.3       2.4      1534       131
> 1174              R                 4.1  605.0       2.1      1382       138
> 1841              R                 4.1  605.0       2.1      1382       138
> 2051              R                 4.1  605.0       2.1      1382       138
> 1831              R                 4.1  363.9       1.7       937       156
> 
> 
> Grps.2:
> grp.2
>  [1] 1 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 1 2 1 1
> [45] 2 2 1 1 1 1 1 1 1 2 2 1 1 1 2 2 1 2 2 1 1 1 2 2 2 2 2 2 1 1 1 2 2 2 1 2 2 2 2 2 2 2 2 1
> [89] 1 2 2 2 2 2 1 1 2 2 2 1 2 1 2 2 1 2 1 1 2
> 
>       [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA

