Dominik,

See this line:
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   30.37   30.37   30.37   30.37   30.37   30.37

The variance of the predictions is zero. caret computes R^2 from the correlation between the observed data and the predictions. That calculation uses sd(pred), which is zero here. I believe that the same would occur with other formulas for R^2.

Max

On Wed, May 16, 2012 at 11:54 AM, Dominik Bruhn <domi...@dbruhn.de> wrote:
> Thanks Max for your answer.
>
> First, I do not understand your post. Why is it a problem if two of
> the predictions match? From the formula for calculating R^2 I can see that
> there will be a DivByZero iff the total sum of squares is 0. This is
> only true if the predictions of all the predicted points from the
> test-set are equal to the mean of the test-set. Why should this happen?
>
> Anyway, I wrote the following code to check what you were trying to tell me:
>
> --
> library(caret)
> data(trees)
> formula=Volume~Girth+Height
>
> customSummary <- function (data, lev = NULL, model = NULL) {
>   print(summary(data$pred))
>   return(defaultSummary(data, lev, model))
> }
>
> tc=trainControl(method='cv', summaryFunction=customSummary)
> train(formula, data=trees, method='rpart', trControl=tc)
> --
>
> This outputs:
> ---
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   18.45   18.45   18.45   30.12   35.95   53.44
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   22.69   22.69   22.69   32.94   38.06   53.44
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   30.37   30.37   30.37   30.37   30.37   30.37
> [cut many values like this]
> Warning: In nominalTrainWorkflow(dat = trainData, info = trainInfo,
> method = method, :
> There were missing values in resampled performance measures.
> -----
>
> As I didn't understand your post, I don't know if this confirms your
> assumption.
>
> Thanks anyway,
> Dominik
>
>
> On 16/05/12 17:30, Max Kuhn wrote:
>> More information is needed to be sure, but it is most likely that some
>> of the resampled rpart models produce the same prediction for the
>> hold-out samples (likely the result of no viable split being found).
>>
>> Almost every incarnation of R^2 requires the variance of the
>> prediction. This particular failure mode would result in a divide by
>> zero.
>>
>> Try using your own summary function (see ?trainControl) and put a
>> print(summary(data$pred)) in there to verify my claim.
>>
>> Max
>>
>> On Wed, May 16, 2012 at 11:30 AM, Max Kuhn <mxk...@gmail.com> wrote:
>>> On Tue, May 15, 2012 at 5:55 AM, Dominik Bruhn <domi...@dbruhn.de> wrote:
>>>> Hi,
>>>> I got the following problem when trying to build an rpart model and using
>>>> everything but LOOCV. Originally, I wanted to use k-fold partitioning,
>>>> but every partitioning except LOOCV throws the following warning:
>>>>
>>>> ----
>>>> Warning message: In nominalTrainWorkflow(dat = trainData, info =
>>>> trainInfo, method = method, : There were missing values in resampled
>>>> performance measures.
>>>> -----
>>>>
>>>> Below are some simplified testcases which reproduce the warning on my
>>>> system.
>>>>
>>>> Question: What does this error mean? How can I avoid it?
>>>>
>>>> System-Information:
>>>> -----
>>>> > sessionInfo()
>>>> R version 2.15.0 (2012-03-30)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] rpart_3.1-52   caret_5.15-023 foreach_1.4.0  cluster_1.14.2 reshape_0.8.4
>>>> [6] plyr_1.7.1     lattice_0.20-6
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] codetools_0.2-8 compiler_2.15.0 grid_2.15.0 iterators_1.0.6
>>>> [5] tools_2.15.0
>>>> -------
>>>>
>>>>
>>>> Simplified Testcase I: Throws warning
>>>> ---
>>>> library(caret)
>>>> data(trees)
>>>> formula=Volume~Girth+Height
>>>> train(formula, data=trees, method='rpart')
>>>> ---
>>>>
>>>> Simplified Testcase II: Every other CV-method also throws the warning,
>>>> for example using 'cv':
>>>> ---
>>>> library(caret)
>>>> data(trees)
>>>> formula=Volume~Girth+Height
>>>> tc=trainControl(method='cv')
>>>> train(formula, data=trees, method='rpart', trControl=tc)
>>>> ---
>>>>
>>>> Simplified Testcase III: The only CV-method which is working is 'LOOCV':
>>>> ---
>>>> library(caret)
>>>> data(trees)
>>>> formula=Volume~Girth+Height
>>>> tc=trainControl(method='LOOCV')
>>>> train(formula, data=trees, method='rpart', trControl=tc)
>>>> ---
>>>>
>>>>
>>>> Thanks!
>>>> --
>>>> Dominik Bruhn
>>>> mailto: domi...@dbruhn.de
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> --
>>>
>>> Max
>>
>
> --
> Dominik Bruhn
> mailto: domi...@dbruhn.de
>

--

Max
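
The failure mode described at the top of this thread can be reproduced in base R without caret. The sketch below is only an illustration under stated assumptions: the observed values are made up (not taken from the trees data), the constant prediction 30.5 stands in for the 30.37 printed above, and the names obs, pred, r2_cor, r2_sst and rmse are hypothetical. It assumes, as stated in the thread, that the R^2 in question is based on the correlation between observed and predicted values.

```r
# Hypothetical hold-out fold: observed values are illustrative only.
obs  <- c(10.3, 16.4, 21.0, 31.7, 55.4)

# A tree with no split predicts the same value for every hold-out row.
pred <- rep(30.5, length(obs))

# The predictions have no spread at all.
pred_sd <- sd(pred)

# Correlation-based R^2: cor() needs sd(pred) in the denominator, so with
# constant predictions R returns NA (and warns that the sd is zero).
r2_cor <- suppressWarnings(cor(obs, pred)^2)

# For comparison, the 1 - SSE/SST form divides by the spread of the
# observed values only, so it stays defined here (it can be negative).
r2_sst <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

# RMSE does not involve any variance in a denominator and is also defined.
rmse <- sqrt(mean((obs - pred)^2))

print(c(pred_sd = pred_sd, r2_cor = r2_cor, r2_sst = r2_sst, rmse = rmse))
```

A single resample with an NA R^2 would be enough to produce a "missing values in resampled performance measures" warning, while the RMSE for that resample remains usable.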