Hi,
I'm new to R (and statistics) and my boss has thrown me in at the deep end with the following task: we want to evaluate the impact that sample size has on our ability to build a robust model, i.e. how robust the model is to sample size, for the purpose of cross-validation. In our current project we have collected a series of independent data at 250 locations, from which we have built a predictive model. We want to know whether we could get away with collecting fewer samples and still build a decent model, for the obvious operational reasons of cost, time spent in the field, etc.

Our thinking was that we could apply a bootstrap-type procedure:

1. Remove 10 records (samples) from the total n = 250 and replace them with copies drawn from the remaining 240.
2. Fit our model to this new data frame and calculate an r².
3. Repeat this in a loop 1000 times and take the mean of the 1000 r² values.
4. Start the process again, this time removing 20 samples and replacing them with copies from the remaining 230 records, and so on.

Below is a simplified version of the real code, which contains most of the basic elements. My main problem is that I'm not sure what the 'for(i in 1:nboot)' line is doing. Originally I thought it meant that 1 sample (record) was removed from the data and replaced by a copy of one of the remaining records, so that 'for(i in 10:nboot)', used in the context of the code below, would remove 10 samples with replacement as described above. I'm almost positive that this isn't happening. If it isn't, how can I make the code below do what we want?
library(utils)

#data
a <- c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1)
b <- c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7)
c <- c(5.0,14.6, 8.9, 9.0, 9.1, 5.5)

#join
abc <- data.frame(a,b,c)

#set column names
names(abc)[1]<-"y"
names(abc)[2]<-"x1"
names(abc)[3]<-"x2"
abc2 <- abc

#sample
abc3 <- as.data.frame(t(as.matrix(data.frame(abc2))))
n <- length(abc2)

npboot.function <- function(nboot)
{
  boot.cor <- vector(length=nboot)
  for(i in 1:nboot){
    rdata <- sample(abc3,n,replace=T)
    abc4 <- as.data.frame(t(as.matrix(data.frame(rdata))))
    model <- lm(asin(sqrt(abc4$y/100)) ~ I(abc4$x1^2) + abc4$x2)
    boot.cor[i] <- cor(abc4$y, model$fit)}
  boot.cor
}

bt.cor <- npboot.function(nboot=10)
bootmean <- mean(bt.cor)

Any assistance would be greatly appreciated, also the sooner the better as we are under pressure to reach a conclusion.

Cheers,
Garth
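For concreteness, the drop-and-refill procedure described above might be sketched along the following lines. This is only a rough sketch under assumptions, not a tested solution for the real data: the helper names drop_k_indices and mean_r2_drop_k are made up for illustration, the r² is taken from summary(fit)$r.squared rather than from cor() as in the code above, and the toy data have only 6 rows, so only k = 1 and k = 2 are tried here. Note that the loop variable i does nothing except count the replicates; the number of records removed is controlled separately by k.

```r
# Toy data copied from the post (y, x1, x2); the real data have 250 rows.
abc <- data.frame(
  y  = c(5.5, 2.3, 8.5, 9.1, 8.6, 5.1),
  x1 = c(5.2, 2.2, 8.6, 9.1, 8.8, 5.7),
  x2 = c(5.0, 14.6, 8.9, 9.0, 9.1, 5.5)
)

# Hypothetical helper: choose n - k rows to keep, then refill the k gaps
# with copies drawn (with replacement) from the kept rows only.
drop_k_indices <- function(n, k) {
  stopifnot(k >= 0, k < n)
  kept <- sample.int(n, n - k)                               # rows that stay
  fill <- kept[sample.int(length(kept), k, replace = TRUE)]  # copies of kept rows
  c(kept, fill)                                              # always length n
}

# Hypothetical helper: mean r^2 over nboot replicates for a given k.
mean_r2_drop_k <- function(dat, k, nboot = 1000) {
  r2 <- numeric(nboot)
  for (i in seq_len(nboot)) {            # i only counts the replicates
    d   <- dat[drop_k_indices(nrow(dat), k), , drop = FALSE]
    fit <- lm(asin(sqrt(y / 100)) ~ I(x1^2) + x2, data = d)
    r2[i] <- summary(fit)$r.squared
  }
  mean(r2)
}

set.seed(1)                              # for reproducibility of the sketch
ks <- c(1, 2)                            # with n = 250 this could be seq(10, 240, by = 10)
mean_r2 <- sapply(ks, function(k) mean_r2_drop_k(abc, k, nboot = 200))
data.frame(k = ks, mean_r2 = mean_r2)
```

Because the refilled rows are exact copies of kept rows, each replicate contains only n - k distinct records, which is the property the procedure relies on to mimic a smaller sample.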
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.