Hi Nicholas, What you're doing seems proper to me (but see below regarding calculation of the SE).
Rarefaction error windows should coalesce around the endpoint as you get to larger randomized samples. And you only have limited numbers of richness values to draw from. You can confirm this behavior artificially with: rA <- rrfy(A,n=2000) A major benefit of resampling techniques is that they are much less sensitive to distributional shape. So I don't see any reason to assume normality when calculating the confidence interval using the 1.96*SE. And note that when using bootstrap methods [see Efron and Tibshirani 1997], the SD of the resampling distribution is approximately equal to the standard error. So you should *not* divide by the sqrt of sample size in the following lines: low[i] <- mean(rs)-1.96*(sd(rs)/sqrt(length(rs))) high[i] <- mean(rs)+1.96*(sd(rs)/sqrt(length(rs))) Instead, I would recommend using the naive 2.5% and 97.5% quantiles to calculate the confidence interval [or just use mean +- 1.96*sd(rs)]: low[i] <- quantile(rs, type=3, prob=0.025) high[i] <- quantile(rs, type=3, prob=0.975) Because your data are discrete integers, the use of these quantiles with the discrete median as your sample statistic might actually be preferable: s[i] <- median(rs) But with discrete data and relatively few richness values, increasing your reps to 2000 or more is probably warranted. (The function alarm() is useful for signalling the end of a routine.) Although focused more on fossils, here's a potentially useful reference, with a link to an appendix with R code and examples: Kowalewski, M., and P. Novack-Gottshall. 2010. Resampling methods in paleontology. Pp. 19-54. In J. Alroy, and G. Hunt, eds. Quantitative Methods in Paleobiology. Short Courses in Paleontology 16. Paleontological Society and Paleontological Research Institute, Ithaca, NY. Cheers, Phil On 5/28/2014 12:05 PM, Nicholas J. Negovetich wrote: I have a question related to rarefaction of our samples. Unlike all of the examples of which I'm aware, I'm not working with 'sites', per se. Instead, I'm working with the parasites of snails. Snails are infected with 1 parasite species, and each pond/stream can hold several parasite species. The difficulty comes when comparing parasite richness between sites (Pond A vs Pond B) and sampling effort (# snails collected) differs. Rarefaction provides a means to standardize effort on the lowest sample size so that we can test if parasite richness differs between sites. I'm familiar with the vegan package and how it performs rarefaction. But, I can't perform this type of analysis because (1.) 'uninfected' snails would be considered empty patches, and (2.) max richness will be 1 (only one parasite species can (usually) infect a snail). To counter this problem, I've tried to rarefy my samples using randomization methods. The script is below. In summary, I sampled (with replacement) my dataset and calculated the number of parasite species from each sample. I repeated this 100 times and calculated the mean richness for a given sample size. I did this for sample sizes 1-50 (smallest sample size is 50 individuals). I have a few questions related this this analysis. 1. Does this appear correct? 2. How can I generate CI using this method? I normally calculate the 2.5% and 97.5% quantiles when performing randomization, but the nature of our data does not lend itself well to quantile calculations. Could I assume that my bootstrapped means follow a normal distribution? If not, then what would be the best method for generating confidence intervals? 3. This last one is the primary reason for this post. When performing this analysis on my real datasets, I've noticed a peculiar thing. In one sample, my variance of the bootstrapped samples converges to zero as sample size approaches the true sample size (this is for my sample of the smallest sample size). Relatedly, I've noticed some (but not all) of my CI narrowing as my sample size approaches the actual size. This suggests that I'm not doing something correctly, but I'm not sure how to correct it (or if I even need to correct it). The alternative for me is to scrap the rarefaction and perform an alternative analysis, such as randomizing the sites and comparing the actual difference in richness between sites to a null distribution of randomized differences. Thoughts? NN #Script below #Two sites: Site A = 100 snails; Site B = 50 snails #0 = uninfected; 1-4 = species 1-4 A <- c(rep(0,56),rep(1,25),rep(2,15),rep(3,3),rep(4,1)) B <- c(rep(0,29),rep(1,12),rep(2,7),rep(3,2)) #Function to generate a random sample with replacement and calculate species richness bsS <- function(x,n=100){ rs <- sample(x,size=n,replace=T) S <- sum(table(rs)[-1]>0) return(S) } #Function to rarefy my dataset; utilized bsS function #For sample sizes of 1:n, generate 'reps' number of bootstrapped richness values (bsS) #and save the mean richness for each sample. rrfy <- function(x,n=10,reps=100){ smax <- sum(table(x)[-1]>0) s <- c() std <- c() low <- c() high <- c() for (i in 1:n){ rs <- c() for(j in 1:reps){ rs[j] <- bsS(x,i) } s[i] <- mean(rs) std[i] <- sd(rs) low[i] <- mean(rs)-1.96*(sd(rs)/sqrt(length(rs))) #I don't think this is correct high[i] <- mean(rs)+1.96*(sd(rs)/sqrt(length(rs))) #I don't think this is correct } plot(s~c(1:n),type='l',ylim=c(0,smax+1)) lines(low~c(1:n),lty=2,col='red') lines(high~c(1:n),lty=2,col='red') return(data.frame(x=c(1:n),s,stdev=std)) } rA <- rrfy(A,n=50) rB <- rrfy(B,n=50) _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org<mailto:R-sig-ecology@r-project.org> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Phil Novack-Gottshall Associate Professor Department of Biological Sciences Benedictine University 5700 College Road Lisle, IL 60532 pnovack-gottsh...@ben.edu<mailto:pnovack-gottsh...@ben.edu> Phone: 630-829-6514 Fax: 630-829-6547 Office: 332 Birck Hall Lab: 107 Birck Hall http://www1.ben.edu/faculty/pnovack-gottshall ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [[alternative HTML version deleted]] _______________________________________________ R-sig-ecology mailing list R-sig-ecology@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology