> On 8 Sep 2016, at 16:25, David L Carlson <dcarl...@tamu.edu> wrote:
>
> Sampling without replacement treats the sample as the population for the
> purposes of estimating the outcomes at smaller sample sizes. Sampling with
> replacement (the same as bootstrapping) treats the sample as one possible
> outcome of a larger population at that sample size.

But the resamples aren't actually independent samples from the underlying
population. In contrast to the usual applications of bootstrapping, they
also don't approximate independent samples well when you look at type
("species") counts.

In my understanding (which may be incomplete), bootstrapping works for a
test statistic computed from the measurements of a single numeric random
variable (or perhaps several random variables) in an i.i.d. sample. The
type count cannot be expressed as such a test statistic. A resample can
only contain types that already occur in the sample, and it usually misses
some of them, so sampling with replacement underestimates the type count.

In NLP, we often use parametric power-law models of the population in
order to extrapolate type counts (e.g. with the zipfR package,
http://zipfr.r-forge.r-project.org), but this implies strong (and often
inappropriate) assumptions about the population.

Best,
Stefan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
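
The underestimation bias described in this message can be checked with a
small simulation. What follows is only a minimal sketch in Python, not
anything taken from the thread: the Zipfian population (5000 types,
exponent 1), the sample size of 1000, the 200 replicates, and all function
names are assumptions chosen for illustration. It compares the type counts
of fresh independent samples from the population with those of bootstrap
resamples drawn with replacement from one observed sample.

```python
import random


def zipf_weights(n_types, exponent=1.0):
    """Unnormalised Zipfian weights: weight of rank r is 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]


def draw_sample(rng, weights, size):
    """Draw `size` i.i.d. tokens (type indices) from the population."""
    return rng.choices(range(len(weights)), weights=weights, k=size)


def type_count(tokens):
    """Number of distinct types ("species") among the tokens."""
    return len(set(tokens))


rng = random.Random(42)            # fixed seed so the run is repeatable
weights = zipf_weights(5000)       # hypothetical population of 5000 types
sample_size = 1000
replicates = 200

# One observed sample, standing in for the data we actually have.
observed = draw_sample(rng, weights, sample_size)
observed_types = type_count(observed)

# Fresh independent samples from the population (normally unavailable).
fresh_counts = [
    type_count(draw_sample(rng, weights, sample_size))
    for _ in range(replicates)
]

# Bootstrap resamples: same size, drawn with replacement from `observed`.
boot_counts = [
    type_count(rng.choices(observed, k=sample_size))
    for _ in range(replicates)
]

mean_fresh = sum(fresh_counts) / replicates
mean_boot = sum(boot_counts) / replicates

print("types in observed sample:      ", observed_types)
print("mean types, independent draws: ", round(mean_fresh, 1))
print("mean types, bootstrap draws:   ", round(mean_boot, 1))
```

No bootstrap resample can contain more types than the observed sample,
because it can only redraw tokens that are already there. Types that occur
only once in the sample are the ones most often lost, which is why the
bootstrap mean falls below the mean for independent draws.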