> On 8 Sep 2016, at 16:25, David L Carlson <dcarl...@tamu.edu> wrote:
> 
> Sampling without replacement treats the sample as the population for the 
> purposes of estimating the outcomes at smaller sample sizes. Sampling with 
> replacement (the same as bootstrapping) treats the sample as one possible 
> outcome of a larger population at that sample size. 

But the resamples aren't actually independent samples from the underlying 
population. Unlike in the usual applications of bootstrapping, they don't even 
approximate independent samples well when you look at type ("species") counts.

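In my understanding – which may be incomplete – bootstrapping works for a test 
statistic computed from the measurements of a single numeric random variable 
(or perhaps several random variables) in an i.i.d. sample. The type count 
cannot be expressed as such a test statistic: a resample drawn with 
replacement can never contain a type that is missing from the original sample, 
and it usually misses some of the rare types that are present. Hence we get 
the underestimation bias from sampling with replacement.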
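
To make the bias concrete, here is a small simulation sketch in Python. It is 
only an illustration under assumptions of my own choosing: a made-up Zipfian 
population with 5000 types and weights proportional to 1/rank, a sample of 
1000 tokens, and 200 replicates. The helper names (type_count, draw_tokens) 
are hypothetical as well. It compares the type count of the observed sample 
with the type counts of bootstrap resamples and of genuinely independent 
samples of the same size.

```python
import random

def type_count(tokens):
    """Number of distinct types ("species") among the tokens."""
    return len(set(tokens))

def draw_tokens(n, weights, rng):
    """Draw n i.i.d. tokens from a population with the given type weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n)

rng = random.Random(42)            # fixed seed so the run is repeatable
n, n_types, reps = 1000, 5000, 200

# Assumed toy population: probability of a type proportional to 1 / rank.
weights = [1.0 / rank for rank in range(1, n_types + 1)]

# The one sample we actually observed.
sample = draw_tokens(n, weights, rng)
observed = type_count(sample)

# Bootstrap: resample the observed tokens with replacement, same size n.
boot = [type_count(rng.choices(sample, k=n)) for _ in range(reps)]

# Reference: independent samples of size n from the population itself.
fresh = [type_count(draw_tokens(n, weights, rng)) for _ in range(reps)]

boot_mean = sum(boot) / reps
fresh_mean = sum(fresh) / reps

print("types in observed sample:        ", observed)
print("mean types, bootstrap resamples: ", round(boot_mean, 1))
print("mean types, independent samples: ", round(fresh_mean, 1))
```

No bootstrap resample can exceed the observed type count, whereas the 
independent samples scatter around it, so the bootstrap distribution is not a 
usable stand-in for the sampling distribution of the type count.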

In NLP, we often use parametric power-law models of the population in order to 
extrapolate type counts (e.g. with the zipfR package, 
http://zipfr.r-forge.r-project.org), but this implies strong (and often 
inappropriate) assumptions about the population.
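
As a rough sketch of what such an extrapolation involves, the Python snippet 
below fits a simple power law V(n) = K * n^beta (Heaps' law) to the vocabulary 
growth of the first half of a sample and extrapolates to the full sample size. 
Note that this is a deliberately simplified stand-in, not the kind of model 
implemented in zipfR, and that the toy population (5000 types, weights 
proportional to 1/rank), the sample size and the checkpoint spacing are all 
assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)             # fixed seed so the run is repeatable
n_types, n = 5000, 2000

# Assumed toy population: probability of a type proportional to 1 / rank.
weights = [1.0 / rank for rank in range(1, n_types + 1)]
tokens = rng.choices(range(n_types), weights=weights, k=n)

# Vocabulary growth on the first half: type count after every 100 tokens.
half = n // 2
seen = set()
growth = {}
for i, tok in enumerate(tokens[:half], start=1):
    seen.add(tok)
    if i % 100 == 0:
        growth[i] = len(seen)

# Least-squares fit of log V = log K + beta * log n.
xs = [math.log(m) for m in growth]
ys = [math.log(v) for v in growth.values()]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs))
log_k = y_bar - beta * x_bar

# Extrapolate to the full sample size and compare with what we really see.
predicted = math.exp(log_k + beta * math.log(n))
actual = type_count = len(set(tokens))

print("fitted exponent beta:        ", round(beta, 3))
print("extrapolated types at n=2000:", round(predicted, 1))
print("actual types at n=2000:      ", actual)
```

How close the extrapolation comes depends entirely on how well the assumed 
functional form matches the population, which is exactly the weak point of 
the parametric approach.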

Best,
Stefan

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.