On 09/25/2010 04:24 PM, Rainer M Krug wrote:
> Hi
>
> This is OT, but I need it for my simulation in R.
>
> I have a special case for sampling with replacement: instead of sampling
> once and replacing it immediately, I sample n times, and then replace all n
> items.
>
> So:
>
> N entities
> x samples with replacement
> each sample consists of n sub-samples WITHOUT replacement, which are all
> replaced before the next sample is drawn
>
> My question is: which distribution can I use to describe how often each
> entity of the N has been sampled?
>
> Thanks for your help,
>
> Rainer

How did you know I was in the middle of preparing lectures on the
variance of the hypergeometric distribution and such? ;-)

If you look at a single item, the answer is of course that you have a
binomial with size=x and prob=n/N. The problem is that these binomials
are correlated between items.

If you can make do with a 2nd order approximation, then the covariance
between the indicators for two items being selected in a single sample
is easily found from the symmetry and the fact that if you sum all N
indicators you get the constant n, so the variance of that sum is zero.
I.e., with p = n/N, the variance of each indicator is p(1-p) and the
covariance is -p(1-p)/(N-1). For sums over repeated samples, just
multiply everything by the number x of samples.

If you intend to just count the frequency of a particular feature in
each of your n-samples, i.e., you have x replications of a
hypergeometric experiment, then you can do somewhat better by computing
the explicit convolution of x hypergeometrics
(convolve(x, rev(y), type="o") and Reduce() are your friends). I'm not
sure this is actually worth the trouble, but it should be doable for
decent-sized N and x.

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
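
For what it is worth, the two calculations described in the reply above can be sketched in R roughly as follows. This is only an illustration under assumed values: N, n and x are the symbols from the question, while K (the number of entities carrying the feature of interest), the specific numbers, the seed, the number of simulated replications and the helper names one_rep and conv are all made up for the example. The first part compares simulated per-entity counts with the variance x*p*(1-p) and covariance -x*p*(1-p)/(N-1); the second part convolves x hypergeometric distributions with convolve() and Reduce().

```r
set.seed(1)

# Illustrative values only (assumptions, not taken from the thread)
N <- 10   # number of entities
n <- 3    # sub-samples drawn WITHOUT replacement within one sample
x <- 5    # number of samples; all n items are replaced between samples
p <- n / N

## Part 1: simulate how often each entity is drawn, compare 2nd-order moments

# One replication: x samples of n distinct entities, tabulated per entity
one_rep <- function() {
  draws <- as.vector(replicate(x, sample.int(N, n)))
  tabulate(draws, nbins = N)
}

# Rows are replications, columns are entities
counts <- t(replicate(20000, one_rep()))
S <- cov(counts)

theo_var <- x * p * (1 - p)               # binomial variance per entity
theo_cov <- -x * p * (1 - p) / (N - 1)    # covariance between two entities

cat("variance:   simulated", mean(diag(S)), " theoretical", theo_var, "\n")
cat("covariance: simulated", mean(S[upper.tri(S)]),
    " theoretical", theo_cov, "\n")

## Part 2: exact distribution of a feature count over x samples

K <- 4    # hypothetical number of entities that carry the feature

# Feature count in a single sample: hypergeometric on 0..n
# (dhyper arguments: quantiles, #with feature, #without feature, #drawn)
h <- dhyper(0:n, K, N - K, n)

# Ordinary convolution of two probability vectors, as suggested in the reply
conv <- function(a, b) convolve(a, rev(b), type = "open")

# Distribution of the total feature count over x independent samples,
# supported on 0..(x * n)
total <- Reduce(conv, rep(list(h), x))

support <- 0:(x * n)
conv_mean <- sum(support * total)
conv_var  <- sum(support^2 * total) - conv_mean^2

# The same variance follows from the variance/covariance formulas in Part 1
moment_var <- K * theo_var + K * (K - 1) * theo_cov

cat("feature count: mean", conv_mean, " variance", conv_var,
    " (from moments:", moment_var, ")\n")
```

Since convolve() works through the FFT, the entries of total can differ from the exact probabilities by rounding error on the order of machine precision; for long convolutions it may be worth clipping tiny negative values to zero.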