On 09/26/2010 10:29 AM, Rainer M Krug wrote:
> Hi Peter, Hi Berwin,
>
> thanks a lot for your clarifications, it makes more sense now. But
> having your input and thinking a little bit more about the problem, I
> realized that I am simply interested in the pdf p(y) that y *number* of
> entities (which ones is irrelevant) in N are *not* drawn after the
> sampling process has been completed. Even simpler (I guess), in a first
> step, I would only need the mean number of expected non-drawn entities
> in N (pMean).
>
> The range of my values:
> N is in the range of 1 to 100 000
> x is in the range of 10 to 40 000 000
> n is in the range of 20
>
> I guess in cases where n*x is substantially smaller than N, I could
> simply use a binomial distribution for n*x samples to approximate it,
> right?
> For cases where n*x is substantially bigger than N, I can safely
> (especially in the context of my simulation) assume that all entities in
> N are drawn at least once.
>
> But what about the range in between?

As long as you are only looking for the mean, I think it is easy: the
probability that a particular entity is not sampled in x trials is
((N-n)/N)^x, and the mean number of such entities is just N times that,
i.e. N * ((N-n)/N)^x.

The variance is slightly harder, but not excessively so. (Read: I know
that you start by working out the probabilities in the 2x2 table of the
joint distribution of two indicators, each marking that a given entity
is never sampled, then use this to get the covariance, etc. I just
haven't actually done it.)

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
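
The mean given above, and the variance obtained from the pairwise
indicator argument, can be sketched as follows. This is only a minimal
sketch and not part of the original reply: it is written in Python
rather than R, the function names are hypothetical, and it assumes that
each of the x trials draws n distinct entities out of N without
replacement and that the trials are independent (which is what the
formula ((N-n)/N)^x implies). Under that assumption the probability
that two given entities are both never sampled is
((N-n)(N-n-1) / (N(N-1)))^x.

```python
import math


def prob_never_sampled(N, n, x):
    """P(one given entity is missed in all x trials) = ((N-n)/N)^x."""
    if n >= N:
        return 0.0  # every trial draws all entities
    # log1p keeps precision when n/N is tiny and x is very large
    return math.exp(x * math.log1p(-n / N))


def prob_pair_never_sampled(N, n, x):
    """P(two given entities are both missed in all x trials).

    Assumption: one trial draws n distinct entities, so the per-trial
    probability is (N-n)(N-n-1) / (N(N-1)).
    """
    if N < 2 or n >= N - 1:
        return 0.0
    log_p = math.log1p(-n / N) + math.log1p(-n / (N - 1))
    return math.exp(x * log_p)


def mean_not_drawn(N, n, x):
    """Expected number of entities never drawn: N * ((N-n)/N)^x."""
    return N * prob_never_sampled(N, n, x)


def var_not_drawn(N, n, x):
    """Variance of the number of never-drawn entities.

    Sum of N indicator variances plus N(N-1) pairwise covariances.
    """
    p1 = prob_never_sampled(N, n, x)
    p2 = prob_pair_never_sampled(N, n, x)
    var = N * p1 * (1.0 - p1) + N * (N - 1) * (p2 - p1 * p1)
    return max(var, 0.0)  # guard against tiny negative rounding error


if __name__ == "__main__":
    # Example values inside the ranges quoted in the question
    N, n, x = 100_000, 20, 10_000
    print(mean_not_drawn(N, n, x), var_not_drawn(N, n, x))
```

For a single trial (x = 1) exactly N - n entities are missed, so the
mean is N - n and the variance is zero, which gives a quick sanity
check of both functions.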