On Fri, 28 Sep 2001, John Jackson wrote in part:

> My formula is a rearrangement of the confidence interval formula shown
> below for ascertaining the maximum error.
>
>     E = Z(a/2) x SD/SQRT N
>
> The issue is you want to solve for N, but you have no standard
> deviation value.

Oh, but you do. In the problem you formulated, unless I misunderstood
egregiously, you are seeking to estimate the proportion of defective (or
pirated, or whatever) CDs in a universe of 10,000 CDs. There is then a
maximum value for the SD of a proportion:

    SD = SQRT[p(1-p)/n]

where p is the proportion in question and n is the sample size. This
value is maximized for p = 0.5 (and it doesn't change much between
p = 0.3 and p = 0.7). If you have a guess as to the value of p, you can
get a smaller value of SD, but using p = 0.5 will give you a
conservative estimate.

You then have to figure out what that "5% error" means: it might mean
"+/- 0.05 on the estimated proportion p" (but this is probably not a
useful error bound if, say, p = 0.03), or it might mean "5% of the
estimated proportion" (which would mean +/- 0.0015 if p = 0.03). (In
the latter case, E is a function of p, so the formula for n can be
solved without using a guesstimated value for p until the last step.)

Notice that throughout this analysis, you're using the normal
distribution as an approximation to the binomial b(n,p;k) distribution
that presumably "really" applies. That's probably reasonable; but the
approximation may be quite lousy if p is very close to 0 (or 1).

The thing is, of course, that if there is NO pirating of the CDs, p = 0,
and this is a desirable state of affairs from your clients' perspective.
So you might want to be in the business of expressing the minimum p that
you could expect to detect with, say, 80% probability, using the sample
size eventually chosen: that is, to report a power analysis.
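To make the arithmetic above concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a definitive recipe. The function names are hypothetical. The normal approximation is used for the interval, and the finite universe of 10,000 CDs is ignored. "Detect" in the last function is taken to mean "at least one pirated CD turns up in the sample", which is only one possible reading of the power question.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def z_two_sided(confidence):
    """Critical value Z(a/2) for a two-sided interval (0.95 -> about 1.96)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def n_for_absolute_error(E, confidence=0.95, p=0.5):
    """Sample size for a half-width of +/- E on the proportion itself.

    Solves E = Z(a/2) * SQRT[p(1-p)/n] for n.
    p = 0.5 is the conservative (largest-SD) choice.
    """
    z = z_two_sided(confidence)
    return math.ceil(z * z * p * (1.0 - p) / (E * E))


def n_for_relative_error(r, p, confidence=0.95):
    """Sample size for a half-width of r * p (r = 0.05 means "5% of p").

    With E = r * p the formula reduces to n = Z^2 (1-p) / (r^2 p),
    so a guess for p is only needed at this last step.
    """
    z = z_two_sided(confidence)
    return math.ceil(z * z * (1.0 - p) / (r * r * p))


def min_detectable_p(n, power=0.80):
    """Smallest p for which a sample of n contains at least one hit
    with probability >= power (ASSUMED meaning of "detect").

    Uses the exact binomial P(no hits) = (1-p)^n, not the normal
    approximation, since p near 0 is where that approximation is poor.
    """
    return 1.0 - (1.0 - power) ** (1.0 / n)


if __name__ == "__main__":
    n = n_for_absolute_error(0.05)
    print("n for +/- 0.05 at 95%:", n)
    print("n for +/- 0.01 at 95%:", n_for_absolute_error(0.01))
    print("n for 5% of p, p = 0.03:", n_for_relative_error(0.05, 0.03))
    print("min detectable p at that n:", min_detectable_p(n))
```

With the conservative p = 0.5, a 95% interval of +/- 0.05 needs 385 CDs and +/- 0.01 needs 9604. The "5% of p" reading at p = 0.03 asks for roughly 50,000, which is more than the whole universe of 10,000 CDs, so the choice between the two readings of "5% error" matters a great deal.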
> The formula then translates into n = ((Z(a/2)*SD)/E)^2
> Note: ^2 stands for squared.
>
> You have only the confidence interval, let's say 95% and E of 1%.
> Let's say that you want to find out how many people in the US have
> fake driver's licenses using these numbers. How large (N) must your
> sample be?

Again, you're essentially trying to estimate a proportion. (If it is
the number of instances that is of interest, the distribution is still
inherently binomial, but instead of p you're estimating np, with
SD = SQRT[np(1-p)], and you still have to decide whether that 1% means
"+/- 0.01 on the proportion p" or "1% of the value of np".)

-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128