Donald - Thank you for your cogent explanation of a concept that is a bit hard to grasp. After researching it more, I realized that there is a gaping hole in my knowledge of inferences on a population proportion, so I am admittedly somewhat in the dark and have to study up a bit.

Having said that, here are some answers to the questions you posed and some additional comments. Instead of a warehouse full of CDs, let's work with a much larger population.

Revised fact pattern: Suppose you want to estimate the percentage of voters who actually voted in the 2000 U.S. Presidential election and failed to make a choice for any candidate (a blank ballot). Assume (forgetting about politics) that this was simply a matter of inadvertence, error on the part of the voter, that all voting machines worked properly, and that the problem manifested itself the same way all over the country. You want to estimate how many ballots were blank and be 98% confident that the error of estimate is 2% or less.

So you have a universe of 50m voters, or however many went to the polls. Assume you don't really know whether it is 50m or 75m or 100m. You just know it's in the tens of millions. So you want to estimate the proportion of blank ballots, knowing that a huge number of people went to the polls.

You mention, and I see it stated in some books, that when you don't know the SD and don't know the exact population size, other than that it is in the millions, the safest choice is p = .5, which apparently is a sort of worst-case scenario.

I have to reread my material and also revisit the binomial distribution, which I have studied extensively. However, that knowledge has been pushed out of the way by this complex area of sampling. Anyway, if you have some further thoughts given my clarification, I would welcome your insights.

"Donald Burrill" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]...
> On Fri, 28 Sep 2001, John Jackson wrote in part:
>
> > My formula is a rearrangement of the confidence interval formula shown
> > below for ascertaining the maximum error.
> >     E = Z(a/2) x SD/SQRT N
> > The issue is you want to solve for N, but you have no standard
> > deviation value.
>
> Oh, but you do.
> In the problem you formulated, unless I misunderstood egregiously,
> you are seeking to estimate the proportion of defective (or pirated,
> or whatever) CDs in a universe of 10,000 CDs.
>
> There is then a maximum value for the SD of a proportion:
>     SD = SQRT[p(1-p)/n]
> where p is the proportion in question, n is the sample size.
> This value is maximized for p = 0.5 (and it doesn't change much
> between p = 0.3 and p = 0.7). If you have a guess as to the value
> of p, you can get a smaller value of SD, but using p = 0.5 will
> give you a conservative estimate.
>
> You then have to figure out what that "5% error" means: it might
> mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
> useful error bound if, say, p = 0.03), or it might mean "5% of the
> estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> (In the latter case, E is a function of p, so the formula for n
> can be solved without using a guesstimated value for p until the last
> step.)
>
> Notice that throughout this analysis, you're using the normal
> distribution as an approximation to the binomial b(n,p;k) distribution
> that presumably "really" applies. That's probably reasonable; but the
> approximation may be quite lousy if p is very close to 0 (or 1).
>
> The thing is, of course, that if there is NO pirating of the CDs, p=0,
> and this is a desirable state of affairs from your clients' perspective.
> So you might want to be in the business of expressing the minimum p
> that you could expect to detect with, say, 80% probability, using the
> sample size eventually chosen: that is, to report a power analysis.
>
> > The formula then translates into n = (Z(a/2)*SD/E)^2
> > Note: ^2 stands for squared.
> >
> > You have only the confidence interval, let's say 95% and E of 1%.
> > Let's say that you want to find out how many people in the US have
> > fake driver's licenses using these numbers. How large (N) must your
> > sample be?
>
> Again, you're essentially trying to estimate a proportion. (If it is
> the number of instances that is of interest, the distribution is still
> inherently binomial, but instead of p you're estimating np, with
>     SD = SQRT[np(1-p)]
> and you still have to decide whether that 1% means "+/- 0.01 on the
> proportion p" or "1% of the value of np".)
> -- DFB.
> ------------------------------------------------------------------------
> Donald F. Burrill [EMAIL PROTECTED]
> 184 Nashua Road, Bedford, NH 03110 603-471-7128
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
> http://jse.stat.ncsu.edu/
> =================================================================
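
As a rough check on the arithmetic for the revised fact pattern (98% confidence, error of 0.02, worst-case p = 0.5), here is a minimal Python sketch of the rearranged formula n = (Z(a/2)/E)^2 * p(1-p). The function name sample_size_for_proportion and its parameters are illustrative choices of mine, not anything from the thread. The optional population adjustment is the standard textbook finite-population correction, which I am assuming applies here. The whole thing rests on the normal approximation to the binomial, which may be poor if p is very close to 0 or 1.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(confidence, margin, p=0.5, population=None):
    """Sample size needed to estimate a proportion to within +/- margin.

    Hypothetical helper for illustration only. Uses the normal
    approximation to the binomial: n = (z / margin)^2 * p * (1 - p).
    """
    # Two-sided critical value Z(a/2), e.g. about 2.33 for 98% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z / margin) ** 2 * p * (1 - p)
    if population is not None:
        # Assumed finite-population correction; negligible when N is huge.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Revised fact pattern: 98% confidence, error 0.02, worst-case p = 0.5.
    print(sample_size_for_proportion(0.98, 0.02))
    # Same target with the size of the electorate made explicit.
    for size in (50_000_000, 75_000_000, 100_000_000):
        print(size, sample_size_for_proportion(0.98, 0.02, population=size))
    # The earlier CD example had a universe of only 10,000.
    print(sample_size_for_proportion(0.98, 0.02, population=10_000))
```

Under these assumptions the required sample is 3,383 ballots whether the electorate is 50m, 75m, or 100m, so not knowing the exact population size does not matter at this scale. Only for a small universe such as the 10,000 CDs does the correction pull the number down noticeably (to 2,528 in this sketch).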