Here is my solution using figures which are self-explanatory:

Sample Size Determination

pi               = 50%        central area   0.99
confidence level = 99%        2 tail area    0.5
sampling error     2%         1 tail area    0.025
z                = 2.58
n1                 4,146.82
n                  4,147

Excel function for determining central interval:
NORMSINV($B$10+(1-$B$10)/2)

The algebraic formula for n was:

n = π(1-π)*(z/e)²

If you can't read the above:

n = pi(1-pi)*(z/e)^2

Let me know if this makes sense.

It is simply amazing to me that you can do a random sample of 4,147 people
out of 50 million and get a valid answer. What is the reason for taking
multiple samples of the same n - to achieve more accuracy? Is there a rule
of thumb on how many repetitions of the same sample you would take?

"John Jackson" <[EMAIL PROTECTED]> wrote in message
news:s1ot7.61225$[EMAIL PROTECTED]...
> Donald - Thank you for your cogent explanation of a concept that is a bit
> hard to grasp.
> After researching it more, I determined that there is a gaping hole in my
> knowledge relating to the area of inferences on a population proportion, so I
> am admittedly somewhat in the dark and have to study up a bit.
>
> Having said that, here are some answers to ?s you posed and some additional
> comments.
>
> Instead of a warehouse full of CDs, let's work w/a much larger population.
>
> Revised fact pattern:
>
> Suppose you want to estimate the % of voters who actually voted in the 2000
> U.S. Presidential election who failed to make a choice for any candidate
> (blank ballot). Assume (forgetting about politics) that this was simply a
> matter of inadvertence, error on the part of the voter, that all voting
> machines worked properly, and that the problem manifested itself the same
> way all over the country. You want to estimate how many ballots were blank
> and be 98% confident that the error of estimate is 2% or less. So you have a
> universe of 50m voters or however many went to the polls. Assume you don't
> really know if it is 50m or 75m or 100m. You just know it's in the tens of
> millions.
>
> So you want to estimate the proportion of blank ballots, knowing that a huge
> number of people went to the polls. You mention, and I see it stated in some
> books, that when you don't know the SD and don't know the exact population
> size, other than that it is in the millions, the safest choice is p = .5 - that
> apparently is a sort of worst-case scenario, it seems... I have to
> reread my material and also revisit the binomial distribution area, which I
> have studied extensively. However, that knowledge has been pushed out of the
> way by this complex area of sampling.
>
> Anyway, if you have some further thoughts given my clarification, I would
> welcome your insights.
>
>
> "Donald Burrill" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > On Fri, 28 Sep 2001, John Jackson wrote in part:
> >
> > > My formula is a rearrangement of the confidence interval formula shown
> > > below for ascertaining the maximum error.
> > > E = Z(a/2) x SD/SQRT N
> > > The issue is you want to solve for N, but you have no standard
> > > deviation value.
> >
> > Oh, but you do. In the problem you formulated, unless I
> > misunderstood egregiously, you are seeking to estimate the proportion of
> > defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> > There is then a maximum value for the SD of a proportion:
> >       SD = SQRT[p(1-p)/n]
> > where p is the proportion in question, n is the sample size.
> > This value is maximized for p = 0.5 (and it doesn't change much
> > between p = 0.3 and p = 0.7). If you have a guess as to the value
> > of p, you can get a smaller value of SD, but using p = 0.5 will
> > give you a conservative estimate.
> > You then have to figure out what that "5% error" means: it might
> > mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
> > useful error bound if, say, p = 0.03), or it might mean "5% of the
> > estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> > (In the latter case, E is a function of p, so the formula for n
> > can be solved without using a guesstimated value for p until the last
> > step.)
> > Notice that throughout this analysis, you're using the normal
> > distribution as an approximation to the binomial b(n,p;k) distribution
> > that presumably "really" applies. That's probably reasonable; but the
> > approximation may be quite lousy if p is very close to 0 (or 1).
> > The thing is, of course, that if there is NO pirating of the CDs, p=0,
> > and this is a desirable state of affairs from your clients' perspective.
> > So you might want to be in the business of expressing the minimum p
> > that you could expect to detect with, say, 80% probability, using the
> > sample size eventually chosen: that is, to report a power analysis.
> >
> > > The formula then translates into n = ((Z(a/2)*SD)/E)^2
> > > Note: ^2 stands for squared.
> > >
> > > You have only the confidence level, let's say 95%, and E of 1%.
> > > Let's say that you want to find out how many people in the US have
> > > fake driver's licenses using these numbers. How large (N) must your
> > > sample be?
> >
> > Again, you're essentially trying to estimate a proportion. (If it is
> > the number of instances that is of interest, the distribution is still
> > inherently binomial, but instead of p you're estimating np, with
> >       SD = SQRT[np(1-p)]
> > and you still have to decide whether that 1% means "+/- 0.01 on the
> > proportion p" or "1% of the value of np".)
> > -- DFB.
> > ------------------------------------------------------------------------
> > Donald F. Burrill [EMAIL PROTECTED]
> > 184 Nashua Road, Bedford, NH 03110 603-471-7128
> >
> > =================================================================
> > Instructions for joining and leaving this list and remarks about
> > the problem of INAPPROPRIATE MESSAGES are available at
> > http://jse.stat.ncsu.edu/
> > =================================================================
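
For anyone without Excel, the sample-size formula at the top of this message, n = pi(1-pi)*(z/e)^2, can be sketched in Python. This is only a rough illustration: the function name sample_size and its parameter names are made up for this example, and the standard library's NormalDist().inv_cdf is assumed to play the role of NORMSINV.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+


def sample_size(pi, confidence, error):
    """Return (z, n_exact, n_rounded_up) for n = pi(1-pi)*(z/e)^2.

    Hypothetical helper for illustration; it relies on the normal
    approximation to the binomial, as discussed in the thread.
    """
    # Same quantile as the spreadsheet call NORMSINV(conf + (1 - conf)/2)
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    n_exact = pi * (1 - pi) * (z / error) ** 2
    # Round up so the error bound is still met
    return z, n_exact, math.ceil(n_exact)


z, n1, n = sample_size(pi=0.5, confidence=0.99, error=0.02)
print(f"z = {z:.2f}, n1 = {n1:.2f}, n = {n}")

# pi = 0.5 is the conservative (largest-n) choice; nearby values need fewer
for guess in (0.3, 0.5, 0.7):
    print(guess, sample_size(guess, 0.99, 0.02)[2])
```

With pi = 0.5, a 99% level and a 2% error, the rounded-up result is 4,147, the same n as in the figures above. The unrounded value may differ from the spreadsheet in the last decimal place, since the two tools compute the normal quantile slightly differently.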