On Fri, 28 Sep 2001, John Jackson wrote in part:

> My formula is a rearrangement of the confidence interval formula shown 
> below for ascertaining the maximum error.
>               E = Z(a/2) x SD/SQRT N
> The issue is you want to solve for N, but you have no standard 
> deviation value.
        Oh, but you do.  In the problem you formulated, unless I 
misunderstood egregiously, you are seeking to estimate the proportion of 
defective (or pirated, or whatever) CDs in a universe of 10,000 CDs. 
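There is then a maximum value for the SD of a proportion:  
        SD = SQRT[p(1-p)/n]
where  p  is the proportion in question,  n  is the sample size.
(Strictly, this is the SD of the estimated proportion;  in your formula, 
which already divides by SQRT N, the SD to plug in is that of a single 
0/1 observation, SQRT[p(1-p)].)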
This value is maximized for  p = 0.5  (and it doesn't change much 
between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value 
of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will 
give you a conservative estimate.
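A minimal Python sketch of the conservative calculation, assuming the normal approximation and an absolute error bound E; the function name sample_size_for_proportion is hypothetical, not from any library. It takes Z(a/2) from the standard library's NormalDist and uses SQRT[p(1-p)] as the SD of a single observation, so it computes n = (Z(a/2) x SQRT[p(1-p)] / E)^2, rounded up.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(conf_level, abs_error, p_guess=0.5):
    """Smallest n with Z(a/2) * sqrt(p(1-p)/n) <= abs_error (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # Z(a/2), about 1.96 at 95%
    sd_single = math.sqrt(p_guess * (1 - p_guess))      # SD of one 0/1 observation
    return math.ceil((z * sd_single / abs_error) ** 2)

print(sample_size_for_proportion(0.95, 0.05))        # 385, conservative p = 0.5
print(sample_size_for_proportion(0.95, 0.05, 0.3))   # 323, with a guess of p = 0.3
```

The p = 0.5 default gives the largest n for a given E, which is why it is the safe choice when nothing is known about p.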
        You then have to figure out what that "5% error" means:  it might 
mean "+/- 0.05 on the estimated proportion p" (but this is probably not a 
useful error bound if, say, p = 0.03), or it might mean "5% of the 
estimated proportion" (which would mean +/- 0.0015 if p = 0.03). 
        (In the latter case, E is a function of p, so the formula for n 
can be solved without using a guesstimated value for p until the last 
step.) 
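For the relative reading, setting E = r x p in Z(a/2) x SQRT[p(1-p)/n] = E and solving gives n = Z(a/2)^2 (1-p) / (r^2 p), so the guessed p enters only at the last step. A sketch under the same normal-approximation assumption; n_for_relative_error is a hypothetical name:

```python
import math
from statistics import NormalDist

def n_for_relative_error(conf_level, rel_error, p_guess):
    # E = rel_error * p, so n = Z^2 (1 - p) / (rel_error^2 * p); p is plugged in only here
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.ceil(z ** 2 * (1 - p_guess) / (rel_error ** 2 * p_guess))

print(n_for_relative_error(0.95, 0.05, 0.03))   # 49683
```

For p = 0.03 and a 5% relative error at 95% confidence this exceeds the 10,000 CDs in the universe, which shows how demanding the relative reading is when p is small.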
        Notice that throughout this analysis, you're using the normal 
distribution as an approximation to the binomial b(n,p;k) distribution 
that presumably "really" applies.  That's probably reasonable;  but the 
approximation may be quite lousy if  p  is very close to 0 (or 1).
The thing is, of course, that if there is NO pirating of the CDs, p = 0, 
and this is a desirable state of affairs from your clients' perspective. 
So you might want to be in the business of expressing the minimum  p 
that you could expect to detect with, say, 80% probability, using the 
sample size eventually chosen:  that is, to report a power analysis.
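One simple way to frame that detection question, assuming "detect" means "find at least one pirated CD in the sample" and treating the n draws as independent (exact binomial, no normal approximation): P(at least one) = 1 - (1-p)^n, so the smallest p detected with probability 0.8 is 1 - 0.2^(1/n). This is only one possible framing of the power analysis; min_detectable_p is a hypothetical name.

```python
def min_detectable_p(n, power=0.8):
    """Smallest p with P(at least one defective in n independent draws) >= power."""
    # Solve 1 - (1 - p)**n = power for p, using the exact binomial rather than the normal
    return 1 - (1 - power) ** (1 / n)

p_min = min_detectable_p(385, 0.8)
print(round(p_min, 4))            # 0.0042
print(1 - (1 - p_min) ** 385)     # about 0.8, by construction
```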

> The formula then translates into n = ((Z(a/2)*SD)/E)^2   
>       Note: ^2 stands for squared.
> 
> You have only the confidence level, let's say 95%, and E of 1%.  
> Let's say that you want to find out how many people in the US have 
> fake driver's licenses using these numbers.  How large (N) must your 
> sample be?

Again, you're essentially trying to estimate a proportion.  (If it is 
the number of instances that is of interest, the distribution is still 
inherently binomial, but instead of  p  you're estimating  np,  with 
        SD = SQRT[np(1-p)]
and you still have to decide whether that 1% means "+/- 0.01 on the 
proportion p" or "1% of the value of np".)
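For the driver's-license example, here is a sketch of both readings of "E of 1%" at 95% confidence, again assuming the normal approximation; the function names and the value p = 0.05 are hypothetical, for illustration only. For the count, requiring Z(a/2) x SQRT[np(1-p)] <= r x np gives n >= Z(a/2)^2 (1-p) / (r^2 p), which is the same n as a relative error of r on p itself. So what matters is the absolute-versus-relative choice, and the relative reading needs a guess for p.

```python
import math
from statistics import NormalDist

def z_half_alpha(conf_level):
    # Z(a/2): two-sided critical value of the standard normal
    return NormalDist().inv_cdf(1 - (1 - conf_level) / 2)

def n_absolute(conf_level, abs_error, p=0.5):
    # "+/- abs_error on the proportion p"; p = 0.5 is the conservative choice
    z = z_half_alpha(conf_level)
    return math.ceil((z * math.sqrt(p * (1 - p)) / abs_error) ** 2)

def n_relative_count(conf_level, rel_error, p):
    # "rel_error of the value of np": Z * sqrt(n p (1-p)) <= rel_error * n * p
    z = z_half_alpha(conf_level)
    return math.ceil(z ** 2 * (1 - p) / (rel_error ** 2 * p))

print(n_absolute(0.95, 0.01))                # 9604
print(n_relative_count(0.95, 0.01, 0.05))    # hypothetical p = 0.05
```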
                                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110                          603-471-7128



=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================
