Donald - Thank you for your cogent explanation of a concept that is a bit
hard to grasp.
After researching it more, I determined that there is a gaping hole in my
knowledge relating to the area of inferences on a population proportion, so I
am admittedly somewhat in the dark and have to study up a bit.

Having said that, here are some answers to the questions you posed and some
additional comments.

Instead of a warehouse full of CDs, let's work with a much larger population.

Revised fact pattern:

Suppose you want to estimate the % of voters who actually voted in the 2000
U.S. Presidential election who failed to make a choice for any candidate
(blank ballot).  Assume (forgetting about politics) that this was simply a
matter of inadvertence, i.e. error on the part of the voter, that all voting
machines worked properly, and that the problem manifested itself the same
way all over the country. You want to estimate how many ballots were blank
and be 98% confident that the error of estimate is 2% or less. So you have a
universe of 50m voters or however many went to the polls. Assume you don't
really know if it is 50m or 75m or 100m. You just know it's in the tens of
millions.

So you want to estimate the proportion of blank ballots, knowing that a huge
number of people went to the polls.  You mention, and I see it stated in some
books, that when you don't know the SD and don't know the exact population
size, other than that it is in the millions, the safest choice is p = .5,
which apparently is a sort of worst-case scenario. I have to reread my
material and also revisit the binomial distribution area, which I have
studied extensively. However, that knowledge has been pushed out of the
way by this complex area of sampling.
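
A minimal sketch of that worst-case calculation, assuming the 2% means
+/- 0.02 on the proportion itself and that the usual normal-approximation
formula n = z^2 p(1-p) / E^2 applies. The function names are made up for
illustration, and the finite population correction is a standard textbook
adjustment rather than something from this thread; it is included only to
check how much the 50m vs. 100m uncertainty matters.

```python
from math import ceil
from statistics import NormalDist


def worst_case_sample_size(confidence, margin, p=0.5):
    """Unrounded sample size for estimating a proportion.

    p = 0.5 maximizes p * (1 - p), so it gives the largest (safest) n.
    `margin` is an absolute error on the proportion (assumption).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return z ** 2 * p * (1 - p) / margin ** 2


def with_finite_population(n0, population):
    """Apply the finite population correction to an unrounded n0."""
    return n0 / (1 + (n0 - 1) / population)


n0 = worst_case_sample_size(confidence=0.98, margin=0.02)
print("infinite population:", ceil(n0))

# Population sizes are the guesses from the fact pattern above.
for population in (50_000_000, 75_000_000, 100_000_000):
    print(population, ceil(with_finite_population(n0, population)))
```

Under those assumptions the required sample comes out at 3,383 ballots, and
the answer is the same for all three population sizes: once the universe is
in the tens of millions, its exact size has essentially no effect.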

Anyway, if you have some further thoughts given my clarification, I would
welcome your insights.


"Donald Burrill" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> On Fri, 28 Sep 2001, John Jackson wrote in part:
>
> > My formula is a rearrangement of the confidence interval formula shown
> > below for ascertaining the maximum error.
> >       E = Z(a/2) x SD/SQRT(N)
> > The issue is you want to solve for N, but you have no standard
> > deviation value.
> Oh, but you do.  In the problem you formulated, unless I
> misunderstood egregiously, you are seeking to estimate the proportion of
> defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> There is then a maximum value for the SD of a proportion:
> SD = SQRT[p(1-p)/n]
> where  p  is the proportion in question,  n  is the sample size.
> This value is maximized for  p = 0.5  (and it doesn't change much
> between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value
> of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will
> give you a conservative estimate.
> You then have to figure out what that "5% error" means:  it might
> mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
> useful error bound if, say, p = 0.03), or it might mean "5% of the
> estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> (In the latter case, E is a function of p, so the formula for n
> can be solved without using a guesstimated value for p until the last
> step.)
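
To make the two readings of the error bound concrete, here is a rough
sketch. It assumes 95% confidence (no level is given at this point in the
thread) and the normal approximation; n_absolute and n_relative are
hypothetical helper names. For the relative reading, substituting E = r*p
into n = z^2 p(1-p) / E^2 gives n = z^2 (1-p) / (r^2 p), so a value for p is
only needed at the last step.

```python
from math import ceil
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal critical value."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def n_absolute(p, margin, confidence=0.95):
    """Sample size when the error means +/- `margin` on the proportion."""
    z = z_value(confidence)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


def n_relative(p, fraction, confidence=0.95):
    """Sample size when the error means `fraction` of the proportion p."""
    z = z_value(confidence)
    return ceil(z ** 2 * (1 - p) / (fraction ** 2 * p))


# "+/- 0.05 on the proportion", with the conservative p = 0.5
print(n_absolute(p=0.5, margin=0.05))

# "5% of the estimated proportion", with a guessed p = 0.03
print(n_relative(p=0.03, fraction=0.05))
```

With these assumed inputs the absolute reading needs a few hundred
observations, while the relative reading at p = 0.03 needs tens of
thousands, which is why the meaning of the error bound has to be settled
first.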
> Notice that throughout this analysis, you're using the normal
> distribution as an approximation to the binomial b(n,p;k) distribution
> that presumably "really" applies.  That's probably reasonable;  but the
> approximation may be quite lousy if  p  is very close to 0 (or 1).
> The thing is, of course, that if there is NO pirating of the CDs, p=0,
> and this is a desirable state of affairs from your clients' perspective.
> So you might want to be in the business of expressing the minimum  p
> that you could expect to detect with, say, 80% probability, using the
> sample size eventually chosen:  that is, to report a power analysis.
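
One way to see where the normal approximation holds up and where it does
not is to compare it against the exact binomial probability. The sketch
below assumes a sample of n = 100 and uses a continuity correction; both
choices, and the function names, are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)


def approx_error(k, n, p):
    """Absolute gap between the exact and approximate probabilities."""
    return abs(binom_cdf(k, n, p) - normal_approx_cdf(k, n, p))


# p in the middle of the range: the two values agree closely
print(approx_error(k=45, n=100, p=0.5))

# p close to 0: the approximation misses the chance of seeing no defects
print(approx_error(k=0, n=100, p=0.01))
```

The gap at p = 0.01 is far larger than the gap at p = 0.5, which matches the
warning above about p very close to 0 (or 1).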
>
> > The formula then translates into n = ((Z(a/2)*SD)/E)^2
> > Note: ^2 stands for squared.
> >
> > You have only the confidence interval, let's say 95% and E of 1%.
> > Let's say that you want to find out how many people in the US have
> > fake driver's licenses using these numbers.  How large (N) must your
> > sample be?
>
> Again, you're essentially trying to estimate a proportion.  (If it is
> the number of instances that is of interest, the distribution is still
> inherently binomial, but instead of  p  you're estimating  np,  with
> SD = SQRT[np(1-p)]
> and you still have to decide whether that 1% means "+/- 0.01 on the
> proportion p" or "1% of the value of np".)
> -- DFB.
>  ------------------------------------------------------------------------
>  Donald F. Burrill                                 [EMAIL PROTECTED]
>  184 Nashua Road, Bedford, NH 03110                          603-471-7128
>
>
>
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================




