Here is my solution using figures which are self-explanatory:

Sample Size Determination

pi                 50%          central area   0.99
confidence level   99%          2-tail area    0.01
sampling error     2%           1-tail area    0.005
z                  2.58
n1                 4,146.82
n                  4,147

Excel function for determining the z value of the central interval:
    NORMSINV($B$10+(1-$B$10)/2)

The algebraic formula for n was:

      n = pi(1-pi)*(z/e)^2

where pi is the assumed proportion, z is the critical value for the
confidence level, and e is the sampling error.

Let me know if this makes sense.
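
For anyone who would rather check the arithmetic outside Excel, here is a
minimal sketch in Python. It assumes the same inputs as the spreadsheet
(pi = 0.5, 99% confidence, e = 0.02); the function name sample_size is my own
label, and NormalDist().inv_cdf from the standard library plays the role of
NORMSINV. The loop at the end illustrates why pi = 0.5 is the conservative
choice: any other pi gives a smaller n.

```python
import math
from statistics import NormalDist


def sample_size(p, confidence, e):
    """Return (unrounded n, n rounded up) for estimating a proportion.

    Implements n = p(1-p) * (z/e)^2, where z is the two-sided critical
    value of the standard normal distribution.
    """
    # Same quantity as NORMSINV(conf + (1 - conf)/2) in the spreadsheet.
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    n_raw = p * (1 - p) * (z / e) ** 2
    return n_raw, math.ceil(n_raw)


n_raw, n = sample_size(p=0.5, confidence=0.99, e=0.02)
print(f"unrounded n = {n_raw:.2f}, required n = {n}")

# p(1-p) peaks at p = 0.5, so every other p needs a smaller sample.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, sample_size(p, 0.99, 0.02)[1])
```

The rounded-up result for the spreadsheet inputs is 4,147. The unrounded value
can differ from the spreadsheet's in the second decimal place, depending on how
precisely z is computed.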



It is simply amazing to me that you can take a random sample of 4,147 people
out of 50 million and get a valid answer. What is the reason for taking
multiple samples of the same n - to achieve more accuracy?  Is there a rule
of thumb on how many repetitions of the same sample you would take?
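
On the 50-million point, here is a rough sketch of why the population size
hardly matters once it is large. It assumes the standard finite population
correction, n0 / (1 + (n0 - 1)/N), which is not part of my spreadsheet; the
function name sample_size_fpc and the list of population sizes are only
illustrations.

```python
import math
from statistics import NormalDist


def sample_size_fpc(p, confidence, e, population):
    """Sample size for a proportion, adjusted for a finite population."""
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    # Size needed if the population were effectively infinite.
    n0 = p * (1 - p) * (z / e) ** 2
    # Finite population correction: shrinks n0 only when N is comparable to n0.
    n_adj = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n_adj)


for N in (10_000, 1_000_000, 50_000_000, 100_000_000):
    print(f"N = {N:>11,}  n = {sample_size_fpc(0.5, 0.99, 0.02, N):,}")
```

With these inputs the required n is the same 4,147 whether N is 50 million or
100 million, and only drops noticeably for a population as small as 10,000.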



"John Jackson" <[EMAIL PROTECTED]> wrote in message
news:s1ot7.61225$[EMAIL PROTECTED]...
> Donald - Thank you for your cogent explanation of a concept that is a bit
> hard to grasp.
> After researching it more, I determined that there is a gaping hole in my
> knowledge relating to the area of inferences on a population proportion, so
> I am admittedly somewhat in the dark and have to study up a bit.
>
> Having said that, here are some answers to the questions you posed and some
> additional comments.
>
> Instead of a warehouse full of CDs, let's work with a much larger population.
>
> Revised fact pattern:
>
> Suppose you want to estimate the % of voters who actually voted in the
> 2000 U.S. Presidential election who failed to make a choice for any
> candidate (blank ballot).  Assume (forgetting about politics) that this was
> simply a matter of inadvertence, error on the part of the voter, that all
> voting machines worked properly, and that the problem manifested itself the
> same way all over the country. You want to estimate how many ballots were
> blank and be 98% confident that the error of estimate is 2% or less. So you
> have a universe of 50m voters or however many went to the polls. Assume you
> don't really know if it is 50m or 75m or 100m. You just know it's in the
> tens of millions.
>
> So you want to estimate the proportion of blank ballots, knowing that a
> huge number of people went to the polls.  You mention, and I see it stated
> in some books, that when you don't know the SD and don't know the exact
> population size, other than that it is in the millions, the safest choice
> is p = .5 - that apparently is a sort of worst-case scenario, it seems.
> I have to reread my material and also revisit the binomial distribution
> area, which I have studied extensively. However, that knowledge has been
> pushed out of the way by this complex area of sampling.
>
> Anyway, if you have some further thoughts given my clarification, I would
> welcome your insights.
>
>
> "Donald Burrill" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > On Fri, 28 Sep 2001, John Jackson wrote in part:
> >
> > > My formula is a rearrangement of the confidence interval formula shown
> > > below for ascertaining the maximum error.
> > E = Z(a/2) x SD/SQRT N
> > > The issue is you want to solve for N, but you have no standard
> > > deviation value.
> > Oh, but you do.  In the problem you formulated, unless I
> > misunderstood egregiously, you are seeking to estimate the proportion of
> > defective (or pirated, or whatever) CDs in a universe of 10,000 CDs.
> > There is then a maximum value for the SD of a proportion:
> > SD = SQRT[p(1-p)/n]
> > where  p  is the proportion in question,  n  is the sample size.
> > This value is maximized for  p = 0.5  (and it doesn't change much
> > between  p = 0.3  and  p = 0.7 ).  If you have a guess as to the value
> > of  p,  you can get a smaller value of  SD,  but using  p = 0.5  will
> > give you a conservative estimate.
> > You then have to figure out what that "5% error" means:  it might
> > mean "+/- 0.05 on the estimated proportion p" (but this is probably not a
> > useful error bound if, say, p = 0.03), or it might mean "5% of the
> > estimated proportion" (which would mean +/- 0.0015 if p = 0.03).
> > (In the latter case, E is a function of p, so the formula for n
> > can be solved without using a guesstimated value for p until the last
> > step.)
> > Notice that throughout this analysis, you're using the normal
> > distribution as an approximation to the binomial b(n,p;k) distribution
> > that presumably "really" applies.  That's probably reasonable;  but the
> > approximation may be quite lousy if  p  is very close to 0 (or 1).
> > The thing is, of course, that if there is NO pirating of the CDs, p=0,
> > and this is a desirable state of affairs from your clients' perspective.
> > So you might want to be in the business of expressing the minimum  p
> > that you could expect to detect with, say, 80% probability, using the
> > sample size eventually chosen:  that is, to report a power analysis.
> >
> > > The formula then translates into n = ((Z(a/2)*SD)/E)^2
> > > Note: ^2 stands for squared.
> > >
> > > You have only the confidence interval, let's say 95% and E of 1%.
> > > Let's say that you want to find out how many people in the US have
> > > fake driver's licenses using these numbers.  How large (N) must your
> > > sample be?
> >
> > Again, you're essentially trying to estimate a proportion.  (If it is
> > the number of instances that is of interest, the distribution is still
> > inherently binomial, but instead of  p  you're estimating  np,  with
> > SD = SQRT[np(1-p)]
> > and you still have to decide whether that 1% means "+/- 0.01 on the
> > proportion p" or "1% of the value of np".)
> > -- DFB.
>
> >  ------------------------------------------------------------------------
> >  Donald F. Burrill                                    [EMAIL PROTECTED]
> >  184 Nashua Road, Bedford, NH 03110                          603-471-7128
> >
> >
> >
> > =================================================================
> > Instructions for joining and leaving this list and remarks about
> > the problem of INAPPROPRIATE MESSAGES are available at
> >                   http://jse.stat.ncsu.edu/
> > =================================================================
>
>



