On Sun, 30 Sep 2001 00:34:40 GMT, "John Jackson" <[EMAIL PROTECTED]> wrote:
> Here is my solution using figures, which are self-explanatory:
>
> Sample Size Determination
>
>   pi = 50%               central area   0.99
>   confid level = 99%     2 tail area    0.5
>   sampling error  2%     1 tail area    0.025
>   z = 2.58
>   n1   4,146.82          Excel function for determining central interval:
>                          NORMSINV($B$10+(1-$B$10)/2)
>   n    4,147
>
> The algebraic formula for n was:  n = π(1-π)*(z/e)^2
>
> If you can't read the above:  n = pi(1-pi)*(z/e)^2
>
> Let me know if this makes sense.
>
> It is simply amazing to me that you can do a random sample of 4,147 people
> out of 50 million and get a valid answer. What is the reason for taking
> multiple samples of the same n - to achieve more accuracy? Is there a rule
> of thumb on how many repetitions of the same sample you would take?

I have not followed your steps in detail, but I think you just took a
random sample to show that the number of ballots left blank, intentionally,
is 1%, plus or minus 2 points. That is using a crude, generous estimate of
the variance instead of conditioning on the small p.

 - A three-fold estimate (over the mean) for the maximum is not good accuracy.
 - When the minimum estimate of p goes negative, it is time to try an
   estimation based on something different.

If I want an accurate estimate of a rare percentage, I often find it easier
to think of the number-of-instances. One percent of 4000 is 40. What is the
accuracy with 40 seen in the sample? (The 95% CI is wider than 30 to 50,
but not by a whole lot.)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
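
As a quick check of the spreadsheet figures quoted above, here is a minimal
Python sketch (Python and scipy are my choice of tool here, not part of the
original exchange; B10 is presumably the cell holding the confidence level)
that mirrors the NORMSINV step and the formula n = pi(1-pi)*(z/e)^2:

    from scipy.stats import norm

    confidence = 0.99   # confidence level (presumably cell B10 in the sheet)
    e = 0.02            # sampling error, 2 percentage points
    p = 0.50            # worst-case proportion, pi = 50%

    # central 99% interval: NORMSINV(B10 + (1 - B10)/2) = NORMSINV(0.995)
    z = norm.ppf(confidence + (1 - confidence) / 2)

    n = p * (1 - p) * (z / e) ** 2
    print(z, n)         # about 2.5758 and 4146.8

With the full-precision z (about 2.5758, displayed as 2.58), this gives
about 4146.8, which rounds up to the 4,147 quoted above.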
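
A similar sketch for the count-based reasoning in the reply: treating the 40
blank ballots expected in a sample of 4000 as the quantity of interest, a
rough plus-or-minus sqrt(40) calculation and an exact binomial interval
(again Python/scipy, my own illustration) both land near the "wider than
30 to 50" range:

    from math import sqrt
    from scipy.stats import binomtest, norm

    n, k = 4000, 40          # sample size, and blank ballots seen (1% of 4000)
    z = norm.ppf(0.975)      # about 1.96 for a 95% interval

    # back-of-envelope on the count scale: sd of the count is roughly sqrt(k)
    print(k - z * sqrt(k), k + z * sqrt(k))   # roughly 27.6 to 52.4

    # exact (Clopper-Pearson) interval for the proportion, rescaled to counts
    ci = binomtest(k, n).proportion_ci(confidence_level=0.95, method="exact")
    print(ci.low * n, ci.high * n)            # roughly 29 to 54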