Re: E as a % of a standard deviation
Donald, I totally agree w/your point about the stratification of the sample. My facts were set up merely for simplicity's sake, notwithstanding their clear artificiality.

The only instances of multiple samples I have seen are in textbooks to prove the CLT: that w/increasing numbers of sample means, the distribution (of sample means) becomes normal even if the population isn't.

Statistics is a relatively new area of study for me and I never would have intuitively thought that a sample of a few thousand could reveal such meaningful results. But I understand your point completely. I suppose, like you say, that when you factor in stratification and clustering, it isn't such a no-brainer as in my example.

Thank you again for enlightening me.

Donald Burrill [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]...

> On Sun, 30 Sep 2001, John Jackson wrote:
>
> > Here is my solution using figures which are self-explanatory:
> >
> > Sample Size Determination
> >
> >   pi = 50%              central area   0.99
> >   confid level = 99%    2 tail area    0.5
> >   sampling error 2%     1 tail area    0.025
> >   z = 2.58
> >   n1 4,146.82
> >   Excel function for determining central interval: NORMSINV($B$10+(1-$B$10)/2)
> >   n 4,147
> >
> > The algebraic formula for n was: n = pi(1-pi)*(z/e)^2
> >
> > It is simply amazing to me that you can do a random sample of 4,147 people out of 50 million and get a valid answer.
>
> It is not clear what part of this you find amazing. (Would you otherwise expect an INvalid answer, in some sense?) The hard part, of course, is taking the random sample in the first place. The equation you used, I believe, assumes a simple random sample, sometimes known in the trade as a SRS; but it seems to me VERY unlikely that any real sampling among the ballots cast in a national election would be done that way. I'd expect it to involve stratifying on (e.g.) states, and possibly clustering within states; both of which would affect the precision of the estimate, and therefore the minimum sample size desired.
>
> As to what may be your concern, that 4,000 looks like a small part of 50 million, the precision of an estimate depends principally on the amount of information available -- that is, on the size of the sample; not on the proportion that amount bears to the total amount of information that may be of interest. Rather like a hologram, in some respects; and very like the resolving power of an optical instrument (e.g., a telescope), which is a function of the amount of information the instrument can receive (the area of the primary lens or reflector), not of how far away the object in view may be nor what its absolute magnitude may be.
>
> > What is the reason for taking multiple samples of the same n - to achieve more accuracy?
>
> I, for one, don't understand the point of this question at all. Multiple samples? Who takes them, or advocates taking them?
>
> snip, the rest
>
> Donald F. Burrill [EMAIL PROTECTED]
> 184 Nashua Road, Bedford, NH 03110 603-471-7128

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
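For readers who want to check the arithmetic in the exchange above, here is a minimal Python sketch of the quoted formula n = pi(1-pi)*(z/e)^2. It assumes that cell $B$10 in the spreadsheet holds the confidence level, so the critical value mirrors NORMSINV($B$10+(1-$B$10)/2). The function names are illustrative only. The finite population correction is a standard textbook adjustment added here to illustrate the point about population size; it does not appear in the thread.

```python
import math
from statistics import NormalDist


def sample_size(p, margin, confidence):
    """Simple-random-sample size for a proportion: n = p(1-p)(z/e)^2."""
    # Two-sided critical value, i.e. the quantile at c + (1 - c)/2.
    z = NormalDist().inv_cdf(confidence + (1 - confidence) / 2)
    return p * (1 - p) * (z / margin) ** 2


def with_finite_population(n, population):
    """Finite population correction (an assumption added for illustration)."""
    return n / (1 + (n - 1) / population)


# Figures from the thread: pi = 50%, sampling error 2%, confidence 99%.
n_raw = sample_size(p=0.5, margin=0.02, confidence=0.99)
print("unadjusted n:", math.ceil(n_raw))

# The required n barely moves as the population grows.
for population in (100_000, 1_000_000, 50_000_000):
    adjusted = with_finite_population(n_raw, population)
    print(population, math.ceil(adjusted))
```

The unadjusted value rounds up to the 4,147 quoted above. The correction only matters when the sample is a sizeable fraction of the population, which is one way to see why 50 million voters need hardly more respondents than one million. None of this accounts for stratification or clustering.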
Re: E as a % of a standard deviation
On Sun, 30 Sep 2001 00:34:40 GMT, John Jackson [EMAIL PROTECTED] wrote:

> Here is my solution using figures which are self-explanatory:
>
> Sample Size Determination
>
>   pi = 50%              central area   0.99
>   confid level = 99%    2 tail area    0.5
>   sampling error 2%     1 tail area    0.025
>   z = 2.58
>   n1 4,146.82
>   Excel function for determining central interval: NORMSINV($B$10+(1-$B$10)/2)
>   n 4,147
>
> The algebraic formula for n was: n = π(1-π)*(z/e)²
>
> If you can't read the above: n = pi(1-pi)*(z/e)^2
>
> Let me know if this makes sense.
>
> It is simply amazing to me that you can do a random sample of 4,147 people out of 50 million and get a valid answer.
>
> What is the reason for taking multiple samples of the same n - to achieve more accuracy? Is there a rule of thumb on how many repetitions of the same sample you would take?

I have not followed your steps in detail, but: I think you just took a random sample to show that the number of ballots left blank, intentionally, is 1%, plus or minus 2 points. That is using a crude, generous estimate of the variance instead of conditioning on the small p.

 - A three-fold estimate (over the mean) for the maximum is not good accuracy.
 - When the minimum estimate of p goes negative, it is time to try an estimation based on something different.

If I want an accurate estimate of a rare percentage, I often find it easier to think of the number-of-instances. One percent of 4000 is 40. What is the accuracy with 40 seen in the sample? (95% CI is wider than 30 to 50, but not by a whole lot.)

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
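The contrast drawn in the reply above, between the crude p = 0.5 variance and conditioning on a small p, can be sketched in a few lines of Python. This is only a rough illustration: it uses the normal-approximation (Wald) interval, which is an assumption of this sketch and not necessarily the method the poster had in mind, and the function names are made up for the example.

```python
from statistics import NormalDist


def margin(p, n, confidence):
    """Half-width of the normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (p * (1 - p) / n) ** 0.5


def count_interval(count, n, confidence=0.95):
    """Approximate interval for the number of instances seen in n draws."""
    p_hat = count / n
    half = margin(p_hat, n, confidence)
    return n * (p_hat - half), n * (p_hat + half)


# Worst-case variance (p = 0.5) versus the variance at p = 0.01,
# both at 99% confidence with n = 4,147.
crude = margin(0.5, 4147, 0.99)
conditioned = margin(0.01, 4147, 0.99)
print(f"p=0.50: +/- {100 * crude:.2f} points")
print(f"p=0.01: +/- {100 * conditioned:.2f} points")

# 40 instances out of 4,000, at 95% confidence.
low, high = count_interval(40, 4000)
print(f"count interval: {low:.1f} to {high:.1f}")
```

Under this approximation the interval for the count comes out a little wider than 30 to 50, in line with the remark above. An exact binomial or Poisson interval would be somewhat asymmetric and is the safer choice when counts are small.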
Re: Help with Minitab Problem?
Turns out the method I originally suggested is unnecessarily cumbersome. A more elegant method is described below.

On Sat, 29 Sep 2001, Donald Burrill wrote in part:

>     COPY c1-c35 to c41-c75;     # Always retain the original data
>       OMIT c1 = '*';
>       OMIT c2 = '*';
>       . . . ;
>       OMIT c35 = '*'.
>
> There is probably a limit on the number of subcommands that MINITAB can handle (or on the number of OMIT subcommands that COPY can handle), but I don't know offhand what it is.

Well, the limit is one: only one OMIT subcommand per COPY command. That makes this procedure distinctly tedious, for 35 columns. A more efficient method:

    ADD c1-c35 c36

This puts the sum of c1-c35 in c36, but if any one (or more) of c1-c35 are missing, the result is missing: so c36 has '*' for every row where there is a missing datum in some column(s). A reasonable next step is to see how much data is left:

    N c36

reports the number of non-missing values in c36. If that value is zero, or some other very small number, you might want to re-think your strategy before proceeding:

    COPY c1-c35 c41-c75;
      OMIT c36 '*'.

Columns c41-c75 now contain only rows of the original c1-c35 for which all of the values are NON-missing.

snip, the rest
 -- DFB.

Donald F. Burrill [EMAIL PROTECTED]
184 Nashua Road, Bedford, NH 03110 603-471-7128
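For readers without MINITAB, the same idea (let the row sum go missing whenever any value is missing, then keep only rows whose sum is present) can be sketched in plain Python. This is an analogy, not a translation of the MINITAB commands: NaN stands in for the '*' missing-value code, and the function and variable names are hypothetical.

```python
import math

MISSING = float("nan")  # stand-in for MINITAB's '*' (an assumption of this sketch)


def complete_rows(rows):
    """Keep only the rows in which no value is missing."""
    kept = []
    for row in rows:
        # Like ADD c1-c35 c36: the sum is NaN if any entry in the row is NaN.
        total = sum(row)
        if not math.isnan(total):
            kept.append(row)
    return kept


data = [
    [1.0, 2.0, 3.0],
    [4.0, MISSING, 6.0],
    [7.0, 8.0, 9.0],
]

complete = complete_rows(data)
# Like N c36: check how many rows survive before going any further.
print(len(complete), "of", len(data), "rows are complete")
```

The original rows are left untouched and the complete cases go into a new list, which mirrors the advice to copy into c41-c75 and not overwrite c1-c35.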