Re: E as a % of a standard deviation

2001-09-30 Thread John Jackson

Donald,

I totally agree w/your point about the stratification of the sample. My
facts were set up merely for simplicity's sake notwithstanding their clear
artificiality.

The only instances of multiple samples I have seen are in textbooks, used to
illustrate the CLT: that the distribution of sample means becomes
approximately normal as the sample size grows, even if the population
isn't normal. Statistics is a relatively new area of study for me, and I
never would have intuitively thought that a sample of a few thousand could
reveal such meaningful results. But I understand your point completely. I
suppose, as you say, that when you factor in stratification and clustering,
it isn't such a no-brainer as in my example.

Thank you again for enlightening me.



Donald Burrill [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
 On Sun, 30 Sep 2001, John Jackson wrote:

  Here is my solution using figures which are self-explanatory:
 
  Sample Size Determination
 
  pi = 50%
  confidence level = 99%   (central area 0.99; 2-tail area 0.01; 1-tail area 0.005)
  sampling error e = 2%
  z = 2.58
  n1 = 4,146.82   [Excel central-interval quantile: NORMSINV($B$10+(1-$B$10)/2)]
  n = 4,147
 
  The algebraic formula for n was:
 
    n = pi(1-pi)*(z/e)^2
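[Editorially, the arithmetic above can be checked with a short script; this is a
sketch in Python, where `NormalDist.inv_cdf` plays the role of Excel's NORMSINV:]

```python
from math import ceil
from statistics import NormalDist

pi = 0.50     # assumed proportion (the worst case: it maximizes the variance)
conf = 0.99   # confidence level
e = 0.02      # sampling error (half-width of the interval)

# z for the central 99% interval, i.e. the 0.995 quantile of the standard normal
z = NormalDist().inv_cdf(conf + (1 - conf) / 2)

n1 = pi * (1 - pi) * (z / e) ** 2   # n = pi(1-pi)*(z/e)^2
n = ceil(n1)
print(z, n1, n)   # z ~ 2.576, n1 ~ 4146.8, n = 4147
```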
 
 
  It is simply amazing to me that you can do a random sample of 4,147
  people out of 50 million and get a valid answer.

 It is not clear what part of this you find amazing.
 (Would you otherwise expect an INvalid answer, in some sense?)
 The hard part, of course, is taking the random sample in the first
 place.  The equation you used, I believe, assumes a simple random
 sample, sometimes known in the trade as an SRS;  but it seems to me
 VERY unlikely that any real sampling among the ballots cast in a
 national election would be done that way.  I'd expect it to involve
 stratifying on (e.g.) states, and possibly clustering within states;
 both of which would affect the precision of the estimate, and therefore
 the minimum sample size desired.
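[The effect of stratification and clustering on the required sample size is
often summarized as a "design effect" (deff); a sketch, using a hypothetical
deff value for illustration only:]

```python
from math import ceil

n_srs = 4147   # the SRS sample size from the formula quoted above
deff = 1.5     # hypothetical design effect; deff > 1 means the clustered
               # design inflates the variance relative to an SRS

# Same precision under the complex design requires deff times the SRS size.
n_complex = ceil(n_srs * deff)
print(n_complex)   # 6221
```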
 As to what may be your concern, that 4,000 looks like a small
 part of 50 million, the precision of an estimate depends principally
 on the amount of information available -- that is, on the size of the
 sample;  not on the proportion that amount bears to the total amount
 of information that may be of interest.  Rather like a hologram, in
 some respects;  and very like the resolving power of an optical
 instrument (e.g., a telescope), which is a function of the amount of
 information the instrument can receive (the area of the primary lens
 or reflector), not on how far away the object in view may be nor what
 its absolute magnitude may be.
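[That intuition can be made concrete with the finite population correction
(FPC), which shows how little the population size matters when n = 4,147 is
drawn from 50 million; a sketch:]

```python
from math import sqrt

p, n, N = 0.5, 4147, 50_000_000

se_infinite = sqrt(p * (1 - p) / n)   # standard error, ignoring population size
fpc = sqrt((N - n) / (N - 1))         # finite population correction factor
se_finite = se_infinite * fpc

# fpc ~ 0.99996: sampling 4,147 from 50 million changes the standard
# error by about four thousandths of one percent.
print(se_infinite, se_finite, fpc)
```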

  What is the reason for taking multiple samples of the same n -
  to achieve more accuracy?

 I, for one, don't understand the point of this question at all.
 Multiple samples?  Who takes them, or advocates taking them?

  snip, the rest 

  
  Donald F. Burrill [EMAIL PROTECTED]
  184 Nashua Road, Bedford, NH 03110  603-471-7128







=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: E as a % of a standard deviation

2001-09-30 Thread Rich Ulrich

On Sun, 30 Sep 2001 00:34:40 GMT, John Jackson
[EMAIL PROTECTED] wrote:

 Here is my solution using figures which are self-explanatory:
 
 Sample Size Determination
 
 pi = 50%
 confidence level = 99%   (central area 0.99; 2-tail area 0.01; 1-tail area 0.005)
 sampling error e = 2%
 z = 2.58
 n1 = 4,146.82   [Excel: NORMSINV($B$10+(1-$B$10)/2)]
 n = 4,147
 
 The algebraic formula for n was:   n = pi(1-pi)*(z/e)^2
 
 
 
 If you can't read the above:
 
   n = pi(1-pi)*(z/e)^2
 
   Let me know if this makes sense.
 
 
 
 It is simply amazing to me that you can do a random sample of 4,147 people
 out of 50 million and get a valid answer. What is the reason for taking
 multiple samples of the same n - to achieve more accuracy?  Is there a rule
 of thumb on how many repetitions of the same sample you would take?
 
I have not followed your steps in detail, but:
I think you just took a random sample to show that the number of 
ballots left blank, intentionally, is 1%, plus or minus 2 points.
That is using a crude, generous estimate of the variance instead
of conditioning on the small p.

 - A maximum estimate three times the mean is not good accuracy.
 - When the minimum estimate of p goes negative, it is time
to try an estimate based on something different.

If I want an accurate estimate of a rare percentage, I often
find it easier to think of the number-of-instances.  One
percent of 4000 is 40.  What is the accuracy with 40 seen
in the sample?  (95% CI is wider than 30 to 50, but not by
a whole lot.)
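[The count-based interval can be checked with an exact (Clopper-Pearson)
calculation; a sketch using only the Python standard library, finding the
bounds by bisection:]

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target):
    """Bisect for p in (0, 1), where f is increasing in p, so that f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k, n, alpha = 40, 4000, 0.05
# Clopper-Pearson bounds: P(X >= k | p_lo) = alpha/2 and P(X <= k | p_hi) = alpha/2
p_lo = solve(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
p_hi = solve(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
print(p_lo * n, p_hi * n)   # roughly 29 to 54 instances: wider than 30 to 50,
                            # but not by a whole lot
```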


-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Help with Minitab Problem?

2001-09-30 Thread Donald Burrill

Turns out the method I originally suggested is unnecessarily cumbersome. 
A more elegant method is described below.

On Sat, 29 Sep 2001, Donald Burrill wrote in part:

   COPY c1-c35 to c41-c75;   #  Always retain the original data
   OMIT c1 = '*';
   OMIT c2 = '*';
   . . . ;
   OMIT c35 = '*'.
 
 There is probably a limit on the number of subcommands that MINITAB 
 can handle (or on the number of OMIT subcommands that COPY can handle), 
 but I don't know offhand what it is.  

Well, the limit is one:  only one OMIT subcommand per COPY command. 
That makes this procedure distinctly tedious, for 35 columns.

A more efficient method:
ADD c1-c35 c36
 This puts the sum of c1-c35 in c36, but if any one (or more) of c1-c35 
are missing, the result is missing:  so c36 has '*' for every row where 
there is a missing datum in some column(s).  A reasonable next step is
to see how much data is left:
N c36
 reports the number of non-missing values in c36.  If that value is zero, 
or some other very small number, you might want to re-think your 
strategy before proceeding:
COPY c1-c35 c41-c75;
OMIT c36 '*'.
 Columns c41-c75 now contain only rows of the original c1-c35 for which 
all of the values are NON-missing.
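[For comparison, the same listwise-deletion idea in a short Python sketch,
with hypothetical data; None plays the role of Minitab's '*':]

```python
# Rows of "columns"; None marks a missing datum (Minitab's '*').
rows = [
    [1, 2, 3],
    [4, None, 6],
    [7, 8, 9],
]

# Keep only rows in which every value is non-missing (listwise deletion),
# leaving the original data untouched -- like COPY ...; OMIT c36 '*'.
complete = [row for row in rows if all(v is not None for v in row)]
print(complete)   # [[1, 2, 3], [7, 8, 9]]
```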

 snip, the rest 
-- DFB.
 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128


