On Sat, 12 May 2001, RD wrote, inter alia:

> The only approach to the z test for means that I have seen so 
> far uses the formula s^2 = s1^2/n1 + s2^2/n2. 
> The t test always uses pooled variance. 
        I think not _always_.  _Usually_, because (i) there is seldom 
a strong need to insist that the two [sub]population variances be 
different, (ii) the distribution of the t statistic is easier to find 
(no fractional numbers of degrees of freedom, e.g.), and (iii) the 
computations are easier.  But if one were concerned about (i), as for 
instance when the two sample variances are quite different, one might 
take the alternative approach.  (But see below.)
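        Point (ii) can be illustrated with a minimal Python sketch. The 
function names and summary numbers are hypothetical. It computes the 
pooled-variance t statistic, whose degrees of freedom are the integer 
n1 + n2 - 2, alongside the separate-variance statistic, whose degrees of 
freedom come from the Welch-Satterthwaite approximation and are generally 
fractional.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic and its (integer) degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Separate-variance t statistic with Welch-Satterthwaite df."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # sampling variance of each mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se, df

# Hypothetical summaries: mean, SD, n for each sample.
print(pooled_t(10.0, 2.0, 12, 8.5, 4.0, 20))
print(welch_t(10.0, 2.0, 12, 8.5, 4.0, 20))
```

With these made-up numbers the pooled version has 30 degrees of freedom, 
while the separate-variance version has roughly 29.3. That fractional 
value is what makes its reference distribution awkward to look up in a 
printed table.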

> Both  z test and percentages comparison test are using normal 
> distribution.  Thus, intuitively I was considering them as basically 
> the same with only difference in variance calculations.
> My problem is that using weighted p for one and not using pooled s^2 
> for another seemed inconsistent with that idea.
        This is where you begin to go astray.  In the z test for means, 
the sampling distribution of the sample means (or of their mean 
difference) is (at least approximately) normal with mean mu and standard 
deviation sigma;  and mu and sigma are mutually independent, either 
because that's true of normal distributions or because that tends to be 
true of empirical data (more or less regardless of the empirical 
distribution).  But in the case of proportions (or, equivalently, of 
percentages) the underlying distribution is binomial:  and the mean and 
standard deviation of a binomial distribution are NOT independent, being 
(for the simple count of the event in question) np and SQRT(np(1-p)), or 
(for the proportion) p and SQRT(p(1-p)/n).  The fact that for  n  large 
enough the binomial distribution may be well approximated by a normal 
distribution with the same mean and variance does not alter the fact that 
the true distribution IS binomial, and thus has this direct connection 
between mean and standard deviation.
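        The formulas just given can be evaluated in a short Python sketch 
(the function names are illustrative, not from any library). For fixed n, 
choosing p fixes the standard deviation as well as the mean, unlike the 
normal case, where the two are free to vary separately.

```python
import math

def binomial_count_moments(n, p):
    """Mean and SD of the count of events in n independent trials."""
    return n * p, math.sqrt(n * p * (1 - p))

def proportion_moments(n, p):
    """Mean and SD (standard error) of the sample proportion."""
    return p, math.sqrt(p * (1 - p) / n)

# Same n, different p: the SD moves together with the mean.
for p in (0.1, 0.3, 0.5):
    print(p, binomial_count_moments(100, p), proportion_moments(100, p))
```

For n = 100 the count has SD 3 at p = 0.1 but SD 5 at p = 0.5.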
        It follows that in an ordinary z-test (or t-test), one can make 
whatever assumption one finds useful, desirable, or convenient with 
respect to the variance of the difference, without affecting the truth 
value of the null hypothesis about the mean (or the difference in means, 
etc.).  But in dealing with proportions, if the null hypothesis specifies 
that P = a given value, that hypothesis ALSO specifies what the variance 
must be.  Hence a null hypothesis that P1 = P2, or equivalently that 
P1-P2 = 0, specifies that the variance of the observed difference must be 
based on the assumed common P in the population.  And the best estimate 
available for that common P is the usual "weighted P", as you put it.
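        The consequence for the two-sample case is shown in the Python 
sketch below (the function name and the counts are hypothetical). Under 
H0: P1 = P2, the standard error is computed from the single pooled 
("weighted") estimate of P, not from the two separate sample proportions.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2, using the pooled ("weighted") P."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 there is one common P; weight each sample by its size.
    p_pooled = (x1 + x2) / (n1 + n2)
    # H0 also fixes the variance: P(1 - P)(1/n1 + 1/n2) at the pooled P.
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 45 of 100 versus 30 of 100.
print(two_proportion_z(45, 100, 30, 100))
```

With 45/100 versus 30/100 the pooled P is 0.375, and z comes out at 
about 2.19.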

> Now you are saying that pooled variance may be used in z test. 
        Sometimes, anyway.  Admittedly, the point is debatable:  if one 
is using a z test at all, one is implicitly claiming to know what the 
corresponding variances are, and if they're different, they're different. 
But if one is skeptical about the state of one's knowledge (as one 
probably ought to be, else why test an hypothesis about means at all?), 
one may suspect that one's knowledge of variances is imperfect in some 
degree.  Then if the variances in question are not very far apart, it may 
be desirable to average them in some way, such as the usual pooling (or 
equivalently weighting by numbers of degrees of freedom).  But this does 
not really change anything except the particular mechanics of finding an 
average variance.  Summing the two sampling variances of the respective 
means and taking the square root of the sum produces an averaged standard 
error of the mean difference.  Pooling the two variances to obtain an 
average variance, then multiplying by the sum (1/n1 + 1/n2) and taking 
the square root of that product, produces another averaged standard error 
of the mean difference.  The two averages are identical when n1 = n2 and 
are unlikely to differ much otherwise (except in pathological circumstances, 
perhaps, such as very unequal sample sizes combined with very unequal 
variances), so it's rather splitting
hairs to argue which one is "proper".  (And there's always the question, 
"proper" for what purpose or circumstances?)

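        The Python sketch below sets the two mechanics for an averaged 
standard error side by side (function names and numbers are hypothetical). 
One sums the sampling variances of the two means. The other pools the 
sample variances, weighting by degrees of freedom, and multiplies by 
(1/n1 + 1/n2).

```python
import math

def se_sum(s1, n1, s2, n2):
    """SE of the mean difference from summing the two sampling variances."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def se_pooled(s1, n1, s2, n2):
    """SE of the mean difference from a df-weighted pooled variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Equal sample sizes: the two averages coincide.
print(se_sum(3.0, 25, 5.0, 25), se_pooled(3.0, 25, 5.0, 25))
# Very unequal sizes together with unequal SDs: they drift apart.
print(se_sum(3.0, 10, 5.0, 90), se_pooled(3.0, 10, 5.0, 90))
```

With n1 = n2 = 25 both give sqrt(1.36), about 1.17. With n1 = 10, 
n2 = 90 and SDs of 3 and 5 they come out near 1.09 and 1.62, because 
pooling attributes the large group's variance to the small group as well.
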
> When would you use pooled variance in z test instead of sum and vice 
> versa? 
        I wouldn't bother to prescribe.  If the separate variances were 
different enough to worry about, I'd probably want to use both a standard 
formula (pooled or sum, I don't care which) AND a test using the LARGER 
variance, to be able to assert (if it be true) that the null hypothesis 
can be rejected even under quite conservative assumptions.  I can imagine 
wanting also to use the SMALLER variance, so as to produce a range of 
standardized effect sizes that one might reasonably believe to cover the 
true effect size.
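        One possible reading of that suggestion is the Python sketch 
below. The function names, the data, and the choice to treat the larger 
(or smaller) variance as common to both groups are illustrative 
assumptions, not a prescription. It reports the z statistic and a 
standardized effect size (mean difference divided by an SD) under the 
larger, pooled, and smaller variances.

```python
import math

def z_and_effect(diff, var, n1, n2):
    """z and standardized effect size if both groups share variance `var`."""
    se = math.sqrt(var * (1 / n1 + 1 / n2))
    return diff / se, diff / math.sqrt(var)

def bracket(m1, s1, n1, m2, s2, n2):
    """Evaluate z and effect size under larger, pooled, smaller variances."""
    diff = m1 - m2
    pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    out = {}
    for label, var in (("larger", max(s1, s2) ** 2),
                       ("pooled", pooled),
                       ("smaller", min(s1, s2) ** 2)):
        out[label] = z_and_effect(diff, var, n1, n2)
    return out

# Hypothetical data: means 52 and 48, SDs 8 and 12, n = 40 per group.
for label, (z, d) in bracket(52.0, 8.0, 40, 48.0, 12.0, 40).items():
    print(f"{label:8s} z = {z:5.2f}  effect size = {d:4.2f}")
```

With these made-up numbers z runs from about 1.49 (larger variance) to 
about 2.24 (smaller variance). Whether the difference clears a 
conventional 1.96 cutoff therefore depends on the variance assumption, 
and the effect size ranges from 0.33 to 0.50.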

> What are we really testing:  just two means or whether those two 
> samples come from the same population? 
        Precisely.  What we are "really" testing, if we are testing at 
all, may very well differ from situation to situation.  Taking account 
of the idiosyncrasies of different circumstances is what we ought to be 
concerned about, more than trying to establish a rote rule of thumb for 
every circumstance.
        (A "rule of thumb" is, formally, a convention.  Conventions are 
adopted for convenience -- a word that has the same Latin root.  The 
convenience in question is often the convenience of not having to think 
about the characteristics of a problem before one tries to attack it. 
Not entirely unrelated to this concept is a favorite comment of Heidi 
Kass of the University of Alberta regarding computer output from 
statistical programs:  "untouched by the human mind".)

> Could you give me a reference to a book dealing with pooled 
> variance in the z test? 

No, I don't know of one.  In fact, I doubt there is any.
                                                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128



=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================
