Re: Variance in z test comparing percentages

Donald Burrill Fri, 11 May 2001 21:39:51 -0700
On Sat, 12 May 2001, Alexandre Kaoukhov (RD <[EMAIL PROTECTED]>) wrote:

> I am puzzled by the following question:
> In the z test for continuous variables, we simply use the sum of the estimated
> variances to calculate the variance of a difference of two means, i.e.,
>    s^2 = s1^2/n1 + s2^2/n2.
        Not always.  If homogeneity of variances is assumed (which may 
well be consistent with the null hypothesis that the two means are 
equal), the variance of the difference is calculated as a pooled 
variance estimate multiplied by (1/n1 + 1/n2).  If the two variances 
are allowed to differ, the formula you cite is used (and one then has 
the so-called Behrens-Fisher problem).
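For concreteness, here is a minimal Python sketch of the two variance estimates for a difference of means. The function names and arguments (sample standard deviations s1, s2 and sample sizes n1, n2) are illustrative assumptions. The pooled estimate uses the usual degrees-of-freedom weighting.

```python
def var_diff_unpooled(s1, n1, s2, n2):
    """Variance of (mean1 - mean2) when the two variances may differ."""
    return s1**2 / n1 + s2**2 / n2


def var_diff_pooled(s1, n1, s2, n2):
    """Variance of (mean1 - mean2) assuming equal population variances."""
    # Pooled variance: each sample variance weighted by its degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sp2 * (1 / n1 + 1 / n2)


print(var_diff_unpooled(2.0, 10, 3.0, 20))
print(var_diff_pooled(2.0, 10, 3.0, 20))
```

When n1 = n2 the two estimates coincide exactly. They diverge only when the sample sizes differ.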

> For percentages we proceed as follows:
>    s^2 = p(1-p)(1/n1 + 1/n2)
> where p = (n1*p1 + n2*p2)/(n1 + n2)
> Why don't we use:
>    s^2 = p1(1-p1)/n1 + p2(1-p2)/n2

Because the distribution you need for testing an hypothesis is the 
sampling distribution of the statistic in question (in this case, of 
[p1 - p2]) under the hypothesis being tested (aka the null, or 
model-distributional, hypothesis).  If, as is usual, the null hypothesis 
states that p1 = p2 in the population (that is, that p1 - p2 = 0), then 
the best estimate available for the true proportion is  p  as defined 
above (a weighted average of p1 and p2), and the variance of the 
sampling distribution is the value you report.
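A minimal sketch of the test statistic built from that null-hypothesis variance follows. It assumes the data arrive as success counts x1, x2 out of n1, n2 trials; the function and argument names are hypothetical, chosen for illustration.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 == p2, using the pooled proportion for the variance."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled estimate of the common proportion; equals (n1*p1 + n2*p2)/(n1 + n2)
    p = (x1 + x2) / (n1 + n2)
    var = p * (1 - p) * (1 / n1 + 1 / n2)
    return (p1 - p2) / math.sqrt(var)


print(pooled_z(30, 100, 20, 100))
```

For 30/100 versus 20/100, the pooled p is 0.25 and z is about 1.63.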
        In the more general case of testing the difference between means 
(with normally distributed data, say), the population mean and the 
population variance are functionally unrelated parameters, so one can 
impose whatever restrictions may seem useful on the one without 
affecting the other.  But for proportions, the underlying 
distribution is binomial;  here the (population) variance is an explicit 
function of the (population) mean, and if you're treating the mean as  p 
you must logically treat the variance of a sample proportion as  p(1-p)/n.
        The value you propose is, of course, the variance of the sampling 
distribution if the particular alternative hypothesis is true, that the 
proportion in population 1 is p1 and the proportion in population 2 is p2, 
with p1 not equal to p2.  But _this_ sampling distribution is 
irrelevant to the hypothesis test you are performing.
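The two variance formulas from the question can be compared side by side. This sketch (function names are hypothetical) shows that they agree exactly when p1 = p2, which is what the null hypothesis asserts. Otherwise they can differ noticeably, particularly with unequal sample sizes.

```python
def var_pooled(p1, n1, p2, n2):
    """Variance of p1 - p2 under H0, using the pooled proportion p."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    return p * (1 - p) * (1 / n1 + 1 / n2)


def var_unpooled(p1, n1, p2, n2):
    """Variance of p1 - p2 if the proportions are p1 and p2 respectively."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2


# Equal sample proportions: the two formulas give the same value
print(var_pooled(0.2, 30, 0.2, 70), var_unpooled(0.2, 30, 0.2, 70))
# Unequal proportions and sample sizes: the values diverge
print(var_pooled(0.3, 50, 0.1, 200), var_unpooled(0.3, 50, 0.1, 200))
```

With p1 = 0.3, n1 = 50 and p2 = 0.1, n2 = 200, the pooled value is about 0.00301 and the unpooled value about 0.00465.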

> To me, the first approach looks more like a  t  test. 
        I presume you mean, in the sense that a pooled variance estimate 
is used.  Is this a problem, for some reason?

> On the other hand, the chi2 is derived from Z^2 as assumed by the first 
> approach.
                Sorry;  the relevance of this comment eludes me.
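For what it may be worth, the questioner's remark can be checked numerically. For a 2x2 table, Pearson's chi-square statistic without continuity correction equals the square of the z statistic computed with the pooled variance. A minimal sketch (all names are hypothetical):

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled variance p(1-p)(1/n1 + 1/n2)."""
    p = (x1 + x2) / (n1 + n2)
    var = p * (1 - p) * (1 / n1 + 1 / n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(var)


def pearson_chi2_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for the table
    [[x1, n1 - x1], [x2, n2 - x2]]."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


print(pearson_chi2_2x2(30, 100, 20, 100), pooled_z(30, 100, 20, 100) ** 2)
```

With Yates' continuity correction applied to the chi-square, the identity no longer holds exactly.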

> Finally, I would like to know whether the second formula is ever used
> and, if so, whether it has a name.

"Ever" is a wider universe of discourse than I would dare pretend to. 
Perhaps colleagues on the list may know of applications.
I would be surprised if it had been named, though.

> Thank you,
> Alexandre Kaoukhov

You're welcome.  I hope it will have been helpful.
                                                        -- DFB.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-472-3742  



=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
                  http://jse.stat.ncsu.edu/
=================================================================