On Sat, 12 May 2001, Alexandre Kaoukhov (RD <[EMAIL PROTECTED]>) wrote:
> I am puzzled with the following question:
> In z test for continuous variables we just use the sum of estimated
> variances to calculate the variance of a difference of two means i.e.
> s^2 = s1^2/n1 + s2^2/n2.
Not always. If homogeneity of variances is assumed (which may
well be consistent with the null hypothesis that the two means are
equal), the variance of the difference is calculated as a pooled
variance estimate multiplied by (1/n1 + 1/n2). If the two variances
are allowed to differ, the formula you cite is used (and one then has
the so-called Behrens-Fisher problem).
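
To make the two choices for means concrete, here is a minimal sketch in Python. The function name se_diff_means and its arguments are illustrative only (they are not from the original question); s1 and s2 are taken to be the sample standard deviations.

```python
import math

def se_diff_means(s1, n1, s2, n2, pooled):
    """Standard error of (mean1 - mean2) from sample SDs and sample sizes."""
    if pooled:
        # Homogeneity of variances assumed: pool the two sample variances,
        # weighting each by its degrees of freedom.
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
        return math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Variances allowed to differ: s1^2/n1 + s2^2/n2.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Equal sample sizes: the two formulas give the same value.
print(se_diff_means(2.0, 10, 3.0, 10, pooled=True))
print(se_diff_means(2.0, 10, 3.0, 10, pooled=False))

# Unequal sample sizes: they generally differ.
print(se_diff_means(2.0, 10, 3.0, 40, pooled=True))
print(se_diff_means(2.0, 10, 3.0, 40, pooled=False))
```

With equal n the two standard errors agree; the choice matters mainly when both the sample sizes and the variances are unequal.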
> For percentages we proceed as follows:
> s^2 = p(1-p)(1/n1 + 1/n2)
> where p = (n1*p1 + n2*p2)/(n1 + n2)
> Why do not we use:
> s^2 = p1(1-p1)/n1 + p2(1-p2)/n2
Because the distribution you need for testing a hypothesis is the
sampling distribution of the statistic in question (in this case, of
[p1 - p2]) under the hypothesis being tested (aka the null, or
model-distributional, hypothesis). If, as is usual, the null hypothesis
states that p1 = p2 in the population (that is, that p1 - p2 = 0), then
the best estimate available for the true proportion is p as defined
above (a weighted average of p1 and p2), and the variance of the
sampling distribution is the value you report.
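
As a sketch of the test just described (the function name pooled_z and the use of success counts x1, x2 are my own choices for illustration), the z statistic under the null hypothesis p1 = p2 could be computed like this:

```python
import math

def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion.

    x1, x2 are the numbers of "successes" in samples of size n1, n2.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Pooled estimate; identical to (n1*p1 + n2*p2) / (n1 + n2).
    p = (x1 + x2) / (n1 + n2)
    # Variance of (p1 - p2) under the null hypothesis.
    var0 = p * (1 - p) * (1 / n1 + 1 / n2)
    return (p1 - p2) / math.sqrt(var0)

# Hypothetical data: 40 of 100 versus 30 of 100.
print(round(pooled_z(40, 100, 30, 100), 4))
```

The counts in the usage line are made up purely to exercise the function.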
In the more general case of testing the difference between means
(as with a normal model), the population mean and the population
variance are separate parameters, neither determined by the other, so
one can impose whatever restrictions may seem useful on the one
without affecting the other. But for proportions, the underlying
distribution is binomial; here the (population) variance is an explicit
function of the (population) mean, and if you're treating the mean as p
you must logically treat the variance as p(1-p)/n.
The value you propose is, of course, the variance of the sampling
distribution if the particular alternative hypothesis is true, that the
proportion in population 1 is p1 and the proportion in population 2 is p2,
with p1 not equal to p2. But of course _this_ sampling distribution is
irrelevant to the hypothesis test you are performing.
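
A small numerical comparison may help here. This is only a sketch; the function names var_pooled and var_unpooled are labels I have made up for the two formulas in your message.

```python
def var_pooled(p1, n1, p2, n2):
    """Variance of (p1 - p2) under the null hypothesis p1 = p2."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    return p * (1 - p) * (1 / n1 + 1 / n2)

def var_unpooled(p1, n1, p2, n2):
    """Variance of (p1 - p2) when the two proportions are allowed to differ."""
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

# When the sample proportions happen to be equal, the two formulas agree.
print(var_pooled(0.3, 50, 0.3, 80), var_unpooled(0.3, 50, 0.3, 80))

# When they differ, so do the two variances.
print(var_pooled(0.2, 50, 0.5, 80), var_unpooled(0.2, 50, 0.5, 80))
```

The two expressions coincide exactly when p1 = p2, and move apart as the sample proportions diverge.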
> For me first approach looks more like t test.
I presume you mean, in the sense that a pooled variance estimate
is used. Is this a problem, for some reason?
> On the other hand the chi2 is derived from Z^2 as assumed by first
> approach.
Sorry; the relevance of this comment eludes me.
> Finally, I would like to know whether the second formula is ever used
> and if so does it have any name.
"Ever" is a wider universe of discourse than I would dare pretend to.
Perhaps colleagues on the list may know of applications.
I would be surprised if it had been named, though.
> Thank you,
> Alexandre Kaoukhov
You're welcome. I hope this has been helpful.
-- DFB.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-472-3742