Re: Variance in z test comparing percentages
BUT, Robert, the equal-N case is different from cases with unequal N -- or did I lose track of what the topic really is?

On 22 May 2001 06:52:27 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote, replying to Rich Ulrich:

> > Aren't we looking at the same contrast as the t-test with pooled and
> > unpooled variance estimates?
>
> Similar, but not identical. With the z-for-proportions we have the
> additional twist that the amount of extra power from the unpooled test
> is linked to the size of the effect we're trying to measure, in such a
> way that we get it precisely when we don't need it. Or, to avoid being
> too pessimistic, let's say that the pooled test only costs us power
> when we can afford to lose some (grin).

Robert had written on May 18:

> And, clearly, the pooled variance is larger; as the function is convex
> up, the linear interpolation is always less.

Back to my example in the previous post: whenever you do a t-test, you get exactly the same t if the Ns are equal. For unequal N, you get a bigger t when the group with the smaller variance gets more weight. I think your z-tests on proportions have to work the same way.

I can do a t-test with a dichotomous variable as the criterion, testing 1 of 100 versus 3 of 6; the 2x2 table is (1+99), (3+3). That gives me a pooled t of 6 or 7, that is, p < .001, and a separate-variance t with p = 0.06. I like that pooled test, but I do think that it has stronger assumptions than the 2x2 table.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=
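Rich's 1-of-100 versus 3-of-6 numbers are easy to check. The sketch below is mine, not from the thread: it computes the pooled z for proportions, and a Welch-style separate-variance t on the 0/1 scores (the Welch construction is my assumption about how the separate-variance t was obtained).

```python
from math import sqrt

# Rich's example: 1 success out of 100 vs. 3 successes out of 6
x1, n1 = 1, 100
x2, n2 = 3, 6
p1, p2 = x1 / n1, x2 / n2

# Pooled z for proportions: the variance uses the combined proportion
p_bar = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
z_pooled = (p2 - p1) / se_pooled

# Separate-variance (Welch-style) t on the 0/1 scores:
# the sample variance of 0/1 data is n * p * (1 - p) / (n - 1)
v1 = n1 * p1 * (1 - p1) / (n1 - 1)
v2 = n2 * p2 * (1 - p2) / (n2 - 1)
se_sep = sqrt(v1 / n1 + v2 / n2)
t_sep = (p2 - p1) / se_sep

print(round(z_pooled, 2), round(t_sep, 2))  # → 6.12 2.19
```

The pooled statistic lands near Rich's "6 or 7"; the separate-variance statistic is only about 2.2, which on roughly 5 Welch degrees of freedom gives a two-sided p of the same order as his 0.06 -- the wild 6-case group dominates the unpooled standard error.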
Re: Variance in z test comparing percentages
Thanks to all who took the time to answer my questions. I will try to make a thorough digest of it. There may be some more questions to come.

Alexandre Kaoukhov
Re: Variance in z test comparing percentages
On 18 May 2001 07:51:21 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote:

[ ... ]
> OK, so what *is* going on here? Checking a dozen or so sources, I found
> that indeed both versions are used fairly frequently (BTW, I myself use
> the pooled version, and so do the last few textbooks I've used). Then I
> did what I should have done years ago, and tried a MINITAB simulation.
> I saw that for (say) n1 = n2 = 10, p1 = p2 = 0.5, the unpooled
> statistic tends to have a somewhat heavy-tailed distribution. This
> makes sense: when the sample sizes are small, the pooled variance
> estimator is computed using a sample size for which the normal
> approximation works better.
>
> The advantage of the unpooled statistic is presumably higher power;
> however, in most cases this is illusory. When p1 and p2 are close
> together, you do not *get* much extra power. When they are far apart
> and have moderate sample sizes, you don't *need* extra power. And when
[ snip, rest ]

Aren't we looking at the same contrast as the t-test with pooled and unpooled variance estimates? Then:

(a) The t-test value is exactly the same when the Ns are equal; the only change is in the DF.

(b) Which test is more powerful depends on which group is larger: the one with the *small* variance, or the one with the *large* variance. The difference is large when the Ns and the variances both differ by (say) a fourfold factor or more. If the big N has the small variance, the advantage lies with pooling, so that the wild, small group is not weighted as heavily. If the big N has the large variance, the separate-variance estimate lets you take advantage of the precision of the smaller group.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
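Robert's MINITAB experiment can be replicated in a few lines. The sketch below is a Python stand-in of my own, not his original MINITAB code: it draws both statistics under the null with n1 = n2 = 10, p1 = p2 = 0.5, and counts nominal 5% rejections.

```python
import random
from math import sqrt

random.seed(1)
n, p, reps = 10, 0.5, 20000
crit = 1.96
reject_pooled = reject_unpooled = 0

def z_stat(diff, var):
    # Infinite statistic if the variance estimate is zero but a
    # difference was observed; zero if there is no difference at all.
    if var > 0:
        return abs(diff) / sqrt(var)
    return float("inf") if diff != 0 else 0.0

for _ in range(reps):
    x1 = sum(random.random() < p for _ in range(n))
    x2 = sum(random.random() < p for _ in range(n))
    p1, p2 = x1 / n, x2 / n
    pbar = (x1 + x2) / (2 * n)
    v_pooled = pbar * (1 - pbar) * (2 / n)            # pooled-p variance
    v_unpooled = (p1 * (1 - p1) + p2 * (1 - p2)) / n  # summed variance
    reject_pooled += z_stat(p1 - p2, v_pooled) > crit
    reject_unpooled += z_stat(p1 - p2, v_unpooled) > crit

print(reject_pooled / reps, reject_unpooled / reps)
```

Because p(1-p) is concave, the pooled variance is never smaller than the unpooled one when the Ns are equal, so |z_unpooled| >= |z_pooled| on every sample; the unpooled test's extra rejections under the null are exactly the heavy tails Robert saw.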
Re: Variance in z test comparing percentages
Great explanation! Thank you! I just need to clarify a few points below.

On 14 May 2001 06:08:14 -0700, [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote:

> > Hi, I am puzzled by the following question. In the z test for
> > continuous variables we just use the sum of estimated variances to
> > calculate the variance of the difference of two means, i.e.
> > s^2 = s1^2/n1 + s2^2/n2. For percentages we proceed as follows:
> > s^2 = p(1-p)(1/n1 + 1/n2), where p = (n1*p1 + n2*p2)/(n1 + n2).
> > Why do we not use s^2 = p1(1-p1)/n1 + p2(1-p2)/n2? To me the first
> > approach looks more like a t test. On the other hand, the chi2 is
> > derived from Z^2 as assumed by the first approach.
>
> This is an interesting question that occurred to me only after some
> years of teaching stats. The answer is: we *could* use it, but the test
> would have less power. A hypothesis test is essentially a probabilistic
> proof by contradiction; we are allowed to use the null hypothesis as
> often as we like, with the intention of looking for trouble. However,
> we may be selective about how we use the null - we have to give it
> enough rope to shoot itself in the foot. Not every conclusion drawn
> from the null hypothesis is easy to falsify.
>
> If the null really *is* true, the two formulae are both estimators for
> the same thing. However, if the null is *false*, the pooled-p formula
> does not give the nearer-to-0.5 sample proportion as much chance to
> raise the estimated variance. The smaller estimated variance results in
> a larger test statistic and a smaller p-value. Another way to look at
> it is this: the pooled-p formula makes additional use of the null
> hypothesis, so it is intuitively plausible that it is more likely to
> yield a contradiction when the null is false.

I tried to test this: I made a 3d plot of the difference between the two variance formulas as a function of the two proportions. Here is my Maple formula:

plot3d((10*x+10*y)/20*(1-(10*x+10*y)/20)*(1/10+1/10)-(x*(1-x)/10+y*(1-y)/10), x = 0 .. 1, y = 0 .. 1);

where x = p1, y = p2, n1 = n2 = 10. Maybe I have just entered the data erroneously, but as far as I can see from the plot, exactly the reverse is true: the pooled-p variance is always bigger, except for x = y.

> Why we can't do it for a t test: the null for a t test does *not* tell
> us anything about the variance, so we have to estimate it separately.

What about the z test? Donald, in a previous post, suggested that both the pooled and the summed estimate of variance may be used.

> > Finally, I would like to know whether the second formula is ever used
> > and, if so, whether it has any name.
>
> Yes, it is the standard formula used for the variance when computing a
> Z confidence interval for proportions. When you are computing a CI you
> are *not* assuming, even for the sake of contradiction, that the
> proportions are the same; thus you must not use the pooled-p formula.

You mean here the CI of p1-p2?

Thank you,
Alexandre Kaoukhov
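Alexandre's reading of the plot can be checked without Maple. The sketch below (mine, not from the thread) evaluates the same difference on a grid and confirms that it is never negative, vanishing only when p1 = p2 -- which is what concavity of p(1-p) (Jensen's inequality) guarantees for equal n.

```python
# Check: pooled variance minus unpooled variance, n1 = n2 = 10
n = 10

def pooled_minus_unpooled(x, y):
    # x = p1, y = p2; the same expression as the Maple plot3d formula
    pbar = (n * x + n * y) / (2 * n)
    pooled = pbar * (1 - pbar) * (1 / n + 1 / n)
    unpooled = x * (1 - x) / n + y * (1 - y) / n
    return pooled - unpooled

grid = [i / 50 for i in range(51)]
diffs = [pooled_minus_unpooled(x, y) for x in grid for y in grid]

print(min(diffs) >= -1e-12)                           # True: never negative
print(abs(pooled_minus_unpooled(0.3, 0.3)) < 1e-12)   # True: zero when p1 == p2
```

So the pooled-p variance dominates the summed variance whenever n1 = n2, exactly as the plot suggested; the power advantage of the pooled statistic shows up through the degrees-of-freedom and tail-behaviour arguments made elsewhere in the thread, not through a smaller variance estimate in the equal-n case.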