Re: Variance in z test comparing percentages

2001-05-23 Thread Rich Ulrich

 - BUT, Robert, 
the equal N case is different from cases with unequal N -
 - or did I lose track of what the topic really is... -

On 22 May 2001 06:52:27 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

 and Rich Ulrich responded: 
  Aren't we looking at the same contrast as the t-test with
  pooled and unpooled variance estimates?  Then -
 
 Similar, but not identical. With the z-for-proportion we 
 have the additional twist that the amount of extra power
 from the unpooled test is linked to the size of the effect 
 we're trying to measure, in such a way that we get it 
 precisely when we don't need it. Or, to avoid being too 
 pessimistic, let's say that the pooled test only costs us 
 power when we can afford to lose some. *grin*
 

- Robert wrote on May 18: And, clearly, the pooled 
variance is larger; as the function is convex up, the 
linear interpolation is always less.

Back to my example in the previous post:  Whenever you 
do a t-test, you get exactly the same t if the Ns are equal.
For unequal N, you get a bigger t when the group with the 
smaller variance gets more weight.  I think your z-tests
on proportions have to work the same way.

I can do a t-test with a dichotomous variable as the criterion, 
testing 1 of 100 versus 3 of 6: the 2x2 table is (1+99), (3+3).
That gives me a pooled t of 6 or 7, that is p < .001; and a
separate-variance t that is p = 0.06.
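For the record, that example is easy to reproduce. Here is a quick Python sketch (the 0/1 coding and the exact values are my reconstruction, not part of the original post):

```python
import math

# Rich's example: 1 success in 100 vs. 3 successes in 6,
# coded as 0/1 data and run through both forms of the t-test.
group_a = [1] + [0] * 99          # 1 of 100
group_b = [1, 1, 1, 0, 0, 0]      # 3 of 6

def mean_var(xs):
    """Return n, mean, and the (n-1)-denominator sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return n, m, v

n1, m1, v1 = mean_var(group_a)
n2, m2, v2 = mean_var(group_b)

# Pooled-variance t (classical two-sample t)
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m2 - m1) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Separate-variance (Welch) t, with the Welch-Satterthwaite df
se2 = v1 / n1 + v2 / n2
t_welch = (m2 - m1) / math.sqrt(se2)
df_welch = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print(t_pooled, t_welch, df_welch)   # ~7.53, ~2.19, df ~5
```

The pooled t of about 7.5 is wildly significant, while the separate-variance t of about 2.2 on roughly 5 df is marginal, just as described.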

 - I like that pooled test, but I do think that it has stronger
assumptions than the 2x2 table.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: Variance in z test comparing percentages

2001-05-21 Thread RD

Thanks to all who took time to answer my questions. I will try to make
a thorough digest of it. There may be some more questions to come.
Alexandre Kaoukhov





Re: Variance in z test comparing percentages

2001-05-18 Thread Rich Ulrich

On 18 May 2001 07:51:21 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:

 [ ... ] 
   OK, so what *is* going on here?  Checking a dozen or so sources, I
 found that indeed both versions are used fairly frequently (BTW, I
 myself use the pooled version, and the last few textbooks I've used do
 so).
 
   Then I did what I should have done years ago, and I tried a MINITAB
 simulation. I saw that for (say) n1=n2=10, p1=p2=0.5, the unpooled
 statistic tends to have a somewhat heavy-tailed distribution. This makes
 sense: when the sample sizes are small the pooled variance estimator is
 computed using a sample size for which the normal approximation works
 better.
 
   The advantage of the unpooled statistic is presumably higher power;
 however, in most cases, this is illusory. When p1 and p2 are close
 together, you do not *get* much extra power.  When they are far apart
 and have moderate sample sizes you don't *need* extra power. And when
[ snip, rest]

Aren't we looking at the same contrast as the t-test with 
pooled and unpooled variance estimates?  Then -

(a) the two t-test values are exactly equal when the Ns are equal; 
the only change is in the DF.
(b) Which test is more powerful depends on which group is 
larger: the one with the *small* variance, or the one with the
*large* variance. The difference is large when the Ns and the
variances both differ by (say) a fourfold factor or more.

If the big N has the small variance, then the advantage
lies with 'pooling'  so that the wild, small group is not weighted
as heavily.  If the big N has the large variance, then the 
separate-variance estimate lets you take advantage of the
precision of the smaller group.  
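Both points are easy to verify numerically. A small Python sketch (the data are simulated, purely for illustration):

```python
import math, random

random.seed(1)

def t_stats(x, y):
    """Return (pooled t, separate-variance t) for two samples."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pool = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_sep = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return t_pool, t_sep

# (a) Equal Ns: the two statistics coincide; only the DF differ.
x = [random.gauss(0, 1) for _ in range(20)]
y = [random.gauss(1, 3) for _ in range(20)]
tp, ts = t_stats(x, y)
equal_n_match = abs(tp - ts) < 1e-9

# (b) Unequal Ns, big N with small variance: pooling gives the
# smaller standard error, hence the larger |t|.
x = [random.gauss(0, 1) for _ in range(80)]   # big N, small variance
y = [random.gauss(1, 4) for _ in range(10)]   # small N, large variance
tp, ts = t_stats(x, y)
print(equal_n_match, abs(tp) > abs(ts))
```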

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Variance in z test comparing percentages

2001-05-14 Thread RD

Great explanation! Thank you!
Just need to clarify a few points below.


On 14 May 2001 06:08:14 -0700, [EMAIL PROTECTED] (Robert J.
MacG. Dawson) wrote:



RD wrote:
 
 Hi,
 I am puzzled with the following question:
 In a z test for continuous variables we just use the sum of estimated
 variances to calculate the variance of a difference of two means, i.e.
 s^2=s1^2/n1+s2^2/n2.
 For percentages we proceed as follows:
 s^2=p(1-p)(1/n1+1/n2)
 where p=(n1*p1+n2*p2)/(n1+n2)
 Why don't we use:
 s^2=p1(1-p1)/n1+p2(1-p2)/n2
 To me, the first approach looks more like a t test. On the other hand, the
 chi2 is derived from Z^2, as assumed by the first approach.
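Side by side, the two candidate statistics look like this (an illustrative Python sketch of my own, with made-up counts):

```python
import math

def z_two_proportions(x1, n1, x2, n2, pooled=True):
    """z statistic for H0: p1 = p2, with pooled or unpooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    if pooled:
        p = (x1 + x2) / (n1 + n2)   # = (n1*p1 + n2*p2)/(n1+n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# Example: 30/100 vs. 18/100 (arbitrary counts, for illustration only)
z_pooled = z_two_proportions(30, 100, 18, 100, pooled=True)
z_unpooled = z_two_proportions(30, 100, 18, 100, pooled=False)
print(z_pooled, z_unpooled)   # ~1.99 vs. ~2.01
```

With these counts the pooled variance is the larger of the two, so the pooled z is slightly smaller.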

   This is an interesting question that occurred to me only
after some years of teaching stats. The answer is: 

   We *could* use it, but the test would have less power.

A hypothesis test is essentially a probabilistic 
proof by  contradiction; we are allowed to use 
the null hypothesis as often as we like with the 
intention of looking for trouble. However, we may 
be selective about how we use the null - we have 
to give it enough rope to shoot itself in the 
foot. Not every conclusion drawn from the null 
hypothesis is easy to falsify.

If the null really *is* true, the two formulae are both
estimators for the same thing.  However, if the null is
*false*, the pooled-p formula does not give the 
nearer-to-0.5 sample proportion as much chance to raise 
the estimated variance. The smaller estimated variance 
results in a larger test statistic and a smaller p-value.

Another way to look at it is this: the pooled-p formula
makes additional use of the null hypothesis, so it is
intuitively plausible that it is more likely to yield a
contradiction when the null is false. 

I tried to test this: I plotted in 3-D the difference between the two
variance estimates as a function of the two proportions.
Here is my Maple formula:
plot3d((10*x+10*y)/20*(1-(10*x+10*y)/20)*(1/10+1/10)-(x*(1-x)/10+y*(1-y)/10),x
= 0 .. 1,y = 0 .. 1);
where x=p1, y=p2, n1=n2=10.
Maybe I have just entered the data erroneously, but as far as I can see
from the plot, exactly the reverse is true: the pooled-p variance is
always bigger, except when x=y.
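The same check can be redone in Python (my translation of the Maple expression; the grid resolution is arbitrary):

```python
# Difference between the pooled-p and unpooled variance estimates
# over a grid of (p1, p2), with n1 = n2 = 10 as in the Maple plot.
n1 = n2 = 10
steps = 50
min_diff = float("inf")
for i in range(steps + 1):
    for j in range(steps + 1):
        p1, p2 = i / steps, j / steps
        p = (n1 * p1 + n2 * p2) / (n1 + n2)
        pooled = p * (1 - p) * (1 / n1 + 1 / n2)
        unpooled = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
        min_diff = min(min_diff, pooled - unpooled)

# p(1-p) is concave, so by Jensen's inequality the pooled value is
# never smaller when n1 = n2; the difference is 0 only at p1 = p2.
print(min_diff)   # >= 0 up to rounding
```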

Why we can't do it for a t test: the null for a t test 
does *not* tell us anything about the variance so we
have to estimate it separately.

What about the z-test? Donald in a previous post suggested that both the
pooled and the summed estimates of variance may be used.


 Finally, I would like to know whether the second formula is ever used
 and if so does it have any name.

   Yes, it is the standard formula used for variance
when computing a Z confidence interval for proportions.
When you are computing a CI you are *not* assuming, even for
the sake of contradiction, that the proportions are
the same; thus you must not use the pooled-p formula.

 You mean here the CI of p1-p2?
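For the difference p1-p2, yes. As an illustration (a Python sketch of my own, with arbitrary counts; 1.96 is the usual 95% normal quantile):

```python
import math

def ci_diff_proportions(x1, n1, x2, n2, z=1.96):
    """Wald 95% CI for p1 - p2, using the unpooled variance formula."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Example: 30/100 vs. 18/100
lo, hi = ci_diff_proportions(30, 100, 18, 100)
print(lo, hi)   # interval around the observed difference of 0.12
```

No pooling here: a CI does not assume p1 = p2, so each proportion contributes its own variance estimate.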

Thank you,
Alexandre Kaoukhov

