I was inspired to carry out some Monte Carlo runs, too.
I include some results, and also some SPSS code for
whoever is interested.
On Mon, 11 Aug 2003 15:58:36 -0400, Ken Butler
<[EMAIL PROTECTED]> wrote:
> On Mon, 11 Aug 2003 11:13:12 -0400, Rich Ulrich <[EMAIL PROTECTED]>
> wrote:
[ snip, various, about t-tests with Ns and SDs unequal.]
me> >
> >That last comment is misleading, too. It implies
> >that the *other* t-test is not (approximately)
> >just as flawed. Which it is.
KB >
> If by "other t-test" you mean where the smaller sample has the larger
> of two notably unequal SDs, then I agree (the P-value will then tend
> to be too small).
>
> But if you mean the Welch t-test (where equal variances are not
> assumed, and the P-value is an approximation based on a choice of df
> to make a t-distribution fit well), then my agreement is rather more
> qualified. There are two (or more) issues:
>
> - robustness: what happens to P-values when the populations are not
> normal? The answer here is probably "bad things" whether you use the
> pooled-variance or Welch tests.
>
> - type I error probabilities when the populations are normal. My
> understanding is that it's only the pooled test that suffers here,
> something sort-of confirmed by a few texts and confirmed rather more
> dramatically by a little simulation.
>
> I generated random samples of sizes 30 and 6 from normal distributions
> with SDs 0.46 and 0.17 (to echo the figures quoted above) and means
> both 0. I first did 1000 runs using the pooled test, and found that:
>
> I rejected at alpha=0.10 0.008 of the time
> 0.05 0.003 of the time
> 0.01 0.000 of the time
>
> The P-values are way too big, and we are rejecting way too
> infrequently. This is what I expected.
>
> Compare this with the results of 1000 runs of the Welch test:
[ snip, numbers showing that the Welch test did well.]
The situation with the smaller SD for the smaller N is
ideal for the Welch test. In fact, in special circumstances
where you know the variances will differ, it is proper to design
your study to insist on sampling the smaller N for the smaller SD.
I decided to run the same experiment and also the reverse,
where the small sample has the large SD. I was
surprised to see how well the Welch test continued to perform.
The performance of Student's test was 'symmetric', but not
quite as I expected: in the revised run, it rejects far
too OFTEN.
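A quick check on why Student's test flips from rejecting too rarely to rejecting too often: compare the expected pooled estimate of the variance of the mean difference with its true value. This is only a sketch in Python, not part of the SPSS run; the function name `pooled_vs_true` is made up for illustration, and the inputs are KB's figures quoted above (Ns of 30 and 6, SDs of 0.46 and 0.17).

```python
def pooled_vs_true(n1, sd1, n2, sd2):
    """Return (expected pooled estimate, true value) of Var(mean1 - mean2).

    Uses population SDs, so this is the long-run expectation of the
    pooled-variance estimate, not a result from any one sample.
    """
    true_var = sd1**2 / n1 + sd2**2 / n2
    pooled_s2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_var = pooled_s2 * (1 / n1 + 1 / n2)
    return pooled_var, true_var


if __name__ == "__main__":
    cases = {
        "small N has small SD": (30, 0.46, 6, 0.17),
        "small N has large SD": (30, 0.17, 6, 0.46),
    }
    for label, args in cases.items():
        pooled, true = pooled_vs_true(*args)
        print(f"{label}: pooled/true = {pooled / true:.2f}")
    # small N has small SD: pooled/true = 3.11
    # small N has large SD: pooled/true = 0.31
```

A ratio above 1 means the pooled standard error is too large, so t is too small and the test rejects too rarely; a ratio below 1 means the opposite.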
alpha -- replicating KB: small N has small SD.
Welch's
  .05 rejected .057, .043 of the time (two trials)
  .01 rejected .004, .007 - a little small.
Student's
  .05 rejected .001, .004 - small by a lot.
  .01 rejected zero and zero.

alpha -- NEW: small N has large SD.
Welch's
  .05 rejected .054, .038 of the time
  .01 rejected .009, .007
Student's
  .05 rejected .176, .150 - too big
  .01 rejected .084, .073 - WAY too big.
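For anyone without SPSS, the same kind of experiment can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a port of my program: the helper names (`two_sided_p`, `rejection_rates`) are made up, the t-distribution tail is computed from the regularized incomplete beta function by a textbook continued fraction, the test is two-sided, and the SDs are KB's 0.46 and 0.17. Exact rates depend on the seed.

```python
import math
import random
import statistics


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for num in (even, odd):
            d = 1.0 + num * d
            c = 1.0 + num / c
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-12:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided P-value of a t statistic; df need not be an integer."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def rejection_rates(n1, sd1, n2, sd2, runs=1000, alpha=0.05, seed=1):
    """Share of runs rejecting a true null, for Student's and Welch's tests."""
    rng = random.Random(seed)
    hits = {"student": 0, "welch": 0}
    for _ in range(runs):
        x = [rng.gauss(0.0, sd1) for _ in range(n1)]
        y = [rng.gauss(0.0, sd2) for _ in range(n2)]
        diff = statistics.fmean(y) - statistics.fmean(x)
        v1, v2 = statistics.variance(x), statistics.variance(y)
        # Student's t: pooled variance, df = n1 + n2 - 2.
        df_stu = n1 + n2 - 2
        s2_pool = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_stu
        t_stu = diff / math.sqrt(s2_pool * (1 / n1 + 1 / n2))
        # Welch's t: separate variances, Satterthwaite df.
        denom = v1 / n1 + v2 / n2
        df_sep = denom**2 / ((v1 / n1)**2 / (n1 - 1) + (v2 / n2)**2 / (n2 - 1))
        t_sep = diff / math.sqrt(denom)
        hits["student"] += two_sided_p(t_stu, df_stu) < alpha
        hits["welch"] += two_sided_p(t_sep, df_sep) < alpha
    return {name: count / runs for name, count in hits.items()}


if __name__ == "__main__":
    print("small N, small SD:", rejection_rates(30, 0.46, 6, 0.17))
    print("small N, large SD:", rejection_rates(30, 0.17, 6, 0.46))
```

Swapping the two SD arguments gives the reversed run, the same way the SPSS program is switched by hand.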
Okay, I think I have to revise my long-time advice.
Here is a situation where the shapes are normal but the
variances and the Ns are unequal, and it is clear that
the Welch version is the right test.
- Do keep in mind that the "inference about the mean"
does not have all of its usual implications. For instance,
we are usually safe if we assume that the group with the
larger mean will have the highest scores, and vice-versa;
however, 'different variances' says that the group with the
larger variance is apt to have the extreme scores in
either direction.
I've been concerned with the cases of unequal SDs
where a transformation (log, square-root) is apt to
be appropriate. In those cases, what is best is to carry
out the transformation; what is next-best is to trust
neither test. The results depend on which N is associated
with the larger variance, and Welch's test does not excel.
alpha -- testing on exp(normal(x)/2).
Rejection rates are reported separately for the two directions / tails.
Welch's
  .05 rejected .107 + .012
  .01 rejected .032 + .001
Student's
  .05 rejected .032 + .053
  .01 rejected .000 + .014
Student's does better in this example.
The numbers support some long-time advice: the t-test
(either one) is far less robust as a one-tailed test.
One tail tends to be too large, often twice what it
should be. Here's a second set.
alpha -- less extreme, exp(normal(x)/4).
Welch's
  .05 rejected .075 + .030 of the time
  .01 rejected .019 + .002
Student's
  .05 rejected .040 + .047
  .01 rejected .002 + .010
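On the earlier point that transforming is the better remedy when a log is appropriate: a small sketch of why. It assumes a hypothetical pair of lognormal groups that share the same spread on the log scale but differ in log-mean; these are not the distributions used in the runs above, and the function name `sd_ratios` is made up. On the raw scale the SDs differ by roughly the ratio of the means; after taking logs they are about equal, so the equal-variance assumption is restored.

```python
import math
import random
import statistics


def sd_ratios(n=5000, log_mean_low=0.0, log_mean_high=1.0, log_sd=0.5, seed=2003):
    """Return (raw SD ratio, log-scale SD ratio) for two lognormal groups."""
    rng = random.Random(seed)
    low = [math.exp(rng.gauss(log_mean_low, log_sd)) for _ in range(n)]
    high = [math.exp(rng.gauss(log_mean_high, log_sd)) for _ in range(n)]
    raw_ratio = statistics.stdev(high) / statistics.stdev(low)
    log_low = [math.log(v) for v in low]
    log_high = [math.log(v) for v in high]
    log_ratio = statistics.stdev(log_high) / statistics.stdev(log_low)
    return raw_ratio, log_ratio


if __name__ == "__main__":
    raw, logged = sd_ratios()
    print(f"SD ratio on the raw scale: {raw:.2f}")
    print(f"SD ratio after log:        {logged:.2f}")
```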
/* SPSS code for randomization. Runs on VMS at 6.2, Windows at 10.1.
title check variances . EQUAL .
set seed=1000009 .
input program .
vector Xs(30), Ys(6) .
/* .
loop sets= 1 to 1000.
loop id=1 to 30 .
/* call to Normal() is Normal(Standard Deviation); switch
/* the SDs for Xs and Ys for the other run.
compute Xs(id)= normal(24).
end loop .
loop id= 1 to 6.
compute Ys(id)= normal(10).
end loop .
compute N_1= 30 .
compute N_2= 6 .
end case .
end loop .
end file .
end input program .
/* computations for two versions of t-test .
compute ave_1= mean(Xs1 to Xs30) .
compute ave_2= mean(Ys1 to Ys6) .
compute var_1= var(Xs1 to Xs30) .
compute var_2= var(Ys1 to Ys6) .
/* for the regular t-test, pooled variance estimate.
compute DF_stu= N_1 + N_2 -2 .
compute SPool=( (N_1-1)*var_1 + (N_2-1)*var_2 ) / DF_stu .
compute SD_dif= SQRT( SPool*(1/N_1 + 1/N_2) ) .
compute t_stu = ( AVE_2 - AVE_1 ) / SD_dif .
/* for the test with adjusted d.f., separate variance estimates .
/* taken from the SPSS computations algorithms, 1978 .
compute DENOM= var_1/N_1 + var_2/N_2 .
compute Z_1 = ( var_1/N_1 / DENOM )**2 / (N_1-1) .
compute Z_2 = ( var_2/N_2 / DENOM )**2 / (N_2-1) .
compute DF_sep = 1. / (Z_1 + Z_2) .
compute t_sep = ( AVE_2 - AVE_1 ) / SQRT(DENOM) .
compute pc_stu= cdf.t(t_stu,DF_stu) .
compute pc_sep= cdf.t(t_sep,DF_sep) .
recode pc_stu pc_sep(.999 thru hi= 99.9)(.99 thru .999=99)
(.95 thru .99= 95)(.80 thru .95=80)(.50 thru .80=50)
(.20 thru .50=49)(.05 thru .20= 20)(.01 thru .05=5)
(.001 thru .01= 1)(lo thru .001= .1) .
formats var_1 var_2(F5.1) spool(F5.0) df_stu(F3.0) .
formats pc_stu, pc_sep(F6.1) .
/* list vars= ave_1 ave_2 var_1 var_2 t_stu t_sep pc_stu pc_sep .
subtitle first 1000, done twice .
descriptives ave_1 ave_2 var_1 var_2 t_stu t_sep pc_stu pc_sep .
frequencies pc_stu pc_sep /format= onepage .
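For readers checking the adjusted-d.f. lines in the program against a textbook: as far as I can tell, the Z_1 and Z_2 form is just the usual Welch-Satterthwaite approximation rearranged. Writing s_i^2 for var_i, n_i for N_i, and nu for DF_sep (symbols introduced here only for the comparison):

```latex
\nu = \frac{1}{Z_1 + Z_2},
\qquad
Z_i = \frac{1}{n_i - 1}
      \left( \frac{s_i^2 / n_i}{s_1^2/n_1 + s_2^2/n_2} \right)^{2}
\quad\Longrightarrow\quad
\nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^{2}}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```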
http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization." Justice Holmes.