- I was inspired to carry out some Monte Carlo runs, too.
I include some results, and also some SPSS code for
whoever is interested.

On Mon, 11 Aug 2003 15:58:36 -0400, Ken Butler
<[EMAIL PROTECTED]> wrote:

> On Mon, 11 Aug 2003 11:13:12 -0400, Rich Ulrich <[EMAIL PROTECTED]>
> wrote:
[ snip, various, about t-tests with Ns and SDs unequal.]

me> >
> >That last comment is misleading, too.  It  implies 
> >that the *other*  t-test is not  (approximately) 
> >just as flawed.  Which it is.

KB > 
> If by "other t-test" you mean where the smaller sample has the larger
> of two notably unequal SDs, then I agree (the P-value will then tend
> to be too small).
> 
> But if you mean the Welch t-test (where equal variances are not
> assumed, and the P-value is an approximation based on a choice of df
> to make a t-distribution fit well), then my agreement is rather more
> qualified. There are two (or more) issues:
> 
> - robustness: what happens to P-values when the populations are not
> normal? The answer here is probably "bad things" whether you use the
> pooled-variance or Welch tests.
> 
> - type I error probabilities when the populations are normal. My
> understanding is that it's only the pooled test that suffers here,
> something sort-of confirmed by a few texts and confirmed rather more
> dramatically by a little simulation.
> 
> I generated random samples of sizes 30 and 6 from normal distributions
> with SDs 0.46 and 0.17 (to echo the figures quoted above) and means
> both 0. I first did 1000 runs using the pooled test, and found that:
> 
> I rejected at alpha=0.10  0.008 of the time
>                     0.05  0.003 of the time
>                     0.01  0.000 of the time
> 
> The P-values are way too big, and we are rejecting way too
> infrequently. This is what I expected.
> 
> Compare this with the results of 1000 runs of the Welch test:
[ snip, numbers showing that the Welch test did well.]
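
[ For reference -- my gloss, not KB's wording:  the "choice of df" he
describes is the usual Welch-Satterthwaite approximation, which the
SPSS code at the end of this post also implements:
$$ t \;=\; \frac{\bar x_1 - \bar x_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
\qquad
df \;\approx\; \frac{(s_1^2/n_1 + s_2^2/n_2)^2}
{(s_1^2/n_1)^2/(n_1-1) \;+\; (s_2^2/n_2)^2/(n_2-1)} . $$
]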

The situation with the smaller SD  for the smaller N  is what
is ideal for the Welch test -- in fact, in special circumstances
where you know the variances will differ, it is proper to design
your study so that the smaller SD  gets the smaller N.
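
(The reason, for the record:  the variance of the difference of means is
$$ \mathrm{Var}(\bar x_1 - \bar x_2) \;=\; \sigma_1^2/n_1 + \sigma_2^2/n_2 , $$
and for a fixed total  n_1 + n_2  this is smallest when each  n_i  is
taken proportional to  \sigma_i ;  so the group with the smaller SD
should get the smaller N.)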

I decided to run the same experiment, but switched it
so that the small sample would have the large SD.   - I was
surprised to see how well the Welch test continued to perform;
the performance of the Student's test was 'symmetric', but not
quite as I expected:  in the revised run, it rejects far
too OFTEN.

  alpha--  replicating KB, small N has small SD.
Welch's
   .05 rejected  .057,  .043  of the time (two trials)
   .01 rejected  .004,  .007 - a little small.
Student's
   .05  rejected  .001,  .004  - small by a lot.
   .01  rejected  zero and zero.

  alpha-- NEW:  taking small N with Large SD
Welch's
   .05  rejected   .054,  .038  of the time
   .01  rejected   .009,  .007 
Student's
   .05  rejected   .176,  .150 - too big
   .01  rejected   .084,  .073 - WAY  too big.

Okay -- I think I have to revise my long-time advice.
Here is the situation where the shapes are normal but the
variances are unequal, the Ns are unequal, and it is clear
that the Welch version is the right test.
 - Do keep in mind that the "inference about the mean"
does not carry all of its usual implications.  For instance,
we are usually safe in assuming that the group with the
larger mean will have the highest scores, and vice versa;
however, 'different variances'  says that the group with the
larger variance is apt to have the extreme scores in
either direction.

I've been concerned with the cases of unequal SDs
where a transformation (log, square-root)  is apt to
be appropriate.  In those cases, what is best is to carry
out the transformation;  what is next-best is not to trust
either test.  The results depend on which N  is associated
with the larger variance, and Welch's test does not excel.
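
(Concretely, the "carry out the transformation" route is a one-liner in
SPSS -- the variable names below are made up purely for illustration,
and the scores are assumed positive:

compute  logscore= ln(score) .
t-test   groups= group(1,2) / variables= logscore .

The T-TEST output then gives both the pooled and the separate-variance
results on the log scale.)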

  alpha-- testing on exp(normal(x)/2) .
Testing reported for the two directions / tails.
Welch's
   .05  rejected   .107   + .012  
   .01  rejected   .032   + .001 
Student's
   .05  rejected   .032  + .053 
   .01  rejected   .000   + .014  .

 - Student's does better in this example. -


The numbers support some long-time advice,
that the t-test (either one) has far less robustness as
a one-tailed test; one tail tends to be too large, often
by twice as much as it should be.  Here's a second set.

  alpha-- less extreme, exp(normal(x)/4).
Welch's
   .05  rejected   .075 + .030  of the time
   .01  rejected   .019 + .002 
Student's
   .05  rejected   .040 +  .047 
   .01  rejected   .002 +  .010


/*  SPSS code for the simulation.  Runs under SPSS 6.2 on VMS and 10.1 on Windows.
title   "check variances . EQUAL" .
set seed=1000009 .
input program .
vector  Xs(30), Ys(6) .
/* .
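/*  each pass through the outer loop below builds one case holding .
/*  all 30 Xs and all 6 Ys;  END CASE releases it, giving 1000 cases .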
loop sets= 1 to 1000.
loop id=1 to 30 .
/*  call to Normal()  is Normal(Standard Deviation); switch
/*  the SDs for Xs and Ys  for the other run.
compute         Xs(id)= normal(24).
end loop .
loop id= 1 to 6.
compute         Ys(id)= normal(10).
end loop .
compute N_1= 30 .
compute N_2= 6 .
end case .
end loop .
end file .
end input program .

/*      computations for two versions of t-test .
compute ave_1= mean(Xs1 to Xs30) .
compute ave_2= mean(Ys1 to Ys6) .
compute var_1= var(Xs1 to Xs30) .
compute var_2= var(Ys1 to Ys6) .


/*      for the regular t-test, pooled variance estimate.
compute         DF_stu= N_1 + N_2 -2 .
compute         SPool=( (N_1-1)*var_1 + (N_2-1)*var_2 ) / DF_stu .
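/*  note: SPool here is the pooled variance (not an SD);  the SQRT .
/*  on the next line turns it into the SE of the mean difference .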
compute         SD_dif=  SQRT( SPool*(1/N_1 + 1/N_2) ) .
compute         t_stu  = ( AVE_2 - AVE_1 ) / SD_dif .


/*      for the test with adjusted d.f., separate variance estimates .
/*  taken from the SPSS computations algorithms, 1978 .
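/*  (this is the Welch-Satterthwaite d.f.;  DF_sep always falls .
/*  between  min(N_1,N_2)-1  and  N_1+N_2-2 ) .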
compute         DENOM= var_1/N_1 + var_2/N_2 .
compute         Z_1 = ( var_1/N_1  / DENOM )**2  / (N_1-1) .
compute         Z_2 = ( var_2/N_2  / DENOM )**2  / (N_2-1) .
compute         DF_sep = 1. / (Z_1 + Z_2) .
compute         t_sep = ( AVE_2 - AVE_1 ) / SQRT(DENOM) .

compute pc_stu= cdf.t(t_stu,DF_stu) .
compute pc_sep= cdf.t(t_sep,DF_sep) .
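
/*  an optional addition of mine (not necessarily how the rates quoted .
/*  above were tallied):  flag two-sided rejections directly, before .
/*  pc_stu and pc_sep are recoded below;  the means of the flags are .
/*  the rejection rates .
compute rj05stu= (pc_stu le .025 or pc_stu ge .975) .
compute rj01stu= (pc_stu le .005 or pc_stu ge .995) .
compute rj05sep= (pc_sep le .025 or pc_sep ge .975) .
compute rj01sep= (pc_sep le .005 or pc_sep ge .995) .
descriptives    rj05stu rj01stu rj05sep rj01sep .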

recode  pc_stu pc_sep(.999 thru hi= 99.9)(.99 thru .999=99)
        (.95 thru .99= 95)(.80 thru .95=80)(.50 thru .80=50)
        (.20 thru .50=49)(.05 thru .20= 20)(.01 thru .05=5)
        (.001 thru .01= 1)(lo thru .001= .1) .
formats var_1 var_2(F5.1) spool(F5.0) df_stu(F3.0) .
formats pc_stu, pc_sep(F6.1) .

/*  list        vars= ave_1 ave_2 var_1 var_2 t_stu t_sep pc_stu, pc_sep .
subtitle        first 1000, done twice .
descriptives    ave_1 ave_2 var_1 var_2 t_stu t_sep pc_stu pc_sep .
frequencies     pc_stu pc_sep /format= onepage .


http://www.pitt.edu/~wpilib/index.html
"Taxes are the price we pay for civilization."  Justice Holmes.