Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)

2000-03-24 Thread Bruce Weaver

On Fri, 24 Mar 2000, Bernard Higgins wrote:

> 
> 
> Hi Bruce

Hello Bernard.

> 
> The point I was making is that when developing hypothesis tests, 
> from a theoretical point of view, the sampling distribution of the 
> test statistic from which critical values or p-values etc are 
> obtained, is determined by the null hypothesis. We need a probability 
> model to enable us to determine how likely observed patterns are. 
> These probability models will often work well in practice even if we 
> relax the usual assumptions. When using distribution-free tests as 
> an alternative to a parametric test we may need to specify 
> restrictions in order that the tests can be considered "equivalent". 

Agreed.

> 
> In my view the t-test is fairly robust and will work well in most 
> situations where the distribution is not too skewed, and constant 
> variance is reasonable. Indeed I have no problems in using it for the 
> majority of problems. When comparing two independent samples using 
> t-tests, departures from normality and from constant variance are 
> often not too serious if the samples are of similar size, which is 
> always a good idea in planned experiments.

Agreed here too.
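
As an aside, that robustness claim is easy to probe by simulation.
Here is a sketch (in Python, with numpy and scipy assumed available,
and made-up parameters): with the null hypothesis true but unequal
SDs, the pooled t-test's Type I error rate stays near the nominal 5%
as long as the group sizes are equal.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, reps, n = 0.05, 10000, 25

# H0 is true (equal means), but the SDs differ (1 vs. 2).
rejections = 0
for _ in range(reps):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 2.0, n)
    _, p = stats.ttest_ind(a, b)   # pooled-variance t-test
    rejections += p < alpha

print(rejections / reps)           # close to 0.05 when n1 == n2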

> 
> As you say, when samples are fairly large, some say 30+ or even 
> less, the sampling distribution of the mean can often be approximated 
> by a normal distribution (Central Limit Theorem), and hence an 
> (asymptotic) Z-test is frequently used. It would not, I think, be 
> strictly correct to call such a statistic t, although from a 
> practical point of view there may be little difference. The formal 
> definition of the single-sample t statistic is the ratio of a 
> Standard Normal random variable to the square root of an independent 
> Chi-squared random variable divided by its degrees of freedom, and it 
> does, in theory, require independent observations from a normal 
> distribution.


I think we are no longer in complete agreement here.  I am not a 
mathematician, but for what it's worth, here is my understanding of t- 
and z-tests:

numerator = (statistic - parameter|H0)
denominator = SE(statistic)

test statistic = z if SE(statistic) is based on pop. SD
test statistic = t if SE(statistic) is based on sample SD

The most common 'statistics' in the numerator are Xbar and (Xbar1 - 
Xbar2); but others are certainly possible (e.g., for large-sample 
versions of rank-based tests).
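
In code, that scheme looks like this (a sketch with made-up numbers;
Python with numpy and scipy assumed):

import numpy as np
from scipy import stats

# Hypothetical sample and null value, purely for illustration
x = np.array([4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.6])
mu0 = 5.0                      # parameter under H0
n = len(x)

# t: SE(statistic) estimated from the sample SD
t = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_t = 2 * stats.t.sf(abs(t), n - 1)

# z: SE(statistic) based on a known population SD (say sigma = 0.6)
sigma = 0.6
z = (x.mean() - mu0) / (sigma / np.sqrt(n))
p_z = 2 * stats.norm.sf(abs(z))

print(t, p_t)
print(z, p_z)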

An assumption of both tests is that the statistic in the numerator has a
sampling distribution that is normal.  This is where the CLT comes into
play:  It lays out the conditions under which the sampling distribution of
the statistic is approximately normal--and those conditions can vary
depending on what statistic you're talking about.  But having a normal
sampling distribution does not mean that we can or should use a critical
z-value rather than a critical t when the population variance is unknown
(which is what I thought you were suggesting).  

As you say, one can substitute critical z for critical t when n gets
larger, because the differences become negligible.  But nowadays, most of
us are using computer programs that give us more or less exact p-values
anyway, so this is less of an issue than it once was. 
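
To put numbers on "negligible" (a quick check; scipy assumed):

from scipy import stats

z_crit = stats.norm.ppf(0.975)        # two-sided 5% critical z = 1.960
for n in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, n - 1)
    print(n, round(t_crit, 3), round(t_crit - z_crit, 3))

By n = 30 the critical values differ by about 0.08, and by n = 100 by
less than 0.03.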


Cheers,
Bruce
-- 
Bruce Weaver
[EMAIL PROTECTED]
http://www.angelfire.com/wv/bwhomedir/







Re: Normality & parametric tests (WAS: Kruskal-Wallis & equal variances)

2000-03-24 Thread Bernard Higgins



Hi Bruce

The point I was making is that when developing hypothesis tests, 
from a theoretical point of view, the sampling distribution of the 
test statistic from which critical values or p-values etc are 
obtained, is determined by the null hypothesis. We need a probability 
model to enable us to determine how likely observed patterns are. 
These probability models will often work well in practice even if we 
relax the usual assumptions. When using distribution-free tests as 
an alternative to a parametric test we may need to specify 
restrictions in order that the tests can be considered "equivalent". 

In my view the t-test is fairly robust and will work well in most 
situations where the distribution is not too skewed, and constant 
variance is reasonable. Indeed I have no problems in using it for the 
majority of problems. When comparing two independent samples using 
t-tests, departures from normality and from constant variance are 
often not too serious if the samples are of similar size, which is 
always a good idea in planned experiments.

As you say, when samples are fairly large, some say 30+ or even 
less, the sampling distribution of the mean can often be approximated 
by a normal distribution (Central Limit Theorem), and hence an 
(asymptotic) Z-test is frequently used. It would not, I think, be 
strictly correct to call such a statistic t, although from a 
practical point of view there may be little difference. The formal 
definition of the single-sample t statistic is the ratio of a 
Standard Normal random variable to the square root of an independent 
Chi-squared random variable divided by its degrees of freedom, and it 
does, in theory, require independent observations from a normal 
distribution.
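
In symbols (the standard construction, stated here for completeness;
the notation is mine): if Z and V are independent with Z ~ N(0,1) and
V ~ Chi-squared on nu degrees of freedom, then

\[
  T \;=\; \frac{Z}{\sqrt{V/\nu}} \;\sim\; t_\nu .
\]

For a single sample of n independent N(\mu, \sigma^2) observations,
take Z = \sqrt{n}(\bar{X} - \mu)/\sigma and
V = (n-1)S^2/\sigma^2 \sim \chi^2_{n-1}; the \sigma cancels, giving
T = \sqrt{n}(\bar{X} - \mu)/S \sim t_{n-1}.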


Regards - Bernie



> On 24 Mar 2000, Bernard Higgins wrote:
> 
> > These are my thoughts:
> > 
> > The sampling distribution of a test statistic is determined by the
> > null hypothesis. So analysis of variance is used to test that a
> > number of samples come from an identical Normal distribution
> > against the alternative that the "subpopulations" have different
> > means (but the same variances). The mean and standard deviation of
> > normally distributed random variables are independent of one
> > another.
> > 
> > Distribution free (non-parametric) procedures do not require the
> > underlying distribution to be normal. For the majority of these
> -- >8 ---


Bruce replied:


> 
> I think it is overly restrictive to say that the samples must come
> from normally distributed populations under a true null hypothesis. 
> Take the simplest parametric test, a single-sample t-test.  The
> assumption is that the sampling distribution of X-bar is
> (approximately) normal, not that the population from which you've
> sampled is normal.  If the population is normal, then of course the
> sampling distribution of X-bar will be too, for any size sample
> (even n=1).  But if your sample size is large enough (e.g., some
> authors suggest around n=300), the sampling distribution of X-bar
> will be close to normal no matter what the population distribution
> looks like. For populations that are not normal, but are reasonably
> symmetrical, the sampling distribution of X-bar will be near enough
> to normal with samples somewhere between these extremes.
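
A quick simulation makes this concrete (a sketch in Python, numpy
assumed; the exponential population and sample sizes are my choices):

import numpy as np

rng = np.random.default_rng(0)

# Heavily skewed population: exponential with mean 1.  For each n,
# draw many samples and measure the skewness of X-bar.
for n in (5, 30, 300):
    xbars = rng.exponential(1.0, size=(20000, n)).mean(axis=1)
    skew = ((xbars - xbars.mean()) ** 3).mean() / xbars.std() ** 3
    print(n, round(skew, 3))   # theory: 2/sqrt(n), shrinking toward 0

Even for this population, the skewness of the sampling distribution of
X-bar is modest by n = 30 and small by n = 300.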

---
Bernie Higgins
Division of Mathematics and Statistics
University of Portsmouth
Mercantile House
Hampshire Terrace
Portsmouth PO1 2EG

Tel: 01705 843031
Fax: 01705 843106
Email: [EMAIL PROTECTED]
---

