Dr. Hammer, please consider your courage credited. -ds

A couple of points about PCA in general:

1) PCA makes no assumptions about the distribution (multivariate normal
or otherwise) of your data. It is a procedure that simply produces the
linear combinations of variables with maximum variance subject to
orthogonality to other such axes. Distribution assumptions only come
into play for (some) significance testing procedures.
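To make point 1 concrete, here is a minimal sketch in Python with NumPy. That is my choice of tool; nothing in the argument depends on it. PCA is computed here as the eigendecomposition of the sample covariance matrix. The function name `pca` and the simulated, deliberately non-normal (uniform) data are illustrative assumptions. Nothing in the computation refers to a distribution. The orthogonal axes, the decreasing variances and the uncorrelated scores all come out of the linear algebra alone.

```python
import numpy as np

def pca(X):
    """Principal components of X (specimens x variables).

    Pure linear algebra: centre the data, form the sample covariance
    matrix, take its eigenvectors. No distributional assumption is used.
    """
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = np.cov(Xc, rowvar=False)            # sample covariance, variables x variables
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh because S is symmetric
    order = np.argsort(eigvals)[::-1]       # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # projections onto the PCs
    return eigvals, eigvecs, scores

# Hypothetical non-normal (uniform) data with two correlated variables
rng = np.random.default_rng(0)
u = rng.uniform(size=(200, 1))
X = np.hstack([u, 2 * u + 0.1 * rng.uniform(size=(200, 1)), rng.uniform(size=(200, 1))])
eigvals, eigvecs, scores = pca(X)

# Orthonormal axes, and score variances equal to the eigenvalues,
# whatever the distribution of X
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))
print(np.allclose(np.var(scores, axis=0, ddof=1), eigvals))
```

Both checks print True. Distributional assumptions would only enter if you went on to test, say, the equality of eigenvalues.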

2) PC1 will only identify size variation if size variation is the source
of the greatest variation in your sample. Sex, species, habitat, etc.
could all be determinants (not in the matrix sense 8-) ) of PC1 or some
combination of these.

In general, if your data contain an extreme outlier (e.g., a
transcription error), PC1 will probably just point toward (or pi
radians away from) the direction of that outlier relative to the main
sample, because that direction will still be the linear combination of
maximum variance.
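A small simulation shows the outlier effect. It assumes 50 hypothetical specimens of independent normal data on 5 variables, with a single miskeyed value planted in one specimen. The helper `first_pc` is hypothetical. The absolute cosine between PC1 and the unit vector from the main-sample mean to the bad specimen comes out very close to 1.

```python
import numpy as np

rng = np.random.default_rng(1)

# 50 hypothetical specimens x 5 variables of well-behaved data
X = rng.normal(size=(50, 5))

# Simulated transcription error: one entry keyed as 1000 instead of
# a value near 0
X_bad = X.copy()
X_bad[0, 3] = 1000.0

def first_pc(X):
    """Unit eigenvector of the covariance matrix with the largest eigenvalue."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

pc1 = first_pc(X_bad)

# Unit vector from the mean of the main sample to the outlying specimen
d = X_bad[0] - X_bad[1:].mean(axis=0)
d /= np.linalg.norm(d)

# |cos| close to 1: PC1 points at (or pi radians away from) the outlier
print(abs(pc1 @ d))
```

The sign of an eigenvector is arbitrary, so the absolute value is taken. That is the "pi radians away" case.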

What people often want PCA to do is either a) identify iso/allometry
due to size variation in a sample or b) separate out sexes, species, or
other groups. PCA is optimal for neither of these and could be quite
misleading in both cases.
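A constructed two-group example shows how PCA can mislead when the goal is group separation. In this hypothetical sample, one variable (think of it as size) has a large variance that is unrelated to group. A second, low-variance variable carries all of the group difference. PC1 follows the high-variance variable, and the groups overlap on it. Here the separation happens to turn up on PC2, but nothing guarantees that in general. The `separation` helper, a standardised mean difference, is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Hypothetical sample: variable 0 varies a lot but carries no group
# information; variable 1 varies little but separates the two groups.
size = rng.normal(0.0, 10.0, size=2 * n)
group = np.repeat([0, 1], n)
shape = np.where(group == 0, -1.0, 1.0) + rng.normal(0.0, 0.3, size=2 * n)
X = np.column_stack([size, shape])

# Ordinary PCA via the covariance matrix, PCs sorted by decreasing variance
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]
scores = Xc @ eigvecs[:, order]

def separation(s):
    """Absolute difference in group means divided by the pooled within-group SD."""
    a, b = s[group == 0], s[group == 1]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled

print("PC1 separation:", separation(scores[:, 0]))  # small: groups overlap
print("PC2 separation:", separation(scores[:, 1]))  # large
```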

If you are interested in size relationships, regress your variables on
some meaningful measure of size. If you are interested in group
differences, look into canonical variates analysis (CVA).
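For the size route, one common approach is an ordinary least-squares regression of log trait on log size. Reduced major axis fits are also widely used for allometry, and the choice between them is a separate question. The data below are simulated with a known exponent of 1.3, so the fitted slope should land near that value. `size` and `trait` are hypothetical variables.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: a linear size measure and one trait that scales
# allometrically with exponent 1.3, with multiplicative noise
size = rng.uniform(10.0, 100.0, size=60)
trait = 0.5 * size**1.3 * np.exp(rng.normal(0.0, 0.05, size=60))

# Fit log(trait) = log(a) + b * log(size) by ordinary least squares.
# For two linear measurements, b = 1 indicates isometry; b != 1, allometry.
b, log_a = np.polyfit(np.log(size), np.log(trait), deg=1)
print(f"estimated slope b = {b:.2f}")
```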

If you have many more variables than specimens, you might do either of
the above in a reduced PCA space, provided you check carefully whether
your limited data suggest that you are capturing the salient aspects of
a lower-dimensional space arising from tight correlations among your
variables. Otherwise, you must wave your hands vigorously before
proceeding.
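The many-variables case can be sketched as follows. It assumes 12 hypothetical specimens measured on 40 variables that are driven by two latent factors. Because the data are centred, at most n - 1 principal components can have non-zero variance. The SVD of the centred data yields them without forming the 40 x 40 covariance matrix. Retaining k = 2 components is an assumption you would have to justify from your own data, which is the careful checking described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 12, 40  # hypothetical: 12 specimens, 40 variables

# Tightly correlated variables: all 40 driven by 2 latent factors plus noise
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.05 * rng.normal(size=(n, p))

Xc = X - X.mean(axis=0)
# SVD of the centred data gives the PCs without forming the p x p covariance
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (n - 1)  # variance along each PC

# Centring uses up one degree of freedom, so at most n - 1 PCs are non-zero
nonzero = int(np.sum(eigvals > 1e-10 * eigvals[0]))
share = eigvals[:2].sum() / eigvals.sum()
print("non-zero eigenvalues:", nonzero)
print("share of variance in first 2 PCs:", round(share, 3))

# Reduced PCA space: scores on the first k PCs, to feed into a later
# regression or CVA. k = 2 is an assumption that must be justified.
k = 2
scores = Xc @ Vt[:k].T
```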

See Marcus's 1990 chapter in the Blue Book for a nice discussion of PCA
and related methods.

Books devoted specifically to principal components, by Jackson,
Jolliffe and others, are also available.

-ds


On Wed, 2004-05-19 at 09:29, [EMAIL PROTECTED] wrote:
> Just a comment on this one, from a pragmatic point of view.
> 
> It is of course true that PCA is only *guaranteed* to
> produce components maximizing variance if you have
> multivariate normality. The theory of PCA is based on this
> assumption. But in many cases, PCA is used purely as a
> visualization device, projecting a multivariate data set
> onto a sheet of paper so we can see it. For visualization
> of non-normal data, one could play around with different
> techniques, such as PCA, PCO, NMDS, projection pursuit etc.,
> and then find that PCA does (or does not) perform well
> for the given data set. There is no law against making
> any linear combination you want of your variates, if it
> reveals information. For example, PCA may be perfectly
> adequate for resolving two well-separated groups, if
> the within-group variance is relatively small.
> 
> Of course, when using PCA for non-normal data one must
> be a little careful and not over-interpret the results
> (especially not the component loadings), but I think
> it's too harsh to dismiss its use totally.
> 
> I'm sure the hard-liners will flame me to pieces for
> this email, but I hope they will at least give me
> credit for my courage  :-)
> 
> 
> Dr. Oyvind Hammer
> Geological Museum
> University of Oslo
> 
> 
> 
> > PCA Analysis assumes multivariate normality.
> >
> > Kathleen M. Robinette, Ph.D.
> > Principal Research Anthropologist
> > Air Force Research Laboratory
> 
> 
> 
> ==
> Replies will be sent to list.
> For more information see http://life.bio.sunysb.edu/morph/morphmet.html.
-- 
Dennis E. Slice, Ph.D.
Department of Biomedical Engineering
Division of Radiologic Sciences
Wake Forest University School of Medicine
Winston-Salem, North Carolina, USA 
27157-1022
Phone: 336-716-5384
Fax: 336-716-2870


