Will,
I’m not quite sure what over-parameterizing means in the case of PCA, as it is
simply a rigid rotation of the data space and does not provide parameters for
statistical inference.
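To make the "rigid rotation" point concrete, here is a minimal Python sketch (NumPy and the random toy data are assumptions, not part of the thread). It computes PCA by eigendecomposition of the sample covariance matrix, then checks that the eigenvector matrix is orthonormal and that pairwise distances among specimens and the total variance are unchanged. In other words, PCA only rotates (and possibly reflects) the centered data; it estimates nothing beyond the covariance matrix itself.

```python
import numpy as np

def pca_rotation(X):
    """PCA of X (n specimens x p variables): scores, eigenvectors, eigenvalues."""
    Xc = X - X.mean(axis=0)                    # center each variable
    S = np.cov(Xc, rowvar=False)               # p x p sample covariance matrix
    evals, evecs = np.linalg.eigh(S)           # eigh: symmetric input, ascending order
    order = np.argsort(evals)[::-1]            # sort PCs from largest to smallest variance
    evals, evecs = evals[order], evecs[:, order]
    scores = Xc @ evecs                        # rotate the centered data onto the PC axes
    return scores, evecs, evals

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated data
scores, V, lam = pca_rotation(X)

# V is orthonormal, so it is a rotation (possibly combined with a reflection)
print(np.allclose(V.T @ V, np.eye(4)))                   # True

# Pairwise Euclidean distances among specimens are unchanged by the rotation
Xc = X - X.mean(axis=0)
d_before = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
d_after = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=-1)
print(np.allclose(d_before, d_after))                    # True
```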
As for the distribution of eigenvalues, this of course is based on the
underlying covariance matrix for the traits, which in turn will be affected by
sample size. However, when traits become even mildly correlated (as is
certainly the case for landmark coordinates), the distribution of eigenvalues
of the covariance matrix becomes much better behaved. Specifically, the
eigenvalues associated with low and high PC axes are less extreme than is
observed with uncorrelated traits. That implies greater stability in their
estimation, as the covariance matrix is further from singular (see the large
statistical literature on the condition of a covariance matrix and subsequent
estimation issues for ill-behaved covariance matrices).
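For readers who want to explore this point by simulation, the following is a minimal sketch (Python/NumPy assumed; the helper names are hypothetical). It computes the eigenvalues and the condition number (largest over smallest eigenvalue) of a covariance matrix, and compares the eigenvalues of a sample covariance matrix with the known true ones. The "correlated" structure is an assumed example, a matrix with the same correlation rho = 0.5 between every pair of traits; its true eigenvalues are 1 + (p - 1) rho and 1 - rho, so its true condition number is 11 here versus 1 for uncorrelated traits. The quantities of interest are therefore the sample-to-true ratios. The sketch does not by itself settle the claim above; results depend on p, n, and the covariance structure chosen.

```python
import numpy as np

def eigen_summary(S):
    """Eigenvalues (descending) and condition number of a symmetric covariance matrix S."""
    evals = np.linalg.eigvalsh(S)[::-1]        # eigvalsh returns ascending order; reverse it
    cond = evals[0] / evals[-1] if evals[-1] > 0 else np.inf
    return evals, cond

def sample_vs_true(Sigma, n, rng):
    """Draw n specimens from N(0, Sigma); return true and sample eigenvalues and conditions."""
    p = Sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = np.cov(X, rowvar=False)
    true_evals, true_cond = eigen_summary(Sigma)
    samp_evals, samp_cond = eigen_summary(S)
    return true_evals, samp_evals, true_cond, samp_cond

rng = np.random.default_rng(42)
p, n, rho = 10, 30, 0.5
uncorrelated = np.eye(p)                                     # independent, equal-variance traits
correlated = (1 - rho) * np.eye(p) + rho * np.ones((p, p))   # assumed equal-correlation example

for name, Sigma in [("uncorrelated", uncorrelated), ("correlated", correlated)]:
    t, s, tc, sc = sample_vs_true(Sigma, n, rng)
    print(f"{name}: true cond = {tc:.1f}, sample cond = {sc:.1f}")
    # How far the extreme sample eigenvalues fall from their true values
    print(f"  sample/true, largest: {s[0] / t[0]:.2f}, smallest: {s[-1] / t[-1]:.2f}")
```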
Best,
Dean
Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
http://www.public.iastate.edu/~dcadams/
phone: 515-294-3834
From: William Gelnaw [mailto:[email protected]]
Sent: Wednesday, May 31, 2017 5:41 PM
To: [email protected]
Cc: MORPHMET <[email protected]>
Subject: Re: [MORPHMET] Re: number of landmarks and sample size
I'm currently working on a paper that deals with the problem of
over-parameterizing PCA in morphometrics. The recommendations that I'm making
in the paper are that you should try to have at least 3 times as many samples
as variables. That means that if you have 10 2D landmarks, you should have at
least 60 specimens that you measure. Based on simulations, if you have fewer
than 3 specimens per variable, you quickly start getting eigenvalues for a PCA
that are very different from known true eigenvalues. I did a literature survey
and about a quarter of morphometrics studies in the last decade haven't met
that standard. A good way to test if you have enough samples is to do a
jackknife analysis. If you cut out about 10% of your observations and still
get the same eigenvalues, then your results are probably stable.
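Both suggestions can be sketched in a few lines of Python (NumPy assumed; function names and the random data are hypothetical). The first function applies the 3-specimens-per-variable rule, counting each landmark coordinate as a variable. The second is a delete-d jackknife: it repeatedly drops about 10% of the specimens at random, recomputes the PCA eigenvalues, and reports how much each eigenvalue varies relative to its full-sample value. What counts as "the same eigenvalues" (the acceptable spread) is left to the analyst.

```python
import numpy as np

def min_specimens(n_landmarks, n_dims, ratio=3):
    """Rule of thumb from the text: at least `ratio` specimens per landmark coordinate."""
    return ratio * n_landmarks * n_dims

def pca_eigenvalues(X):
    """Eigenvalues (descending) of the sample covariance matrix of X (n x p)."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

def jackknife_eigenvalues(X, drop_frac=0.10, n_reps=200, seed=0):
    """Delete-d jackknife: drop ~drop_frac of specimens at random, recompute eigenvalues.

    Returns an (n_reps x p) array, one row of eigenvalues per replicate.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - max(1, int(round(drop_frac * n)))
    reps = []
    for _ in range(n_reps):
        keep = rng.choice(n, size=n_keep, replace=False)   # specimens retained this round
        reps.append(pca_eigenvalues(X[keep]))
    return np.array(reps)

print(min_specimens(10, 2))   # 60, as in the example above

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))          # hypothetical data: 60 specimens, 10 2D landmarks
full = pca_eigenvalues(X)
reps = jackknife_eigenvalues(X)
rel_sd = reps.std(axis=0) / full       # relative spread of each eigenvalue across replicates
print(np.round(rel_sd[:5], 3))
```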
I hope this helps.
- Will
On Wed, May 31, 2017 at 1:31 PM, [email protected] wrote:
Adding more (semi)landmarks inevitably increases the spatial resolution and
thus allows one to capture finer anatomical details - whether relevant to the
biological question or not. This can be advantageous for the reconstruction of
shapes, especially when producing 3D morphs by warping dense surface
representations. Basic developmental or evolutionary trends, group structures,
etc., often are visible in an ordination analysis with a smaller set of
relevant landmarks; finer anatomical resolution does not necessarily affect these
patterns. However, adding more landmarks cannot reduce or even remove any
signals that were found with fewer landmarks, but it can make ordination
analyses and the interpretation of distances and angles in shape space more
challenging.
An excess of variables (landmarks) over specimens does NOT pose problems to
statistical methods such as the computation of mean shapes and Procrustes
distances, PCA, PLS, and the multivariate regression of shape coordinates on
some independent variable (shape regression). These methods are based on
averages or regressions computed for each variable separately, or on the
decomposition of a covariance matrix.
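As an illustration of why PCA needs no matrix inversion, here is a minimal sketch (Python/NumPy assumed; the dimensions and data are hypothetical) that computes PCA from the singular value decomposition of the centered data when variables far outnumber specimens. It still runs; the covariance matrix simply has at most n - 1 nonzero eigenvalues after centering.

```python
import numpy as np

def pca_svd(X):
    """PCA via SVD of the centered data; works even when variables outnumber specimens."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # decomposition only, no inversion
    evals = s**2 / (X.shape[0] - 1)                      # eigenvalues of the covariance matrix
    scores = U * s                                       # equivalently Xc @ Vt.T
    return scores, Vt.T, evals

rng = np.random.default_rng(0)
n, p = 15, 80                    # e.g. 15 specimens, 40 2D (semi)landmarks
X = rng.normal(size=(n, p))
scores, V, evals = pca_svd(X)
print(evals.shape)               # (15,): at most min(n, p) components
print(np.sum(evals > 1e-10))     # 14: centering leaves at most n - 1 nonzero eigenvalues
mean_shape = X.mean(axis=0)      # averages are computed variable by variable
```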
Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative
eigenanalysis require the inversion of a full-rank covariance matrix, which
implies an excess of specimens over variables. The same applies to many
multivariate parametric test statistics, such as Hotelling's T2, Wilks' Lambda,
etc. But shape coordinates are NEVER of full rank and thus can never be
subjected to any of these methods without prior variable reduction. In fact,
reliable results can only be obtained if there are many times more specimens than
variables, which usually requires variable reduction by PCA, PLS or other
techniques, or the regularization of covariance matrices (which is more common
in the bioinformatic community).
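The rank problem and the two workarounds can be sketched as follows (Python/NumPy assumed; data and function names are hypothetical). For simplicity the sketch performs only one step of Procrustes superimposition, centering each configuration, which already removes 2 dimensions for 2D data; a full generalized Procrustes analysis also removes scale and rotation, leaving 2k - 4 dimensions for k 2D landmarks. It then computes a Mahalanobis distance either on the leading PC scores (variable reduction) or with a ridge-regularized covariance matrix S + lam * I. The ridge form is only one simple choice of regularization, and the value of lam is arbitrary here.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 8                               # 100 specimens, 8 2D landmarks -> 16 variables
configs = rng.normal(size=(n, k, 2))        # hypothetical raw landmark coordinates
# Centering each configuration removes translation: within every specimen the
# x-coordinates sum to zero and so do the y-coordinates.
centered = configs - configs.mean(axis=1, keepdims=True)
Y = centered.reshape(n, 2 * k)              # columns ordered x1, y1, x2, y2, ...
S = np.cov(Y, rowvar=False)
print(np.linalg.matrix_rank(S))             # 14 = 2k - 2 < 16, so S has no inverse

def mahalanobis_pc(Y, x, n_pcs):
    """Mahalanobis distance of x from the mean of Y, using the first n_pcs PC scores."""
    mu = Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    V = Vt[:n_pcs].T                        # retained PC axes
    evals = s[:n_pcs] ** 2 / (Y.shape[0] - 1)
    z = (x - mu) @ V                        # PC scores of x; their covariance is diagonal
    return np.sqrt(np.sum(z**2 / evals))

def mahalanobis_ridge(Y, x, lam):
    """Mahalanobis distance using the regularized covariance S + lam * I."""
    mu = Y.mean(axis=0)
    S_reg = np.cov(Y, rowvar=False) + lam * np.eye(Y.shape[1])
    d = x - mu
    return np.sqrt(d @ np.linalg.solve(S_reg, d))

print(mahalanobis_pc(Y, Y[0], n_pcs=14))
print(mahalanobis_ridge(Y, Y[0], lam=0.1))
```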
For these reasons, I do not see any disadvantage of measuring a large number of
landmarks, except perhaps for a waste of time. If time is an issue, one
can optimize landmark schemes as suggested by Jim or Aki.
Best,
Philipp
--
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
---
You received this message because you are subscribed to the Google Groups
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
[email protected]<mailto:[email protected]>.