Will,

I’m not quite sure what over-parameterizing means in the case of PCA, as it is 
simply a rigid rotation of the data space and does not provide parameters for 
statistical inference.
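A minimal NumPy sketch of the rigid-rotation point: projecting mean-centered data onto all of its principal axes leaves every pairwise Euclidean distance between specimens unchanged, and the eigenvalues only redistribute the total variance. The random data and the helper names `pca_scores` and `pairwise_distances` are illustrative assumptions, not code from this thread.

```python
import numpy as np

def pca_scores(X):
    """Hypothetical helper: PC scores and eigenvalues for X (specimens x variables)."""
    Xc = X - X.mean(axis=0)                     # center each variable
    # Rows of Vt are orthonormal principal axes; all of them are kept
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                          # rotate into PC coordinates
    eigenvalues = s**2 / (X.shape[0] - 1)       # variance along each PC
    return scores, eigenvalues

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 8))                    # simulated: 30 specimens, 8 variables
scores, eigenvalues = pca_scores(X)

# Distances among specimens are identical before and after the rotation
print(np.allclose(pairwise_distances(X), pairwise_distances(scores)))  # True
```

Nothing is estimated beyond the covariance matrix itself; the rotation just re-expresses the same configuration of specimens along new axes.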

As for the distribution of eigenvalues, this of course depends on the 
covariance matrix of the traits, whose estimate in turn is affected by sample 
size. However, when traits become even mildly correlated (as is certainly the 
case for landmark coordinates), the distribution of eigenvalues of the 
covariance matrix becomes much better behaved. Specifically, the eigenvalues 
associated with the first and last PC axes are less extreme than those 
observed with uncorrelated traits. That implies greater stability in their 
estimation, as the covariance matrix is further from singular (see the large 
statistical literature on the condition number of a covariance matrix and the 
resulting estimation issues for ill-conditioned covariance matrices).
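A rough simulation sketch of how sample eigenvalues can stray from the true ones, comparing uncorrelated traits with equally correlated traits. The design (20 traits, 40 specimens, a common pairwise correlation of 0.5, 200 replicates) and the helper `mean_leading_ratio` are arbitrary assumptions, and the sketch looks only at the largest eigenvalue: with uncorrelated traits every true eigenvalue equals 1, yet the largest sample eigenvalue lands far above it, whereas with correlated traits the leading sample eigenvalue stays close to its true value in relative terms.

```python
import numpy as np

def mean_leading_ratio(cov, n, reps=200, seed=0):
    """Hypothetical helper: average ratio of the largest sample eigenvalue
    to the largest true eigenvalue, over `reps` simulated samples of size n."""
    rng = np.random.default_rng(seed)
    p = cov.shape[0]
    true_max = np.linalg.eigvalsh(cov)[-1]      # eigvalsh sorts ascending
    ratios = []
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        sample_max = np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]
        ratios.append(sample_max / true_max)
    return float(np.mean(ratios))

p, n, rho = 20, 40, 0.5
uncorrelated = np.eye(p)
# Assumed structure: every pair of traits has the same correlation rho
correlated = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

print(mean_leading_ratio(uncorrelated, n))      # well above 1
print(mean_leading_ratio(correlated, n))        # close to 1
```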

Best,

Dean


Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

From: William Gelnaw [mailto:wgel...@gmail.com]
Sent: Wednesday, May 31, 2017 5:41 PM
To: mitte...@univie.ac.at
Cc: MORPHMET <morphmet@morphometrics.org>
Subject: Re: [MORPHMET] Re: number of landmarks and sample size

I'm currently working on a paper that deals with the problem of 
over-parameterizing PCA in morphometrics. The recommendation I make in the 
paper is to have at least 3 times as many specimens as variables. That means 
that if you have 10 2D landmarks (20 coordinate variables), you should measure 
at least 60 specimens. Based on simulations, if you have fewer than 3 specimens 
per variable, you quickly start getting PCA eigenvalues that are very different 
from the known true eigenvalues. I did a literature survey, and about a quarter 
of morphometric studies in the last decade haven't met that standard. A good 
way to test whether you have enough specimens is a jackknife analysis: if you 
leave out about 10% of your observations and still get the same eigenvalues, 
your results are probably stable.
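The rule of thumb and the leave-out check described above can be sketched in NumPy as follows. The helper names, the 100 repetitions, the focus on the first three eigenvalues, and the simulated data are all assumptions. Leaving out a random 10% of specimens repeatedly is a delete-d jackknife (a form of subsampling) rather than the classic leave-one-out jackknife, and what counts as "the same eigenvalues" is left to the analyst, so the sketch just reports the largest relative change.

```python
import numpy as np

def min_specimens(n_landmarks, n_dims, ratio=3):
    """Rule of thumb from the thread: `ratio` specimens per coordinate variable."""
    return ratio * n_landmarks * n_dims

def jackknife_eigenvalues(X, drop_frac=0.10, reps=100, n_pcs=3, seed=0):
    """Hypothetical helper: largest relative change in the first `n_pcs`
    covariance eigenvalues when a random `drop_frac` of the specimens
    (rows of X) is left out, over `reps` repetitions."""
    rng = np.random.default_rng(seed)

    def leading_eigs(A):
        # eigvalsh returns ascending eigenvalues; reverse to largest first
        return np.linalg.eigvalsh(np.cov(A, rowvar=False))[::-1][:n_pcs]

    full = leading_eigs(X)
    n = X.shape[0]
    n_drop = max(1, round(drop_frac * n))
    worst = 0.0
    for _ in range(reps):
        keep = rng.choice(n, size=n - n_drop, replace=False)
        change = np.abs(leading_eigs(X[keep]) - full) / full
        worst = max(worst, float(change.max()))
    return worst

print(min_specimens(10, 2))  # 60: 3 x (10 landmarks x 2 coordinates)

# Simulated stand-in data: 60 specimens, 20 variables with unequal variances
rng = np.random.default_rng(4)
X = rng.normal(size=(60, 20)) * np.linspace(3.0, 0.5, 20)
print(jackknife_eigenvalues(X))  # small values suggest stable leading eigenvalues
```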
  I hope this helps.
  - Will

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at wrote:
Adding more (semi)landmarks inevitably increases the spatial resolution and 
thus allows one to capture finer anatomical details, whether relevant to the 
biological question or not. This can be advantageous for the reconstruction of 
shapes, especially when producing 3D morphs by warping dense surface 
representations. Basic developmental or evolutionary trends, group structures, 
etc., are often visible in an ordination analysis with a smaller set of 
relevant landmarks; finer anatomical resolution does not necessarily affect 
these patterns. Adding more landmarks cannot reduce or remove any signal that 
was found with fewer landmarks, but it can make ordination analyses and the 
interpretation of distances and angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems for 
statistical methods such as the computation of mean shapes and Procrustes 
distances, PCA, PLS, and the multivariate regression of shape coordinates on 
some independent variable (shape regression). These methods are based on 
averages or regressions computed for each variable separately, or on the 
decomposition of a covariance matrix.
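A small NumPy sketch of why PCA does not break down when variables outnumber specimens: it can be computed from the singular value decomposition of the centered data matrix, which never inverts the covariance matrix. With n specimens, at most n - 1 eigenvalues are non-zero. The random data are an assumed stand-in for aligned shape coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
n_specimens, n_vars = 12, 60           # many more variables than specimens
X = rng.normal(size=(n_specimens, n_vars))

mean_shape = X.mean(axis=0)            # per-variable averages; no inversion needed
Xc = X - mean_shape

# PCA via SVD of the centered data; the 60 x 60 covariance matrix is never inverted
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (n_specimens - 1)
scores = U * s                         # PC scores, one row per specimen

# Only n - 1 eigenvalues are non-zero; the last one is numerically zero
n_nonzero = int(np.sum(eigenvalues > 1e-10 * eigenvalues[0]))
print(n_nonzero)                       # 11
```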

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative 
eigenanalysis, require the inversion of a full-rank covariance matrix, which 
implies an excess of specimens over variables. The same applies to many 
multivariate parametric test statistics, such as Hotelling's T2, Wilks' Lambda, 
etc. But shape coordinates are NEVER of full rank and thus can never be 
subjected to any of these methods without prior variable reduction. In fact, 
reliable results can only be obtained if there are many times more specimens 
than variables, which usually requires variable reduction by PCA, PLS, or other 
techniques, or the regularization of covariance matrices (which is more common 
in the bioinformatics community).
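A sketch of the rank problem and of the two remedies mentioned, under simplifying assumptions. Only translation is removed here, as a stand-in for Procrustes superimposition (which in 2D also removes one dimension each for scale and rotation, leaving at most 2k - 4 dimensions for k landmarks), so even with 200 specimens the 16 x 16 covariance matrix has rank 14 and cannot be inverted. The hypothetical helpers compute a Mahalanobis distance either on the PCs with non-zero variance or with a ridge-regularized covariance S + λI; λ = 0.01 is an arbitrary choice, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 8                              # 200 specimens, 8 two-dimensional landmarks
configs = rng.normal(size=(n, k, 2))       # simulated raw landmark coordinates
# Remove position only (simplified stand-in for Procrustes superimposition)
configs -= configs.mean(axis=1, keepdims=True)
X = configs.reshape(n, 2 * k)              # 16 variables per specimen

S = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(S))            # 14 = 2k - 2, not 16: S is singular

def mahalanobis_pca(X, x, n_pcs):
    """Hypothetical helper: Mahalanobis distance of x from the mean of X,
    computed on the first n_pcs principal components."""
    mu = X.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(evals)[::-1][:n_pcs]    # largest eigenvalues first
    z = (x - mu) @ evecs[:, order]             # scores on the retained PCs
    return float(np.sqrt(np.sum(z**2 / evals[order])))

def mahalanobis_ridge(X, x, lam=1e-2):
    """Hypothetical helper: Mahalanobis distance using S + lam * I."""
    mu = X.mean(axis=0)
    S_reg = np.cov(X, rowvar=False) + lam * np.eye(X.shape[1])
    d = x - mu
    return float(np.sqrt(d @ np.linalg.solve(S_reg, d)))

x_new = X[0]
print(mahalanobis_pca(X, x_new, n_pcs=np.linalg.matrix_rank(S)))
print(mahalanobis_ridge(X, x_new))
```

Both routes give a finite distance where a plain inverse of S would fail; the ridge version is slightly smaller because the added constant shrinks every term.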

For these reasons, I do not see any disadvantage in measuring a large number 
of landmarks, except perhaps for the time it takes. If time is an issue, one 
can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp

--
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
---
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.