Will, I’m not quite sure what over-parameterizing means in the case of PCA, as it is simply a rigid rotation of the data space and does not provide parameters for statistical inference.
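The rigid-rotation point can be illustrated with a minimal sketch in Python with NumPy. Everything here is an assumption for illustration: the data are random numbers, and `pca_scores` and `pairwise_distances` are hypothetical helper names, not functions from any morphometrics package. The sketch checks that, when all axes are retained, the PC scores are just the centered data multiplied by an orthogonal matrix, so the distances among specimens and the total variance are unchanged.

```python
import numpy as np

def pca_scores(X):
    """PCA via eigendecomposition of the sample covariance matrix.

    Returns the scores, the eigenvalues (largest first) and the
    matrix of eigenvectors (the rotation).
    """
    Xc = X - X.mean(axis=0)                  # center each variable
    cov = np.cov(Xc, rowvar=False)           # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to largest first
    rotation = eigvecs[:, order]
    return Xc @ rotation, eigvals[order], rotation

def pairwise_distances(A):
    """Euclidean distances between all pairs of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical data: 30 specimens measured on 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 6))

scores, eigvals, rotation = pca_scores(X)

# The rotation matrix is orthogonal ...
orthogonal = np.allclose(rotation.T @ rotation, np.eye(X.shape[1]))
# ... so distances among specimens are the same before and after PCA ...
same_distances = np.allclose(pairwise_distances(X), pairwise_distances(scores))
# ... and the eigenvalues sum to the total variance of the original variables.
total_variance_kept = np.isclose(eigvals.sum(), np.trace(np.cov(X, rowvar=False)))

print(orthogonal, same_distances, total_variance_kept)
```

This only holds when every PC axis is kept; truncating to the first few axes is a projection and does change distances.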
As for the distribution of eigenvalues, this of course is based on the underlying covariance matrix for the traits, which in turn will be affected by sample size. However, when traits become even mildly correlated (as is certainly the case for landmark coordinates), the distribution of eigenvalues of the covariance matrix becomes much better behaved. Specifically, the eigenvalues associated with low and high PC axes are less extreme than is observed with uncorrelated traits. That implies greater stability in their estimation, as the covariance matrix is further from singular (see the large statistical literature on the condition of a covariance matrix and subsequent estimation issues for ill-behaved covariance matrices).

Best,

Dean

Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

From: William Gelnaw [mailto:wgel...@gmail.com]
Sent: Wednesday, May 31, 2017 5:41 PM
To: mitte...@univie.ac.at
Cc: MORPHMET <morphmet@morphometrics.org>
Subject: Re: [MORPHMET] Re: number of landmarks and sample size

I'm currently working on a paper that deals with the problem of over-parameterizing PCA in morphometrics. The recommendation that I'm making in the paper is that you should try to have at least 3 times as many specimens as variables. That means that if you have 10 2D landmarks (20 coordinate variables), you should measure at least 60 specimens. Based on simulations, if you have fewer than 3 specimens per variable, you quickly start getting eigenvalues for a PCA that are very different from the known true eigenvalues. I did a literature survey, and about a quarter of morphometrics studies in the last decade haven't met that standard. A good way to test if you have enough samples is to do a jackknife analysis.
If you cut out about 10% of your observations and still get the same eigenvalues, then your results are probably stable.

I hope this helps.

- Will

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <mitte...@univie.ac.at> wrote:

Adding more (semi)landmarks inevitably increases the spatial resolution and thus allows one to capture finer anatomical details - whether relevant to the biological question or not. This can be advantageous for the reconstruction of shapes, especially when producing 3D morphs by warping dense surface representations. Basic developmental or evolutionary trends, group structures, etc., often are visible in an ordination analysis with a smaller set of relevant landmarks; finer anatomical resolution does not necessarily affect these patterns. Adding more landmarks cannot reduce or remove any signals that were found with fewer landmarks, but it can make ordination analyses and the interpretation of distances and angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to statistical methods such as the computation of mean shapes and Procrustes distances, PCA, PLS, and the multivariate regression of shape coordinates on some independent variable (shape regression). These methods are based on averages or regressions computed for each variable separately, or on the decomposition of a covariance matrix. Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative eigenanalysis, require the inversion of a full-rank covariance matrix, which implies an excess of specimens over variables. The same applies to many multivariate parametric test statistics, such as Hotelling's T², Wilks' Lambda, etc. But shape coordinates are NEVER of full rank and thus can never be subjected to any of these methods without prior variable reduction.
In fact, reliable results can only be obtained if there are many times more specimens than variables, which usually requires variable reduction by PCA, PLS or other techniques, or the regularization of covariance matrices (which is more common in the bioinformatics community).

For these reasons, I do not see any disadvantage of measuring a large number of landmarks, except perhaps for a waste of time. If time is an issue, one can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp

--
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
---
You received this message because you are subscribed to the Google Groups "MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email to morphmet+unsubscr...@morphometrics.org.
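The jackknife check that Will suggests in this thread (drop about 10% of the observations and see whether the eigenvalues stay the same) could be sketched roughly as follows in Python with NumPy. This is only one possible reading of that suggestion, not Will's actual procedure: the data are simulated, the function names `eigenvalues` and `delete_d_jackknife` are hypothetical, and the number of replicates and the spread summary are arbitrary choices.

```python
import numpy as np

def eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    cov = np.cov(X, rowvar=False)            # variables are in columns
    return np.linalg.eigvalsh(cov)[::-1]     # eigvalsh returns ascending order

def delete_d_jackknife(X, drop_fraction=0.10, n_reps=200, seed=0):
    """Recompute the eigenvalues after repeatedly dropping a random
    subset of specimens (about 10% by default)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - max(1, int(round(drop_fraction * n)))
    reps = np.empty((n_reps, X.shape[1]))
    for i in range(n_reps):
        keep = rng.choice(n, size=n_keep, replace=False)
        reps[i] = eigenvalues(X[keep])
    return reps

# Hypothetical data: 60 specimens and 20 variables (e.g. 10 2D landmarks),
# simulated with unequal variances across variables.
rng = np.random.default_rng(1)
n_specimens, n_variables = 60, 20
X = rng.normal(size=(n_specimens, n_variables)) * np.linspace(3.0, 0.5, n_variables)

full = eigenvalues(X)
reps = delete_d_jackknife(X)

# Spread of each eigenvalue across replicates, relative to the full-sample
# estimate; small values suggest the eigenvalue is stable under subsampling.
rel_spread = reps.std(axis=0) / full
print(np.round(rel_spread[:5], 3))
```

What counts as "the same eigenvalues" is a judgment call; the relative spread above is just one convenient summary. Note also that real Procrustes shape coordinates are rank deficient, so their trailing eigenvalues are zero and would need to be excluded before dividing.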