Hi Will, I think you meant to say that you are writing a study design paper presenting results of simulations and power analysis to determine appropriate sample sizes for multivariate analyses in geometric morphometrics. But I would think that would have already been settled by now, and possibly would be more relevant for certain clustering methods. The only parameterized PCA variant I am aware of is Kernel PCA, which is a nonlinear PCA method used for pattern analysis (e.g. used in image analysis), but that is not often employed in biological geometric morphometrics papers (at least, those that I frequently come across). When kernels are used they usually are meant to estimate densities of reduced-dimensionality data like CS, or PCs as shape variables.
Best,
Justin

Justin C. Bagley, Ph.D.
Postdoctoral Scholar
Plant Evolutionary Genomics Laboratory
Department of Biology
Virginia Commonwealth University
Richmond, VA 23284-2012
jcbag...@vcu.edu

Senior/Postdoctoral Research Associate
Departamento de Zoologia
Universidade de Brasília
Campus Universitário Darcy Ribeiro
70910-900 Brasília, DF, Brasil

Website: http://www.justinbagley.org
Lattes CV: http://lattes.cnpq.br/0028570120872581

On Wed, May 31, 2017 at 6:41 PM, William Gelnaw <wgel...@gmail.com> wrote:

> I'm currently working on a paper that deals with the problem of
> over-parameterizing PCA in morphometrics. The recommendations that I'm
> making in the paper are that you should try to have at least 3 times as
> many samples as variables. That means that if you have 10 2D landmarks,
> you should have at least 60 specimens that you measure. Based on
> simulations, if you have fewer than 3 specimens per variable, you quickly
> start getting eigenvalues for a PCA that are very different from known true
> eigenvalues. I did a literature survey and about a quarter of
> morphometrics studies in the last decade haven't met that standard. A good
> way to test if you have enough samples is to do a jackknife analysis. If
> you cut out about 10% of your observations and still get the same
> eigenvalues, then your results are probably stable.
> I hope this helps.
> - Will
>
> On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <mitte...@univie.ac.at> wrote:
>
>> Adding more (semi)landmarks inevitably increases the spatial resolution
>> and thus allows one to capture finer anatomical details - whether relevant
>> to the biological question or not. This can be advantageous for the
>> reconstruction of shapes, especially when producing 3D morphs by warping
>> dense surface representations.
>> Basic developmental or evolutionary trends, group structures, etc.,
>> often are visible in an ordination analysis with a smaller set of
>> relevant landmarks; finer anatomical resolution does not necessarily
>> affect these patterns. However, adding more landmarks cannot reduce or
>> even remove any signals that were found with fewer landmarks, but it can
>> make ordination analyses and the interpretation of distances and angles
>> in shape space more challenging.
>>
>> An excess of variables (landmarks) over specimens does NOT pose problems
>> to statistical methods such as the computation of mean shapes and
>> Procrustes distances, PCA, PLS, and the multivariate regression of shape
>> coordinates on some independent variable (shape regression). These methods
>> are based on averages or regressions computed for each variable separately,
>> or on the decomposition of a covariance matrix.
>>
>> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
>> relative eigenanalysis, require the inversion of a full-rank covariance
>> matrix, which implies an excess of specimens over variables. The same
>> applies to many multivariate parametric test statistics, such as
>> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
>> rank and thus can never be subjected to any of these methods without prior
>> variable reduction. In fact, reliable results can only be obtained if there
>> are many times more specimens than variables, which usually requires
>> variable reduction by PCA, PLS or other techniques, or the regularization
>> of covariance matrices (which is more common in the bioinformatic
>> community).
>>
>> For these reasons, I do not see any disadvantage of measuring a large
>> number of landmarks, except for a waste of time perhaps. If life time is an
>> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>>
>> Best,
>>
>> Philipp
>>
>> --
>> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "MORPHMET" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to morphmet+unsubscr...@morphometrics.org.
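
A minimal sketch of the jackknife check Will describes in the quoted message above, in plain Python with no external libraries. Everything here is an illustrative assumption rather than anyone's actual analysis: the function names (covariance, leading_eigenvalue, jackknife_spread, simulate) are hypothetical, the data are simulated independent standard-normal variables (so every true eigenvalue is 1), only the leading eigenvalue is tracked (by power iteration), and the "cut out about 10%" step is done by drawing random subsets instead of enumerating all of them.

```python
import random


def covariance(data):
    """Sample covariance matrix; rows are specimens, columns are variables."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvalue(matrix, iterations=200):
    """Approximate the largest eigenvalue of a covariance matrix
    (symmetric, positive semi-definite) by power iteration."""
    p = len(matrix)
    v = [1.0 / p ** 0.5] * p  # unit-length starting vector
    value = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        value = sum(x * x for x in w) ** 0.5  # |Av| with |v| = 1
        if value == 0.0:
            return 0.0
        v = [x / value for x in w]
    return value


def jackknife_spread(data, drop_fraction=0.1, replicates=20, seed=0):
    """Return (full-sample leading eigenvalue, largest relative change seen
    when a random drop_fraction of the specimens is left out).
    Assumes the data are not constant, so the full eigenvalue is nonzero."""
    rng = random.Random(seed)
    full = leading_eigenvalue(covariance(data))
    n_keep = len(data) - max(1, round(drop_fraction * len(data)))
    worst = 0.0
    for _ in range(replicates):
        subset = rng.sample(data, n_keep)  # specimens kept in this replicate
        value = leading_eigenvalue(covariance(subset))
        worst = max(worst, abs(value - full) / full)
    return full, worst


def simulate(n_specimens, n_variables=20, seed=1):
    """Hypothetical data: independent standard-normal variables, so every
    true eigenvalue equals 1. 20 variables mimics 10 2D landmarks."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_variables)]
            for _ in range(n_specimens)]


if __name__ == "__main__":
    for n in (15, 60):  # 0.75 vs. 3 specimens per variable
        full, worst = jackknife_spread(simulate(n))
        print(f"n={n}: leading eigenvalue {full:.2f}, "
              f"max relative change {worst:.1%}")
```

A small "max relative change" suggests the leading eigenvalue is stable under removal of specimens; how small is small enough is a judgment call that this sketch does not settle. A fuller check would compare the whole eigenvalue spectrum, which is easier with a linear algebra library than with the hand-written power iteration used here.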