Re: [MORPHMET] Re: number of landmarks and sample size

Justin Bagley Sun, 11 Jun 2017 08:36:59 -0700

Hi Will,

I think you meant to say that you are writing a study design paper
presenting results of simulations and power analysis to determine
appropriate sample sizes for multivariate analyses in geometric
morphometrics. But I would think that would have already been settled by
now, and possibly would be more relevant for certain clustering methods.
The only parameterized PCA variant I am aware of is Kernel PCA, which is a
nonlinear PCA method used for pattern analysis (e.g. used in image
analysis), but that is not often employed in biological geometric
morphometrics papers (at least, those that I frequently come across). When
kernels are used they usually are meant to estimate densities of
reduced-dimensionality data like centroid size (CS), or PCs used as shape
variables.


Best,

Justin

Justin C. Bagley, Ph.D.
Postdoctoral Scholar
Plant Evolutionary Genomics Laboratory
Department of Biology
Virginia Commonwealth University
Richmond, VA 23284-2012
[email protected]

Senior/Postdoctoral Research Associate
Departamento de Zoologia
Universidade de Brasília
Campus Universitário Darcy Ribeiro
70910-900 Brasília, DF, Brasil
Website: http://www.justinbagley.org
Lattes CV: http://lattes.cnpq.br/0028570120872581

On Wed, May 31, 2017 at 6:41 PM, William Gelnaw <[email protected]> wrote:

> I'm currently working on a paper that deals with the problem of
> over-parameterizing PCA in morphometrics.  The recommendation that I'm
> making in the paper is that you should try to have at least 3 times as
> many samples as variables.  That means that if you have 10 2D landmarks
> (20 coordinate variables), you should have at least 60 specimens that you
> measure.  Based on
> simulations, if you have fewer than 3 specimens per variable, you quickly
> start getting eigenvalues for a PCA that are very different from known true
> eigenvalues.  I did a literature survey and about a quarter of
> morphometrics studies in the last decade haven't met that standard.  A good
> way to test if you have enough samples is to do a jackknife analysis.  If
> you cut out about 10% of your observations and still get the same
> eigenvalues, then your results are probably stable.
>   I hope this helps.
>   - Will
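
A minimal sketch of the check Will describes, assuming the specimens are
already superimposed and flattened into a specimens-by-variables array. The
function names, the 200 replicates, and the simulated data are all made up for
illustration; nothing here comes from Will's paper. It drops a random 10% of
specimens many times and reports how much the leading eigenvalues move.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the eigenvalues without
    # forming the full p x p covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / (X.shape[0] - 1)

def jackknife_eigenvalues(X, drop_fraction=0.10, n_reps=200, n_keep=3, seed=0):
    """Leading eigenvalues and their relative spread over delete-10% subsamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_drop = max(1, int(round(drop_fraction * n)))
    reps = np.empty((n_reps, n_keep))
    for i in range(n_reps):
        keep = rng.choice(n, size=n - n_drop, replace=False)
        reps[i] = pca_eigenvalues(X[keep])[:n_keep]
    full = pca_eigenvalues(X)[:n_keep]
    # Coefficient of variation per eigenvalue: small values suggest stability.
    return full, reps.std(axis=0, ddof=1) / reps.mean(axis=0)

def min_specimens(n_landmarks, n_dims=2, ratio=3):
    """Specimens suggested by the 3-per-variable rule of thumb."""
    return ratio * n_landmarks * n_dims

# Simulated stand-in for real data: 10 hypothetical 2D landmarks.
rng = np.random.default_rng(42)
n_vars = 2 * 10
true_sd = np.linspace(3.0, 0.5, n_vars)
X = rng.normal(size=(min_specimens(10), n_vars)) * true_sd
full, cv = jackknife_eigenvalues(X)
print("leading eigenvalues:", full)
print("relative spread:", cv)
```

What counts as "the same eigenvalues" is a judgment call; the relative spread
only makes that judgment easier to state.
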
>
> On Wed, May 31, 2017 at 1:31 PM, [email protected] <
> [email protected]> wrote:
>
>> Adding more (semi)landmarks inevitably increases the spatial resolution
>> and thus allows one to capture finer anatomical details - whether relevant
>> to the biological question or not. This can be advantageous for the
>> reconstruction of shapes, especially when producing 3D morphs by warping
>> dense surface representations. Basic developmental or evolutionary trends,
>> group structures, etc., often are visible in an ordination analysis with a
>> smaller set of relevant landmarks; finer anatomical resolution does not
>> necessarily affect these patterns. However, adding more landmarks cannot
>> reduce or even remove any signals that were found with fewer landmarks, but
>> it can make ordination analyses and the interpretation of distances and
>> angles in shape space more challenging.
>>
>> An excess of variables (landmarks) over specimens does NOT pose problems
>> to statistical methods such as the computation of mean shapes and
>> Procrustes distances, PCA, PLS, and the multivariate regression of shape
>> coordinates on some independent variable (shape regression). These methods
>> are based on averages or regressions computed for each variable separately,
>> or on the decomposition of a covariance matrix.
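
To illustrate the point above, here is a small sketch of a PCA with more
variables than specimens. The data are random numbers standing in for shape
coordinates (not real Procrustes coordinates), and the sizes are arbitrary
assumptions. The decomposition runs without any matrix inversion; the only
consequence of having few specimens is that at most n - 1 eigenvalues are
nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)
n_specimens, n_vars = 15, 60  # hypothetical: 15 specimens, 30 2D landmarks
X = rng.normal(size=(n_specimens, n_vars))

# The mean is computed for each variable separately.
mean_config = X.mean(axis=0)
Xc = X - mean_config

# PCA by singular value decomposition of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (n_specimens - 1)
scores = U * s  # PC scores, one row per specimen

# Centering removes one degree of freedom, so at most n - 1 eigenvalues
# can differ from zero.
n_nonzero = int(np.sum(eigenvalues > 1e-10 * eigenvalues[0]))
print(n_nonzero, "nonzero eigenvalues out of", n_vars, "variables")
```
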
>>
>> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
>> relative eigenanalysis require the inversion of a full-rank covariance
>> matrix, which implies an excess of specimens over variables. The same
>> applies to many multivariate parametric test statistics, such as
>> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
>> rank and thus can never be subjected to any of these methods without prior
>> variable reduction. In fact, reliable results can only be obtained if there
>> are many times more specimens than variables, which usually requires variable
>> reduction by PCA, PLS or other techniques, or the regularization of
>> covariance matrices (which is more common in the bioinformatic community).
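
A hedged sketch of the variable-reduction route described above, again with
random numbers in place of real shape coordinates and with arbitrary sizes
(40 specimens, 60 variables, 5 retained PCs are all assumptions). The full
covariance matrix is rank deficient and cannot be inverted, while the
covariance matrix of the first few PC scores can, so squared Mahalanobis
distances from the mean become computable.

```python
import numpy as np

rng = np.random.default_rng(2)
n_specimens, n_vars, n_pcs = 40, 60, 5  # all hypothetical choices
X = rng.normal(size=(n_specimens, n_vars))
Xc = X - X.mean(axis=0)

# Full covariance matrix: its rank cannot exceed n_specimens - 1, so with
# more variables than specimens it is singular.
S = np.cov(Xc, rowvar=False)
rank = np.linalg.matrix_rank(S)
print("rank", rank, "of", n_vars)

# Reduce to the first few PCs, then invert the small covariance matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = (U * s)[:, :n_pcs]
S_red = np.cov(scores, rowvar=False)
S_inv = np.linalg.inv(S_red)

# Squared Mahalanobis distance of each specimen from the sample mean,
# measured in the reduced PC space.
d2 = np.einsum("ij,jk,ik->i", scores, S_inv, scores)
print(d2[:5])
```

With real Procrustes shape coordinates the rank drops further, because
superimposition removes position, scale, and orientation (4 dimensions in 2D,
7 in 3D). How many PCs to retain is a separate decision that this sketch does
not address.
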
>>
>> For these reasons, I do not see any disadvantage of measuring a large
>> number of landmarks, except for a waste of time perhaps. If lifetime is an
>> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>>
>> Best,
>>
>> Philipp
>>
>> --
>> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "MORPHMET" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>>
>
