Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-11 Thread Justin Bagley
Hi Will,

I think you meant to say that you are writing a study design paper
presenting results of simulations and power analysis to determine
appropriate sample sizes for multivariate analyses in geometric
morphometrics. But I would think that would have already been settled by
now, and possibly would be more relevant for certain clustering methods.
The only parameterized PCA variant I am aware of is Kernel PCA, which is a
nonlinear PCA method used for pattern analysis (e.g. used in image
analysis), but that is not often employed in biological geometric
morphometrics papers (at least, those that I frequently come across). When
kernels are used they usually are meant to estimate densities of
reduced-dimensionality data like CS, or PCs as shape variables.

Best,

Justin

Justin C. Bagley, Ph.D.
Postdoctoral Scholar
Plant Evolutionary Genomics Laboratory
Department of Biology
Virginia Commonwealth University
Richmond, VA 23284-2012
jcbag...@vcu.edu

Senior/Postdoctoral Research Associate
Departamento de Zoologia
Universidade de Brasília
Campus Universitário Darcy Ribeiro
70910-900 Brasília, DF, Brasil
Website: http://www.justinbagley.org
Lattes CV: http://lattes.cnpq.br/0028570120872581

On Wed, May 31, 2017 at 6:41 PM, William Gelnaw  wrote:

> I'm currently working on a paper that deals with the problem of
> over-parameterizing PCA in morphometrics.  The recommendations that I'm
> making in the paper are that you should try to have at least 3 times as
> many samples as variables.  That means that if you have 10 2D landmarks,
> you should have at least 60 specimens that you measure.  Based on
> simulations, if you have fewer than 3 specimens per variable, you quickly
> start getting eigenvalues for a PCA that are very different from known true
> eigenvalues.  I did a literature survey and about a quarter of
> morphometrics studies in the last decade haven't met that standard.  A good
> way to test if you have enough samples is to do a jackknife analysis.  If
> you cut out about 10% of your observations and still get the same
> eigenvalues, then your results are probably stable.
>   I hope this helps.
>   - Will
>
> On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <
> mitte...@univie.ac.at> wrote:
>
>> Adding more (semi)landmarks inevitably increases the spatial resolution
>> and thus allows one to capture finer anatomical details - whether relevant
>> to the biological question or not. This can be advantageous for the
>> reconstruction of shapes, especially when producing 3D morphs by warping
>> dense surface representations. Basic developmental or evolutionary trends,
>> group structures, etc., often are visible in an ordination analysis with a
>> smaller set of relevant landmarks; finer anatomical resolution does not
>> necessarily affect these patterns. However, adding more landmarks cannot
>> reduce or even remove any signals that were found with fewer landmarks, but
>> it can make ordination analyses and the interpretation of distances and
>> angles in shape space more challenging.
>>
>> An excess of variables (landmarks) over specimens does NOT pose problems
>> to statistical methods such as the computation of mean shapes and
>> Procrustes distances, PCA, PLS, and the multivariate regression of shape
>> coordinates on some independent variable (shape regression). These methods
>> are based on averages or regressions computed for each variable separately,
>> or on the decomposition of a covariance matrix.
>>
>> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
>> relative eigenanalysis require the inversion of a full-rank covariance
>> matrix, which implies an excess of specimens over variables. The same
>> applies to many multivariate parametric test statistics, such as
>> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
>> rank and thus can never be subjected to any of these methods without prior
>> variable reduction. In fact, reliable results can only be obtained if there
>> are many times more specimens than variables, which usually requires variable
>> reduction by PCA, PLS or other techniques, or the regularization of
>> covariance matrices (which is more common in the bioinformatic community).
>>
>> For these reasons, I do not see any disadvantage of measuring a large
>> number of landmarks, except for a waste of time perhaps. If life time is an
>> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>>
>> Best,
>>
>> Philipp
>>
>> --
>> MORPHMET may be accessed via its webpage at http://www.morphometrics.org
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "MORPHMET" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to morphmet+unsubscr...@morphometrics.org.
>>

[MORPHMET] Re: number of landmarks and sample size

2017-06-03 Thread Mike Collyer
Ilker,

Philipp already defined well why - I think - this rationale is incorrect, if 
not dangerous, especially along the lines of statistical power.  As he 
indicated, using Procrustes residuals as data means a covariance matrix will 
never be full rank, owing to the invariance in size, orientation, and position 
of landmark configurations following GPA.  At most, the dimensions of the data 
space can be kp - g, where k is the number of landmark dimensions (2 or 3), p 
is the number of landmarks, and g is the number of invariant dimensions due to 
GPA (with or without sliding landmarks) or n - 1 if (kp - g) > n - 1.  As he 
also pointed out, increasing landmarks can increase the spatial resolution, 
meaning that if n - 1 is the limited number of dimensions, the distances 
between specimens can increase in the n - 1 dimensional space that results from 
increasing p.  If by “statistical power” one means an increased probability to 
reject a null hypothesis that population centroids (mean configurations) are 
the same, then increasing resolution should enhance one’s ability to reject a 
null hypothesis.
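
As a rough illustration of the dimension count described above, here is a minimal Python sketch. It assumes g = 4 for 2D landmarks (two translations, one rotation, one scale) and g = 7 for 3D landmarks (three translations, three rotations, one scale), with fixed landmarks only; sliding semilandmarks would remove further dimensions, which can be passed in through g. The function name is hypothetical and not taken from any morphometrics package.

```python
def shape_space_dims(p, k, n, g=None):
    """Upper bound on the number of shape dimensions with nonzero variance.

    p: number of landmarks
    k: landmark dimensions (2 or 3)
    n: number of specimens
    g: dimensions lost to GPA; the defaults below are an assumption
       for fixed landmarks only (no sliding semilandmarks).
    """
    if g is None:
        # translation + rotation + scale: 2+1+1 in 2D, 3+3+1 in 3D
        g = {2: 4, 3: 7}[k]
    return min(k * p - g, n - 1)


# Seven fixed 2D landmarks with plenty of specimens: limited by kp - g
print(shape_space_dims(p=7, k=2, n=100))   # 10

# Hundreds of 3D landmarks with few specimens: limited by n - 1
print(shape_space_dims(p=300, k=3, n=25))  # 24
```

The second call shows the case discussed in this message: once kp - g exceeds n - 1, adding landmarks no longer adds dimensions, only resolution within the n - 1 dimensional space.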

I think tying the dimensionality of the space where the hypothesis is tested to 
the number of landmarks precludes appreciating Philipp’s comment about spatial 
resolution.  

I do not wish to necessarily advocate using a limited number of PCs as shape 
data, as a rule, but one can appreciate that given a choice between two 
configurations - one with seven fixed 2D landmarks (10 PCs after GPA) and one 
with the first 10 PCs obtained from configurations with hundreds of landmarks - 
the separation of groups in the latter case might be more prominent than in the 
former, hence increasing statistical power.

Whether hundreds of landmarks are needed, or 50, or 20, or 10, or even only 7, 
or whether increasing statistical power is important, is a question that must 
be answered case by case with empirical results.  However, placing an a priori 
limit on the number of landmarks one can define because of the size of samples 
one can collect is a certain way to limit statistical power, especially when 
small samples are all that’s available.

Cheers!
Mike



> On Jun 3, 2017, at 5:31 AM, Ilker ERCAN <ier...@msn.com> wrote:
> 
> When we perform multivariate analysis, it must be that n > p; otherwise the 
> determinant of the generalized variance equals zero. Therefore it must be 2*l 3*l
> Best wishes
> Ilker ERCAN

Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-03 Thread Norman MacLeod
In discussions like these it would be helpful if the writer could clarify 
whether they are referring to the concepts of biological homology, topological 
homology or "semantic homology". These aren't the same things and the whole 
issue of “homology” in geometric morphometrics has always seemed, at least to 
me, to be very confused. For example, refer to the definitions of “homology” 
and “landmark” in the Glossary on the SB Morphometrics web site. Because it 
means different things to different specialists homology isn't a term to be 
thrown around as lightly as morphometricians seem prone to do. Imprecise and/or 
ambiguous usage renders the meaning of sentences difficult or impossible to 
understand for me and I suspect confuses others as well.

Norm MacLeod



Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-03 Thread alcardini
Hi Philipp,
I am not worried about the number of variables (although I am not sure
one needs thousands of highly correlated points on a relatively simple
structure and seem to remember that Gunz and you suggested starting with
many and then reducing as appropriate).

Regardless of whether point homology makes sense, I am worried that
many users believe that semilandmarks (maybe after sliding according
to purely mathematical principles) are the same as "traditional
landmarks" with a clear one-to-one correspondence. Even saying that
what's "homologous" is the curve or surface is tricky, because at the
end of the day that curve/surface is discretized using points, shape
distances are based on those points and there are many ways of placing
points with no clear "homology" (figure 7 of Oxnard & O'Higgins,
2009); indeed, in an ontogenetic study of the cranial vault, for
instance, where sutures may become invisible in adults and therefore
cannot be used as a "boundary", semilandmarks close to the sutures may
end up on different bones in different stages/individuals.

Semilandmarks are a fantastic tool, which I am happy to use when
needed, but they have their own limitations, which one should be aware
of.
Cheers

Andrea



-- 

Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università
di Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472

Adjunct Associate Professor, School of Anatomy, Physiology and Human
Biology, The University of Western Australia, 35 Stirling Highway,
Crawley WA 6009, Australia

E-mail address: alcard...@gmail.com, andrea.card...@unimore.it
WEBPAGE: https://sites.google.com/site/alcardini/home/main

FREE Yellow BOOK on Geometric Morphometrics:
http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf

ESTIMATE YOUR GLOBAL FOOTPRINT:
http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/


Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread David Thulman
Hello,
I'm an archaeologist who works on artifacts in North America. There are not
many of us that use LGM, but even we can't seem to agree on how many LMs
are appropriate. Because I use discriminant function analysis as the
workhorse for discriminating groups of artifacts, I worry about the misuse
of that technique. One thing I've read (e.g., Qiao et al. 2009) with regard
to DFA is that too many variables (LMs) can affect its discriminatory power
through data piling or the related phenomenon of overfitting. I have seen
this in my practice but have not tested it rigorously. By reducing the
number of LMs, I can sometimes get better discrimination between groups.
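
One way to probe the overfitting concern raised above is to compare resubstitution accuracy with cross-validated accuracy as the number of variables grows. The sketch below is only an illustration under stated assumptions: it uses NumPy, simulated noise with no real group difference, equal group sizes, and a hand-written two-group linear discriminant. The function names are hypothetical, and it demonstrates overfitting in general rather than the specific data-piling result of Qiao et al.

```python
import numpy as np


def lda_fit(X, y):
    """Two-group linear discriminant from the pooled within-group covariance."""
    m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    R = np.vstack([X[y == 0] - m0, X[y == 1] - m1])
    S = R.T @ R / (len(y) - 2)
    # The pseudo-inverse guards against a singular pooled covariance.
    w = np.linalg.pinv(S) @ (m1 - m0)
    c = w @ (m0 + m1) / 2            # cutoff midway between the group means
    return w, c


def lda_predict(X, w, c):
    return (X @ w > c).astype(int)


def resub_and_loo_accuracy(X, y):
    """Resubstitution accuracy versus leave-one-out cross-validated accuracy."""
    w, c = lda_fit(X, y)
    resub = float(np.mean(lda_predict(X, w, c) == y))
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w_i, c_i = lda_fit(X[mask], y[mask])
        hits += int(lda_predict(X[i:i + 1], w_i, c_i)[0] == y[i])
    return resub, hits / len(y)


rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)            # 40 specimens in two groups
for p in (2, 10, 30):
    X = rng.normal(size=(40, p))     # pure noise: no true group difference
    print(p, resub_and_loo_accuracy(X, y))
```

If the apparent separation rises with the number of variables while the leave-one-out accuracy stays near chance, the discrimination is likely an artifact of too many variables for the sample size.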

The number of artifacts (specimens) is not a problem. I'm about to embark on a
regional analysis using thousands.

Does anyone who understands this phenomenon better than I do care to
comment?

Thanks,
Dave Thulman


Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread mitte...@univie.ac.at
I think a few topics get mixed up here.

Of course, a sample can be too small to be representative (as in Andrea's 
example), and one should think carefully about the measures to take. It is 
also clear that an increase in sample size reduces standard errors of 
statistical estimates, including that of a covariance matrix and its 
eigenvalues. But, as mentioned by Dean, the standard errors of the 
eigenvalues are of secondary interest in PCA.

If one has a clear expectation about the signal in the data - and if one 
does not aim at new discoveries - a few specific measurements may suffice, 
perhaps even a few distance measurements. But effective exploratory 
analyses have always been a major strength of geometric morphometrics, 
enabled by the powerful visualization methods together with the large 
number of measured variables.

Andrea, I am actually curious what worries you if one "collects between 
2700 and 10 400 homologous landmarks from each rib" (whatever the term 
"homologous" is supposed to mean here)? 

Compared to many other disciplines in contemporary biology and biomedicine, 
a few thousand variables are not particularly many. Consider, for instance, 
2D and 3D image analysis, FEA, and all the "omics", with millions and 
billions of variables. In my opinion, the challenge with these "big data" 
is not statistical power in testing a signal, but finding the signal - the 
low-dimensional subspace of interest - in the first place. But this applies 
to 50 or 100 variables as well, not only to thousands or millions. If no 
prior expectation about this signal existed (which the mere presence of so 
many variables usually implies), no hypothesis test should be performed at 
all. Ignorance of this rule is one of the main reasons why so many GWAS 
and voxel-based morphometry studies fail to be replicable.

Best wishes,

Philipp


RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Murat Maga
Just to comment.

While it is worthwhile to investigate these issues, in my experience sample sizes 
are limited not because investigators are unwilling to measure more specimens, 
but because there are no additional specimens to include in the analysis, 
especially for studies based on natural populations or historical collections.

M



RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Adams, Dean [EEOBS]
Will,

I’m not quite sure what over-parameterizing means in the case of PCA, as it is 
simply a rigid-rotation of the dataspace and does not provide parameters for 
statistical inference.

As for the distribution of eigenvalues, this of course is based on the 
underlying covariance matrix for the traits, which in turn will be affected by 
sample size. However, when traits become even mildly correlated (as is 
certainly the case for landmark coordinates), the distribution of eigenvalues 
of the covariance matrix becomes much better behaved. Specifically, the 
eigenvalues associated with low and high PC axes are less extreme than is 
observed with uncorrelated traits. That implies greater stability in their 
estimation, as the covariance matrix is further from singular (see the large 
statistical literature on the condition of a covariance matrix and subsequent 
estimation issues for ill-behaved covariance matrices).

Best,

Dean


Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
http://www.public.iastate.edu/~dcadams/
phone: 515-294-3834

From: William Gelnaw [mailto:wgel...@gmail.com]
Sent: Wednesday, May 31, 2017 5:41 PM
To: mitte...@univie.ac.at
Cc: MORPHMET <morphmet@morphometrics.org>
Subject: Re: [MORPHMET] Re: number of landmarks and sample size

I'm currently working on a paper that deals with the problem of 
over-parameterizing PCA in morphometrics.  The recommendations that I'm making 
in the paper are that you should try to have at least 3 times as many samples 
as variables.  That means that if you have 10 2D landmarks, you should have at 
least 60 specimens that you measure.  Based on simulations, if you have fewer 
than 3 specimens per variable, you quickly start getting eigenvalues for a PCA 
that are very different from known true eigenvalues.  I did a literature survey 
and about a quarter of morphometrics studies in the last decade haven't met 
that standard.  A good way to test if you have enough samples is to do a 
jackknife analysis.  If you cut out about 10% of your observations and still 
get the same eigenvalues, then your results are probably stable.
  I hope this helps.
  - Will


Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread William Gelnaw
I'm currently working on a paper that deals with the problem of
over-parameterizing PCA in morphometrics.  The recommendations that I'm
making in the paper are that you should try to have at least 3 times as
many samples as variables.  That means that if you have 10 2D landmarks,
you should have at least 60 specimens that you measure.  Based on
simulations, if you have fewer than 3 specimens per variable, you quickly
start getting eigenvalues for a PCA that are very different from known true
eigenvalues.  I did a literature survey and about a quarter of
morphometrics studies in the last decade haven't met that standard.  A good
way to test if you have enough samples is to do a jackknife analysis.  If
you cut out about 10% of your observations and still get the same
eigenvalues, then your results are probably stable.
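
A minimal sketch of this kind of stability check, assuming NumPy and simulated data. The function names, the 200 replicates, the hypothetical "true" standard deviations, and the use of the relative spread (SD / mean) as the criterion are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance of X (rows = specimens), descending."""
    Xc = X - X.mean(axis=0)
    # Singular values of the centered data give the covariance eigenvalues.
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / (X.shape[0] - 1)

def delete_fraction_stability(X, frac=0.10, n_rep=200, n_axes=3, seed=0):
    """Drop `frac` of the specimens at random, n_rep times, and return the
    relative spread (SD / mean) of the first n_axes eigenvalues."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_keep = n - max(1, int(round(frac * n)))
    vals = np.empty((n_rep, n_axes))
    for r in range(n_rep):
        keep = rng.choice(n, size=n_keep, replace=False)
        vals[r] = pca_eigenvalues(X[keep])[:n_axes]
    return vals.std(axis=0) / vals.mean(axis=0)

# 10 two-dimensional landmarks -> 20 variables -> at least 3 * 20 = 60 specimens.
n_landmarks, dims = 10, 2
n_vars = n_landmarks * dims
min_specimens = 3 * n_vars

rng = np.random.default_rng(1)
true_sd = np.linspace(2.0, 0.5, n_vars)            # hypothetical "true" axis SDs
X_small = rng.normal(size=(20, n_vars)) * true_sd  # 1 specimen per variable
X_large = rng.normal(size=(200, n_vars)) * true_sd # 10 specimens per variable

print("minimum specimens under the 3x rule:", min_specimens)
print("relative spread, n = 20: ", delete_fraction_stability(X_small))
print("relative spread, n = 200:", delete_fraction_stability(X_large))
```

A small relative spread across the replicates suggests the leading eigenvalues are stable; what counts as "small" is a judgment call that the sketch does not settle.
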
  I hope this helps.
  - Will

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <
mitte...@univie.ac.at> wrote:

> Adding more (semi)landmarks inevitably increases the spatial resolution
> and thus allows one to capture finer anatomical details - whether relevant
> to the biological question or not. This can be advantageous for the
> reconstruction of shapes, especially when producing 3D morphs by warping
> dense surface representations. Basic developmental or evolutionary trends,
> group structures, etc., often are visible in an ordination analysis with a
> smaller set of relevant landmarks; finer anatomical resolution does not
> necessarily affect these patterns. However, adding more landmarks cannot
> reduce or even remove any signals that were found with fewer landmarks, but
> it can make ordination analyses and the interpretation of distances and
> angles in shape space more challenging.
>
> An excess of variables (landmarks) over specimens does NOT pose problems
> to statistical methods such as the computation of mean shapes and
> Procrustes distances, PCA, PLS, and the multivariate regression of shape
> coordinates on some independent variable (shape regression). These methods
> are based on averages or regressions computed for each variable separately,
> or on the decomposition of a covariance matrix.
>
> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
> relative eigenanalysis, require the inversion of a full-rank covariance
> matrix, which implies an excess of specimens over variables. The same
> applies to many multivariate parametric test statistics, such as
> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
> rank and thus can never be subjected to any of these methods without prior
> variable reduction. In fact, reliable results can only be obtained if there
> are manifold more specimens than variables, which usually requires variable
> reduction by PCA, PLS or other techniques, or the regularization of
> covariance matrices (which is more common in the bioinformatic community).
>
> For these reasons, I do not see any disadvantage of measuring a large
> number of landmarks, except for a waste of time perhaps. If life time is an
> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>
> Best,
>
> Philipp


[MORPHMET] Re: number of landmarks and sample size

2017-06-01 Thread Aki Watanabe
Dear all,

I completely agree with the general consensus that the research question 
should inform the landmark sampling. As the first step in a morphometric 
study, landmark sampling is definitely worth thinking about deeply since, 
as discussed here and in many previous studies, it can generate spurious 
and unintended artifacts in alignment and downstream analyses. It's 
important to consider "quality over quantity" at both the individual 
landmark level and also at the level of the overall landmark configuration. 
For example, landmarks should be fairly evenly distributed over the structure 
of interest to avoid the "Pinocchio effect", where isolated landmarks far from 
the centroid end up having more impact on the alignment because the 
optimization is based on *squared* distance from the centroid. 
Alternatively, one can use the Resistant Fit alignment to mitigate this 
issue.

Another note--my landmark sampling study shows that adding certain 
landmarks to a subsampled data set can sometimes decrease the overall fit 
to the full data set with the complete set of landmarks. This result further 
supports the "quality over quantity" idea, where choosing poor landmarks can 
lead to spurious characterization of shape variation, at least with respect 
to the full data set. Put another way, adding more landmarks does not 
guarantee convergence to the full shape characterization (although, from 
personal observation, it typically does converge).

Happy landmarking,
Aki


On Tuesday, May 9, 2017 at 12:26:04 PM UTC+1, Lea Wolter wrote:
>
> Hello everyone, 
>
> I am new in the field of geometric morphometrics and have a question for 
> my bachelor thesis. 
>   
> I am not sure how many landmarks I should use at most in regard to the 
> sample size. I have a sample of about 22 individuals per population or 
> maybe a bit less (using sternum and epigyne of spiders) with 5 populations. 
> I have read a paper in which they use 18 landmarks with an even lower 
> sample size (3 populations with 20 individuals, 1 with 10). But I have also 
> heard that I should use twice as many individuals per population as 
> landmarks... 
>
> Maybe there is some mathematical formula for it to know if it would be 
> statistically significant? Could you recommend some paper? 
>
> Because of the symmetry of the epigyne I am now thinking of using just one 
> half of it for setting landmarks (so I get 5 instead of 9 landmarks). For 
> the sternum I thought about 7 or 9 landmarks, so at most I would also get 
> 18 landmarks like in the paper. 
>
> I would also like to use two type specimens in the analysis, but I have 
> just this one individual per population... would it be totally nonsense 
> from a statistical point of view? 
>
> Thanks very much for your help! 
>
> Best regards 
> Lea



[MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread mitte...@univie.ac.at
Adding more (semi)landmarks inevitably increases the spatial resolution and 
thus allows one to capture finer anatomical details - whether relevant to 
the biological question or not. This can be advantageous for the 
reconstruction of shapes, especially when producing 3D morphs by warping 
dense surface representations. Basic developmental or evolutionary trends, 
group structures, etc., often are visible in an ordination analysis with a 
smaller set of relevant landmarks; finer anatomical resolution does not 
necessarily affect these patterns. However, adding more landmarks cannot 
reduce or even remove any signals that were found with fewer landmarks, but 
it can make ordination analyses and the interpretation of distances and 
angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to 
statistical methods such as the computation of mean shapes and Procrustes 
distances, PCA, PLS, and the multivariate regression of shape coordinates 
on some independent variable (shape regression). These methods are based on 
averages or regressions computed for each variable separately, or on the 
decomposition of a covariance matrix. 

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and 
relative eigenanalysis, require the inversion of a full-rank covariance 
matrix, which implies an excess of specimens over variables. The same 
applies to many multivariate parametric test statistics, such as 
Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full 
rank and thus can never be subjected to any of these methods without prior 
variable reduction. In fact, reliable results can only be obtained if there 
are manifold more specimens than variables, which usually requires variable 
reduction by PCA, PLS or other techniques, or the regularization of 
covariance matrices (which is more common in the bioinformatic community).
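
The distinction drawn in the last two paragraphs can be illustrated with a small sketch, assuming NumPy and random data. The sizes (10 specimens, 40 variables) and the choice of k = 3 retained PCs are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 40                      # 10 specimens, 40 variables (hypothetical sizes)
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)

# The p x p covariance matrix has rank at most n - 1, so it cannot be inverted.
S = Xc.T @ Xc / (n - 1)
rank_S = np.linalg.matrix_rank(S)

# PCA still works: the SVD of the centered data yields n - 1 non-zero eigenvalues.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s**2 / (n - 1)
n_nonzero = int(np.sum(eigvals > 1e-10 * eigvals[0]))

# Variable reduction: keep the first k PC scores, whose covariance is full rank.
k = 3
scores = Xc @ Vt[:k].T
S_k = np.cov(scores, rowvar=False)
rank_S_k = np.linalg.matrix_rank(S_k)

# Squared Mahalanobis distance between two specimens in the reduced PC space.
d = scores[0] - scores[1]
mahal_sq = d @ np.linalg.solve(S_k, d)

print("rank of full covariance:", rank_S, "of", p)
print("non-zero eigenvalues:", n_nonzero)
print("rank of reduced covariance:", rank_S_k, "of", k)
print("squared Mahalanobis distance (PC space):", mahal_sq)
```

With real shape coordinates the rank is lower still, because the Procrustes superimposition itself removes degrees of freedom. How many PCs to retain is a separate question that the sketch does not address.
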

For these reasons, I do not see any disadvantage of measuring a large 
number of landmarks, except for a waste of time perhaps. If life time is an 
issue, one can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp



Re: [MORPHMET] Re: number of landmarks and sample size

2017-05-31 Thread andrea cardini

Dear All,
I'd like to add a few comments on sampling (landmarks but also 
specimens). I hope that some of the other subscribers, who know much 
more than I do about morphometrics, will refine and correct my points.



A very short one on my two papers. They make a very simple point: if one 
is landmarking just one side of a structure with object symmetry simply 
to speed up data collection, then mirror-reconstructing the missing side 
will give a nicer visualization and will probably yield shape data that are 
closer to those obtained by landmarking both sides. The difference may 
be tiny and I said "probably" because I am reporting results of 
empirical studies: out of 11-12 datasets, all but one had shape 
distances closer to those of the full bilateral landmark data after 
mirror-reconstructing the missing side. This did not work in one dataset 
which happened to have a very large amount of fluctuating asymmetry.
To what extent these results are generalizable, I can't say but everyone 
can plan a small preliminary analysis to check it in her/his own data.
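
As a rough illustration of mirror-reconstruction in 2D, the sketch below reflects one-side landmarks across a line fitted through the midline landmarks. It assumes NumPy; the function name and the toy coordinates are hypothetical, and this is a generic geometric reflection, not necessarily the exact procedure used in the two papers:

```python
import numpy as np

def reflect_across_midline(side_pts, midline_pts):
    """Reflect 2D landmarks across the best-fit line through the midline landmarks."""
    side_pts = np.asarray(side_pts, dtype=float)
    midline_pts = np.asarray(midline_pts, dtype=float)
    c = midline_pts.mean(axis=0)
    # The first right singular vector is the direction of the best-fit midline.
    _, _, Vt = np.linalg.svd(midline_pts - c)
    u = Vt[0]
    rel = side_pts - c
    # Reflection: keep the component along the line, flip the perpendicular one.
    along = np.outer(rel @ u, u)
    return c + 2 * along - rel

# Toy configuration: three midline landmarks on the y-axis, two on the left side.
midline = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
left = np.array([[-1.0, 0.5], [-2.0, 1.5]])

right = reflect_across_midline(left, midline)
full_config = np.vstack([midline, left, right])
print(right)
```

For 3D data the same idea applies with a fitted midplane instead of a line. The reconstructed configuration can then be compared with a fully landmarked subsample in a small preliminary analysis.
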


I fully agree with Aki that, if time, money etc. are not a constraint, 
even when one is not interested in asymmetry, it is better to measure 
both sides. That's in fact true also for structures with matching symmetry.



In terms of the choice of landmarks, I wish to stress (once more!) that 
quality may be more important than quantity: first one should think well 
about what she/he wants to measure, which will relate to the specific 
question being asked, and then decide about where and how many landmarks 
to use. There are at least two wonderful papers I suggested several 
times on this issue:

Oxnard & O'Higgins, 2009, Biological Theory 4(1), 84–97.
Klingenberg, 2008, Evol Biol 35:186–190

Then, especially for semilandmarks, I guess that as Aki (and others 
before) suggested, one can see what a good compromise is between 
information and the number of points (maybe considering also, but not 
principally, the visualization).



For sample size, one should consider whether differences are presumably 
big (and a small sample might be OK...ish) or small (as in most 
microevolutionary studies, which generally require large N). I believe 
that Rohlf, already in the early days of geometric morphometrics, had 
written software for exploring statistical power in shape data 
(TPSPower) but I am not sure if he kept developing it. In any case, 
power and sensitivity (to sampling) analyses are certainly available in R.
With small differences, although resampling methods may allow one to perform 
tests even with tiny samples, power will be low and estimates (say, mean 
size and shape, variance and covariance etc.) will likely be inaccurate.
Unfortunately, often, the most interesting taxa are rare populations (or 
fossils) for which specimens are difficult to find.


A couple of people told me that there's an important paper coming out 
soon on sampling error in geometric morphometrics and it might suggest 
that one really needs huge samples. I would not be surprised and suspect 
that the few empirical studies we did (a couple of papers in 
Zoomorphology) were overoptimistic despite already suggesting (more or 
less) that one might need several dozens of specimens even when 
differences are relatively large and the number of landmarks was not 
particularly large. Again, they were empirical studies and one cannot 
say how generalizable they are.
Anyway, I look forward to this new paper and hope it will be announced 
on MORPHMET, and I also look forward to Aki's paper.



Cheers

Andrea


On 29/05/17 18:35, Aki Watanabe wrote:

Dear Lea,

Unfortunately, there isn't (yet) a magic mathematical formula to 
determine whether you've sampled enough landmarks, but there are some 
exploratory approaches you can take to see if your landmark sampling 
is converging to the "true" shape variation. One simple thing you can do 
is sample as many landmarks as you can on a representative sample of 
specimens, then create a PC morphospace. Then, subsample the landmarks 
(e.g., 75%, 50%, 25% of the landmarks) and see if the PC morphospaces 
from these subsampled datasets mirror the distribution of shapes of the 
full dataset. If the morphospaces begin deviating from the PC 
morphospace of the full dataset, then you have a visual cue that the 
subsampling is not adequately characterizing the shape variation of your 
specimens. In terms of a statistically significant test for landmark 
sampling, I suppose one can test for correlation between subsampled and 
full dataset, but because the subsampled and full dataset will be 
auto-correlated to some extent, the null would have to reflect this.


Alternatively, I have a script that automatically subsamples the 
landmarks of a given dataset and creates a plot to see how well the 
subsampled datasets converge to the point distribution of the full 
dataset. If you are interested, I would be happy to describe the 
technique in more detail 

[MORPHMET] Re: number of landmarks and sample size

2017-05-29 Thread Aki Watanabe
Dear Lea,

Unfortunately, there isn't (yet) a magic mathematical formula to determine 
whether you've sampled enough landmarks, but there are some exploratory 
approaches you can take to see if your landmark sampling is converging to 
the "true" shape variation. One simple thing you can do is sample as many 
landmarks as you can on a representative sample of specimens, then create 
a PC morphospace. Then, subsample the landmarks (e.g., 75%, 50%, 25% of the 
landmarks) and see if the PC morphospaces from these subsampled 
datasets mirror the distribution of shapes of the full dataset. If 
the morphospaces begin deviating from the PC morphospace of the full 
dataset, then you have a visual cue that the subsampling is not adequately 
characterizing the shape variation of your specimens. In terms of a 
statistically significant test for landmark sampling, I suppose one can 
test for correlation between subsampled and full dataset, but because the 
subsampled and full dataset will be auto-correlated to some extent, the 
null would have to reflect this.
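
The subsampling idea above could be sketched roughly as follows, assuming NumPy and simulated 2D landmarks. This is not the script mentioned below. The basic Procrustes alignment, the function names, and the use of a plain correlation between pairwise specimen distances as the measure of fit are all illustrative assumptions:

```python
import numpy as np

def gpa(shapes, n_iter=10):
    """Basic generalized Procrustes alignment of an (n, k, 2) landmark array."""
    X = shapes - shapes.mean(axis=1, keepdims=True)          # center each specimen
    X = X / np.linalg.norm(X, axis=(1, 2), keepdims=True)    # unit centroid size
    mean = X[0].copy()
    for _ in range(n_iter):
        for i in range(len(X)):
            U, _, Vt = np.linalg.svd(X[i].T @ mean)
            if np.linalg.det(U @ Vt) < 0:                    # rotations only
                U[:, -1] *= -1
            X[i] = X[i] @ (U @ Vt)
        mean = X.mean(axis=0)
        mean /= np.linalg.norm(mean)
    return X

def pairwise_dist(A):
    """Upper-triangle vector of Euclidean distances between aligned specimens."""
    flat = A.reshape(len(A), -1)
    diff = flat[:, None, :] - flat[None, :, :]
    D = np.sqrt((diff**2).sum(axis=-1))
    return D[np.triu_indices(len(A), k=1)]

def subsample_fit(shapes, fraction, n_rep=50, seed=0):
    """Mean correlation between specimen distances from random landmark
    subsets and from the full landmark set."""
    rng = np.random.default_rng(seed)
    k = shapes.shape[1]
    m = max(3, int(round(fraction * k)))
    d_full = pairwise_dist(gpa(shapes))
    r = []
    for _ in range(n_rep):
        idx = rng.choice(k, size=m, replace=False)
        d_sub = pairwise_dist(gpa(shapes[:, idx, :]))
        r.append(np.corrcoef(d_full, d_sub)[0, 1])
    return float(np.mean(r))

# Simulated data: 30 specimens, 20 landmarks on a noisy circle (hypothetical).
rng = np.random.default_rng(1)
k, n = 20, 30
theta = np.linspace(0, 2 * np.pi, k, endpoint=False)
base = np.column_stack([np.cos(theta), np.sin(theta)])
shapes = base + rng.normal(scale=0.05, size=(n, k, 2))

for f in (0.75, 0.5, 0.25):
    print(f, round(subsample_fit(shapes, f), 3))
```

Because the subsets are drawn from the full set, the two distance vectors are correlated by construction, so the values are best read as a descriptive curve rather than tested against a null of zero correlation.
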

Alternatively, I have a script that automatically subsamples the landmarks 
of a given dataset and creates a plot to see how well the subsampled 
datasets converge to the point distribution of the full dataset. If you are 
interested, I would be happy to describe the technique in more detail 
and/or run the analysis on your dataset if you don't mind sending me the 
data. The script is currently under review for a journal, so it's not 
available yet to the public.

Also, as you mention, having more shape variables (i.e., number of 
landmarks x 2 or 3 depending on 2-D or 3-D landmarks) than the number of 
specimens will generally reduce the power of statistical tests. There are 
ways to counter this issue (e.g., Q-mode approach recently proposed by Dean 
Adams).
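
As a small worked example of the variable count, assuming fixed landmarks only (sliding semilandmarks lose further degrees of freedom) and using the 18 landmarks from the original question; the function name is hypothetical:

```python
def n_shape_variables(n_landmarks, dims):
    """Return (number of coordinates, number of shape-space dimensions)."""
    if dims not in (2, 3):
        raise ValueError("dims must be 2 or 3")
    n_coords = n_landmarks * dims
    # Superimposition removes translation, scale and rotation:
    # 4 degrees of freedom in 2D, 7 in 3D.
    lost = 4 if dims == 2 else 7
    return n_coords, n_coords - lost

coords, shape_dims = n_shape_variables(18, 2)   # 18 two-dimensional landmarks
print(coords, shape_dims)                        # 36 32
```

So 18 two-dimensional landmarks give 36 coordinates, which already exceeds a sample of about 22 individuals per population.
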

Now, concerning the sampling of bilateral landmarks, Andrea Cardini has 
recently written a nice pair of papers on the subject:

Cardini, A. 2016. Left, right or both? Estimating and improving accuracy of 
one-side-only geometric morphometric analyses of cranial variation. J Zool 
Syst Evol Res.

Cardini, A. 2016. Lost in the other half: improving accuracy in geometric 
morphometric analyses of one side of bilaterally symmetric structures. Syst 
Biol.

These papers highlight the artifact that originates from performing 
Procrustes alignment on "one-side-only" datasets. At least for alignment 
purposes, I suggest sampling both sides of bilaterally symmetric structures.

Hope this helps.

All the best,
Aki

On Tuesday, May 9, 2017 at 12:26:04 PM UTC+1, Lea Wolter wrote:
>
> Hello everyone, 
>
> I am new in the field of geometric morphometrics and have a question for 
> my bachelor thesis. 
>   
> I am not sure how many landmarks I should use at most in regard to the 
> sample size. I have a sample of about 22 individuals per population or 
> maybe a bit less (using sternum and epigyne of spiders) with 5 populations. 
> I have read a paper in which they use 18 landmarks with an even lower 
> sample size (3 populations with 20 individuals, 1 with 10). But I have also 
> heard that I should use twice as many individuals per population as 
> landmarks... 
>
> Maybe there is some mathematical formula for it to know if it would be 
> statistically significant? Could you recommend some paper? 
>
> Because of the symmetry of the epigyne I am now thinking of using just one 
> half of it for setting landmarks (so I get 5 instead of 9 landmarks). For 
> the sternum I thought about 7 or 9 landmarks, so at most I would also get 
> 18 landmarks like in the paper. 
>
> I would also like to use two type specimens in the analysis, but I have 
> just this one individual per population... would it be totally nonsense 
> from a statistical point of view? 
>
> Thanks very much for your help! 
>
> Best regards 
> Lea
