Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread David Thulman
Hello,
I'm an archaeologist who works on artifacts in North America. There are not
many of us who use LGM, but even we can't seem to agree on how many LMs
are appropriate. Because I use discriminant function analysis (DFA) as the
workhorse for discriminating groups of artifacts, I worry about the misuse
of that technique. One thing I've read (e.g., Qiao et al. 2009) with regard
to DFA is that too many variables (LMs) can affect its discriminatory power
through data piling or the related phenomenon of overfitting. I have seen
this in my practice but have not tested it rigorously. By reducing the
number of LMs, I can sometimes get better discrimination between groups.
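
The overfitting effect described above can be illustrated with a small
simulation. This is only a minimal sketch, not the procedure of Qiao et al.
(2009): it assumes two groups of 20 specimens with NO true difference, pure
noise variables standing in for landmark coordinates, and Fisher's two-group
linear discriminant written directly in NumPy. All function names and
settings are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Fisher's two-group linear discriminant: weight vector w and cut-off c."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    w = np.linalg.pinv(S) @ (m1 - m0)
    c = w @ (m0 + m1) / 2
    return w, c

def lda_predict(X, w, c):
    return (X @ w > c).astype(int)

def accuracies(X, y):
    """Return (resubstitution accuracy, leave-one-out accuracy)."""
    w, c = lda_fit(X, y)
    resub = np.mean(lda_predict(X, w, c) == y)
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i          # drop specimen i, refit, predict it
        w, c = lda_fit(X[keep], y[keep])
        hits += int(lda_predict(X[i:i + 1], w, c)[0] == y[i])
    return float(resub), hits / len(X)

rng = np.random.default_rng(0)
n_per_group = 20                                # assumed (small) sample size
y = np.repeat([0, 1], n_per_group)
results = {}
for p in (2, 10, 30):                           # number of variables
    X = rng.standard_normal((2 * n_per_group, p))   # pure noise, no group signal
    results[p] = accuracies(X, y)
    print(f"p={p:2d}  resubstitution={results[p][0]:.2f}  "
          f"leave-one-out={results[p][1]:.2f}")
```

As the number of variables approaches the number of specimens, the
resubstitution accuracy climbs even though the groups do not differ, while
the leave-one-out accuracy does not follow it. Comparing the two numbers on
real data is one simple check for this kind of overfitting.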

The number of artifacts (specimens) is not a problem. I'm about to embark on
a regional analysis using thousands.

Does anyone who understands this phenomenon better than I do care to
comment?

Thanks,
Dave Thulman

On Fri, Jun 2, 2017 at 6:12 PM, mitte...@univie.ac.at  wrote:

> I think a few topics get mixed up here.
>
> Of course, a sample can be too small to be representative (as in Andrea's
> example), and one should think carefully about the measures to take. It is
> also clear that an increase in sample size reduces standard errors of
> statistical estimates, including that of a covariance matrix and its
> eigenvalues. But, as mentioned by Dean, the standard errors of the
> eigenvalues are of secondary interest in PCA.
>
> If one has a clear expectation about the signal in the data - and if one
> does not aim at new discoveries - a few specific measurements may suffice,
> perhaps even a few distance measurements. But effective exploratory
> analyses have always been a major strength of geometric morphometrics,
> enabled by the powerful visualization methods together with the large
> number of measured variables.
>
> Andrea, I am actually curious what worries you if one "collects between
> 2700 and 10 400 homologous landmarks from each rib" (whatever the term
> "homologous" is supposed to mean here)?
>
> Compared to many other disciplines in contemporary biology and
> biomedicine, a few thousand variables are not particularly many. Consider,
> for instance, 2D and 3D image analysis, FEA, and all the "omics", with
> millions and billions of variables. In my opinion, the challenge with these
> "big data" is not statistical power in testing a signal, but finding the
> signal - the low-dimensional subspace of interest - in the first place. But
> this applies to 50 or 100 variables as well, not only to thousands or
> millions. If no prior expectation about this signal exists (which the mere
> presence of so many variables usually implies), no hypothesis test should
> be performed at all. Ignoring this rule is one of the main reasons why so
> many GWAS and voxel-based morphometry studies fail to replicate.
>
> Best wishes,
>
> Philipp

-- 
MORPHMET may be accessed via its webpage at http://www.morphometrics.org
--- 
You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to morphmet+unsubscr...@morphometrics.org.


Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread mitte...@univie.ac.at
I think a few topics get mixed up here.

Of course, a sample can be too small to be representative (as in Andrea's 
example), and one should think carefully about the measures to take. It is 
also clear that an increase in sample size reduces standard errors of 
statistical estimates, including that of a covariance matrix and its 
eigenvalues. But, as mentioned by Dean, the standard errors of the 
eigenvalues are of secondary interest in PCA.

If one has a clear expectation about the signal in the data - and if one 
does not aim at new discoveries - a few specific measurements may suffice, 
perhaps even a few distance measurements. But effective exploratory 
analyses have always been a major strength of geometric morphometrics, 
enabled by the powerful visualization methods together with the large 
number of measured variables.

Andrea, I am actually curious what worries you if one "collects between 
2700 and 10 400 homologous landmarks from each rib" (whatever the term 
"homologous" is supposed to mean here)? 

Compared to many other disciplines in contemporary biology and biomedicine, 
a few thousand variables are not particularly many. Consider, for instance, 
2D and 3D image analysis, FEA, and all the "omics", with millions and 
billions of variables. In my opinion, the challenge with these "big data" 
is not statistical power in testing a signal, but finding the signal - the 
low-dimensional subspace of interest - in the first place. But this applies 
to 50 or 100 variables as well, not only to thousands or millions. If no 
prior expectation about this signal exists (which the mere presence of so 
many variables usually implies), no hypothesis test should be performed at 
all. Ignoring this rule is one of the main reasons why so many GWAS and 
voxel-based morphometry studies fail to replicate.

Best wishes,

Philipp



RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Murat Maga
Just to comment.

While it is worthwhile to investigate these issues, in my experience sample 
sizes are limited not because investigators are unwilling to measure more 
specimens, but because there are no additional specimens to include in the 
analysis, especially for studies based on natural populations or historical 
collections.

M


From: William Gelnaw [mailto:wgel...@gmail.com]
Sent: Wednesday, May 31, 2017 3:41 PM
To: mitte...@univie.ac.at
Cc: MORPHMET 
Subject: Re: [MORPHMET] Re: number of landmarks and sample size

I'm currently working on a paper that deals with the problem of 
over-parameterizing PCA in morphometrics.  The recommendations that I'm making 
in the paper are that you should try to have at least 3 times as many samples 
as variables.  That means that if you have 10 2D landmarks, you should have at 
least 60 specimens that you measure.  Based on simulations, if you have fewer 
than 3 specimens per variable, you quickly start getting eigenvalues for a PCA 
that are very different from known true eigenvalues.  I did a literature survey 
and about a quarter of morphometrics studies in the last decade haven't met 
that standard.  A good way to test if you have enough samples is to do a 
jackknife analysis.  If you cut out about 10% of your observations and still 
get the same eigenvalues, then your results are probably stable.
  I hope this helps.
  - Will

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at wrote:
Adding more (semi)landmarks inevitably increases the spatial resolution and 
thus allows one to capture finer anatomical details - whether relevant to the 
biological question or not. This can be advantageous for the reconstruction of 
shapes, especially when producing 3D morphs by warping dense surface 
representations. Basic developmental or evolutionary trends, group structures, 
etc., often are visible in an ordination analysis with a smaller set of 
relevant landmarks; finer anatomical resolution does not necessarily affect 
these patterns. Adding more landmarks cannot reduce or remove any signals that 
were found with fewer landmarks, but it can make ordination analyses and the 
interpretation of distances and angles in shape space more challenging.

An excess of variables (landmarks) over specimens does NOT pose problems to 
statistical methods such as the computation of mean shapes and Procrustes 
distances, PCA, PLS, and the multivariate regression of shape coordinates on 
some independent variable (shape regression). These methods are based on 
averages or regressions computed for each variable separately, or on the 
decomposition of a covariance matrix.
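
A minimal sketch of this point, assuming simulated numbers in place of real
shape coordinates (the sample size, variable count, and variable names are
all hypothetical): with far more variables than specimens, the mean, the
PCA, and the distances between specimens can still be computed, because
nothing has to be inverted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 15, 200                       # 15 specimens, 200 variables (p > n)
X = rng.standard_normal((n, p))      # stand-in for shape coordinates

mean_shape = X.mean(axis=0)          # averages are computed per variable
Xc = X - mean_shape                  # centred data

# PCA via singular value decomposition of the centred data matrix;
# this decomposes the covariance structure without inverting anything.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenvalues = s**2 / (n - 1)         # variances along the PCs
scores = U * s                       # PC scores, one row per specimen

# Only n - 1 eigenvalues can be non-zero when p > n.
n_nonzero = int(np.sum(eigenvalues > 1e-10 * eigenvalues[0]))
print("non-zero eigenvalues:", n_nonzero)

# Distances between specimens are preserved in the full set of PC scores.
d_raw = np.linalg.norm(X[0] - X[1])
d_pcs = np.linalg.norm(scores[0] - scores[1])
print("distance in raw space vs PC space:", d_raw, d_pcs)
```

The number of non-zero eigenvalues is capped by the number of specimens
minus one, which is why the PCA remains well defined however many variables
are measured.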

Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and relative 
eigenanalysis, require the inversion of a full-rank covariance matrix, which 
implies an excess of specimens over variables. The same applies to many 
multivariate parametric test statistics, such as Hotelling's T2, Wilks' Lambda, 
etc. But shape coordinates are NEVER of full rank and thus can never be 
subjected to any of these methods without prior variable reduction. In fact, 
reliable results can only be obtained if there are many times more specimens 
than variables, which usually requires variable reduction by PCA, PLS or other 
techniques, or the regularization of covariance matrices (which is more common 
in the bioinformatics community).
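
The two remedies named above can be sketched as follows. This is an
illustration under stated assumptions, not a recommended pipeline: the data
are simulated, the group difference is invented, the number of retained PCs
(k = 5) and the shrinkage weight (lam = 0.1) are hand-picked, and the helper
names are hypothetical. The shrinkage toward a scaled identity matrix is one
common form of regularization, chosen here only for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_group, p, k = 20, 100, 5            # 40 specimens, 100 variables, keep 5 PCs
y = np.repeat([0, 1], n_per_group)
X = rng.standard_normal((2 * n_per_group, p))
X[y == 1, :3] += 1.5                      # hypothetical group difference

Xc = X - X.mean(axis=0)
# Rank of the centred data, and hence of its covariance matrix: at most n - 1.
rank = np.linalg.matrix_rank(Xc)
print("rank:", rank, "of", p, "variables")

def pooled_cov(Z, y):
    """Pooled within-group covariance matrix for two groups."""
    parts = [Z[y == g] - Z[y == g].mean(axis=0) for g in (0, 1)]
    return sum(P.T @ P for P in parts) / (len(Z) - 2)

def mahalanobis_sq(Z, y, W):
    """Squared Mahalanobis distance between the two group means, given W."""
    d = Z[y == 1].mean(axis=0) - Z[y == 0].mean(axis=0)
    return float(d @ np.linalg.solve(W, d))

# Option 1: variable reduction. Project onto the first k PCs, then invert
# the (now full-rank) k x k pooled covariance matrix.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T
d2_pcs = mahalanobis_sq(Z, y, pooled_cov(Z, y))

# Option 2: regularization. Shrink the singular p x p pooled covariance
# matrix toward a scaled identity so that it becomes invertible.
lam = 0.1                                 # hand-picked shrinkage weight
W = pooled_cov(X, y)
W_reg = (1 - lam) * W + lam * (np.trace(W) / p) * np.eye(p)
d2_reg = mahalanobis_sq(X, y, W_reg)

print("D^2 on first", k, "PCs:", d2_pcs)
print("D^2 with regularized covariance:", d2_reg)
```

Both distances depend on the tuning choice (k or lam), so in practice that
choice would need to be justified, for example by cross-validation.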

For these reasons, I do not see any disadvantage of measuring a large number of 
landmarks, except for a waste of time perhaps. If life time is an issue, one 
can optimize landmark schemes as suggested by Jim or Aki.

Best,

Philipp



RE: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread Adams, Dean [EEOBS]
Will,

I’m not quite sure what over-parameterizing means in the case of PCA, as it is 
simply a rigid-rotation of the dataspace and does not provide parameters for 
statistical inference.

As for the distribution of eigenvalues, this of course is based on the 
underlying covariance matrix for the traits, which in turn will be affected by 
sample size. However, when traits become even mildly correlated (as is 
certainly the case for landmark coordinates), the distribution of eigenvalues 
of the covariance matrix becomes much better behaved. Specifically, the 
eigenvalues associated with low and high PC axes are less extreme than is 
observed with uncorrelated traits. That implies greater stability in their 
estimation, as the covariance matrix is further from singular (see the large 
statistical literature on the condition of a covariance matrix and subsequent 
estimation issues for ill-behaved covariance matrices).
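
One way to probe the behaviour of sample eigenvalues numerically is a small
simulation such as the sketch below. It is only an illustration: the
equicorrelation model, the value rho = 0.5, the dimensions, and the function
name are assumptions, and what the comparison shows will depend on the
covariance structure chosen. It reports how far the largest and smallest
sample eigenvalues sit, on average, from their known true values.

```python
import numpy as np

def mean_sample_eigenvalues(cov, n, n_rep=500, seed=0):
    """Average of the sorted sample-covariance eigenvalues (largest first)
    over n_rep simulated samples of n specimens drawn from N(0, cov)."""
    rng = np.random.default_rng(seed)
    p = cov.shape[0]
    L = np.linalg.cholesky(cov)
    out = np.empty((n_rep, p))
    for r in range(n_rep):
        X = rng.standard_normal((n, p)) @ L.T
        out[r] = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return out.mean(axis=0)

p, n, rho = 10, 30, 0.5                   # assumed dimensions and correlation
models = {
    "uncorrelated": np.eye(p),
    "correlated": (1 - rho) * np.eye(p) + rho * np.ones((p, p)),
}

ratios = {}
for name, cov in models.items():
    true = np.linalg.eigvalsh(cov)[::-1]  # true eigenvalues, largest first
    est = mean_sample_eigenvalues(cov, n)
    ratios[name] = est / true             # 1.0 would mean no bias
    print(f"{name:12s}  first PC: {ratios[name][0]:.2f}  "
          f"last PC: {ratios[name][-1]:.2f}")
```

For uncorrelated traits the largest sample eigenvalue is inflated and the
smallest is deflated relative to the truth; the printed ratios allow the
same comparison for the correlated case.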

Best,

Dean


Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
   Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834


Re: [MORPHMET] EDMA analyses

2017-06-02 Thread xiaofeixian
I found your message in the EDMA analyses thread.
I have recently been working on a problem involving craniofacial differences.
Could you share edmaware with me?
Looking forward to your reply.
On Wednesday, November 19, 2014 at 12:53:52 AM UTC+8, Cole, Theodore M. wrote:
> There's a far more up-to-date edma program called edmaware that I can send 
> you. If that doesn't fix your problem, you could send me a couple of your 
> files and I could try them here.  I'll send a download link later today.
> 
> Tim Cole
> 
>  Original message 
> From: Amber Wheat 
> Date: 11/18/2014 10:20 AM (GMT-06:00) 
> To: morp...@morphometrics.org 
> Subject: [MORPHMET] EDMA analyses 
> 
> I'm trying to use the winEDMA program to run some analyses on juvenile 
> craniofacial morphology. I keep getting access violation codes. Does anyone 
> know how to deal with these? I've run an outlier analysis in MorphoJ and have 
> gone through my data to find outliers, and I'm confident I've fixed them. It 
> will let me run the "estimation" procedure and output a mean form matrix just 
> fine, but when I use the same file to run the "form" analysis, it gives me an 
> access violation code. Any thoughts on what to do? Greatly appreciate the 
> help.



Re: [MORPHMET] EDMA analyses

2017-06-02 Thread xiaofeixian
I found your message in the EDMA analyses thread.
I have recently been working on a problem involving craniofacial differences.
Could you share edmaware with me? Thank you very much.
Looking forward to your reply.



Re: [MORPHMET] Re: number of landmarks and sample size

2017-06-02 Thread William Gelnaw
I'm currently working on a paper that deals with the problem of
over-parameterizing PCA in morphometrics.  My recommendation in the paper
is that you should try to have at least 3 times as many specimens as
variables.  That means that if you have 10 2D landmarks (20 coordinates),
you should have at least 60 specimens.  Based on simulations, if you have
fewer than 3 specimens per variable, you quickly start getting PCA
eigenvalues that are very different from the known true eigenvalues.  I did
a literature survey, and about a quarter of morphometrics studies in the
last decade haven't met that standard.  A good way to test whether you have
enough specimens is to do a jackknife analysis.  If you cut out about 10% of
your observations and still get the same eigenvalues, then your results are
probably stable.
  I hope this helps.
  - Will
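
A rough sketch of both suggestions, under stated assumptions: the function
names, the number of replicates, and the simulated data are hypothetical,
and the resampling shown (repeatedly deleting a random 10% of specimens) is
a delete-d variant in the spirit of the jackknife rather than any specific
published procedure.

```python
import numpy as np

def min_specimens(n_landmarks, n_dims=2, ratio=3):
    """Rule of thumb from the text: `ratio` specimens per variable,
    where each landmark contributes `n_dims` coordinates."""
    return ratio * n_landmarks * n_dims

def pca_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / (len(X) - 1)

def jackknife_eigenvalue_spread(X, drop_frac=0.10, n_rep=200, n_axes=3, seed=0):
    """Delete a random `drop_frac` of specimens `n_rep` times and return the
    coefficient of variation of each of the first `n_axes` eigenvalues."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_keep = n - int(round(drop_frac * n))
    reps = np.array([
        pca_eigenvalues(X[rng.choice(n, size=n_keep, replace=False)])[:n_axes]
        for _ in range(n_rep)
    ])
    return reps.std(axis=0) / reps.mean(axis=0)

print(min_specimens(10))                  # 10 2D landmarks -> 60 specimens

# Hypothetical data: 60 specimens, 20 variables with decreasing variance.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 20)) * np.linspace(3.0, 0.5, 20)
spread = jackknife_eigenvalue_spread(X)
print("coefficient of variation of the first three eigenvalues:", spread)
```

Small coefficients of variation suggest that the leading eigenvalues change
little when 10% of the specimens are removed; what counts as "small" is a
judgment the analyst still has to make.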

On Wed, May 31, 2017 at 1:31 PM, mitte...@univie.ac.at <
mitte...@univie.ac.at> wrote:

> Adding more (semi)landmarks inevitably increases the spatial resolution
> and thus allows one to capture finer anatomical details - whether relevant
> to the biological question or not. This can be advantageous for the
> reconstruction of shapes, especially when producing 3D morphs by warping
> dense surface representations. Basic developmental or evolutionary trends,
> group structures, etc., often are visible in an ordination analysis with a
> smaller set of relevant landmarks; finer anatomical resolution does not
> necessarily affect these patterns. Adding more landmarks cannot reduce or
> remove any signals that were found with fewer landmarks, but it can make
> ordination analyses and the interpretation of distances and angles in
> shape space more challenging.
>
> An excess of variables (landmarks) over specimens does NOT pose problems
> to statistical methods such as the computation of mean shapes and
> Procrustes distances, PCA, PLS, and the multivariate regression of shape
> coordinates on some independent variable (shape regression). These methods
> are based on averages or regressions computed for each variable separately,
> or on the decomposition of a covariance matrix.
>
> Other techniques, including Mahalanobis distance, DFA, CVA, CCA, and
> relative eigenanalysis, require the inversion of a full-rank covariance
> matrix, which implies an excess of specimens over variables. The same
> applies to many multivariate parametric test statistics, such as
> Hotelling's T2, Wilks' Lambda, etc. But shape coordinates are NEVER of full
> rank and thus can never be subjected to any of these methods without prior
> variable reduction. In fact, reliable results can only be obtained if there
> are many times more specimens than variables, which usually requires
> variable reduction by PCA, PLS or other techniques, or the regularization
> of covariance matrices (which is more common in the bioinformatics
> community).
>
> For these reasons, I do not see any disadvantage of measuring a large
> number of landmarks, except for a waste of time perhaps. If life time is an
> issue, one can optimize landmark schemes as suggested by Jim or Aki.
>
> Best,
>
> Philipp