Andrea,
I agree that one must consider both statistical significance and biological
meaningfulness in evaluating patterns. Considering one of these without the
other can often get one into trouble.
Your post concerned the inability to statistically detect differences due to
sample size limitations, and the possibility of concluding homogeneity from
this result when it may not be the case. But as Mike mentioned, the opposite is
also a concern. In fact, one might recall a discussion some months ago on
Morphmet on this very issue, where large samples afforded the ability to
discern allometric differences between groups, but where those statistical
differences may not be biologically important. In both cases, critical thinking
and a merger of statistical result and biological knowledge of the system are
required to arrive at a well-reasoned understanding of the patterns in the data.
Best,
Dean
Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
http://www.public.iastate.edu/~dcadams/
phone: 515-294-3834
From: Mike Collyer [mailto:[email protected]]
Sent: Monday, December 12, 2016 8:34 AM
To: andrea cardini <[email protected]>
Cc: [email protected]
Subject: Re: brief comment on non-significance Re: [MORPHMET] procD.allometry
with group inclusion
Andrea,
My opinion on this is that the researcher who has collected the data must
retain at all times a biological wisdom that supersedes a suggested course of
action based on results from a statistical test. If the purpose of a study is
to assess the allometric pattern of shape variation within populations, then
maybe the results of a homogeneity of slopes test can be an unnecessary burden.
If a researcher wants to compare the mean shapes of different groups but is
concerned that allometric variation might differ among groups, then a
homogeneity of slopes test could be an important first step, but I agree that a
non-significant result should not spur the researcher to immediately conclude a
common allometry or no allometry is appropriate. Sample size, variation in
size among groups, and appropriate distributions of specimen size within groups
might all be things to think about.
The point you make about a potential type II error is a real concern. The
opposite problem is also a real concern. One might have very large sample
sizes and sufficient statistical power to suggest that allometric slopes are
heterogeneous. However, the coefficient of determination and/or effect size
for size:group interaction might be quite small. Just because there is a low
probability of finding as large an effect based on thousands of random
permutations, is one ready to accept that different groups have evolved unique
allometric trajectories? It is easy to forget that the choice of “significance
level” - the a priori acceptable rate of type I error - is arbitrary. Making
strong inferential decisions based on a binary decision for an arbitrary
criterion is probably not wise. I would argue that instead of focusing on a
P-value, one could just as arbitrarily, but perhaps more justifiably, choose a
coefficient of determination of R^2 = 0.10 or an effect size of 2 SD as a
criterion for whether to retain or omit the interaction coefficients that allow
for heterogeneous slopes.
*** Warning: pedantic discussion on model selection starts here. Skip if
unappealing.
One could also turn to model selection approaches. However, I think
multivariate generalization for indices like AIC is an area lacking needed
theoretical research for high-dimensional shape data. There are two reasons
for this. First, the oft-defined AIC is -2 times the model log-likelihood + 2K, where K is
the number of coefficients in a linear model (rank of the model design matrix)
+ 1, where the 1 is the dimension of the value for the variance of the error.
This is a simplification for univariate data. The second half of the equation
is actually 2[pk + 0.5p(p+1)], where p is the number of shape variables and k
is the rank of the design matrix. (One might define p as the rank of the shape
variable matrix - the number of actual dimensions in the tangent space, also
equal to the number of principal components with eigenvalues greater
than 0 from a PCA - if using high-dimensional data or small samples.) Notice
that substituting 1 for p in this equation gets one back to the 2K, as defined
first. The pk part of the equation represents the dimensions of linear model
coefficients; the 0.5p(p+1) part represents the dimensions of the error
covariance matrix. The reason this is important is that one might have picked
up along the way that a delta AIC of 1-2 means two models are comparable (as if
with equal likelihoods, they differ by around 1 parameter or less). This rule
of thumb would have to be augmented with highly multivariate data to 1*p to
2*p, which makes it hard to have a good general sense of when models are
comparable, unless one takes into consideration how many shape variables are in
use.
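As a small illustration of the penalty term above, here is a minimal Python sketch. The function name aic_penalty is hypothetical (it is not part of geomorph or any other package); it only evaluates 2[pk + 0.5p(p+1)] and shows that adding one row of coefficients raises the penalty by 2p.

```python
def aic_penalty(p: int, k: int) -> int:
    """Parameter penalty 2[pk + 0.5p(p+1)].

    p: number of shape variables (or the rank of the shape data).
    k: rank of the model design matrix.
    Hypothetical helper for illustration only.
    """
    # 2 * p * k counts the coefficients; p * (p + 1) is 2 * 0.5p(p+1),
    # the unique elements of the error covariance matrix.
    return 2 * p * k + p * (p + 1)


# With p = 1 this is 2(k + 1), i.e. the familiar univariate 2K.
print("p = 1, k = 3:", aic_penalty(1, 3))

# Going from k to k + 1 adds p coefficients, so the penalty grows by 2p.
for p in (1, 10, 50):
    print("p =", p, "penalty change:", aic_penalty(p, 4) - aic_penalty(p, 3))
```

So a difference in AIC that would look decisive with one response variable can correspond to a single extra model term when p is large.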
Second, the log-likelihood involves calculating the determinant of the error
covariance matrix, which is problematic for singular matrices, as might be
found with high-dimensional shape data. Recently, colleagues and I have used
plots of the log of the trace of error covariance matrices versus the log of
parameter penalties - the 2[pk + 0.5p(p+1)] part - as a way of scanning
candidate models for the one or two that have lower error relative to the
number of parameters in the model. Such an approach allows one to have no
allometric slope, a common allometric slope, and unique allometric slopes, in
combination with other important factors, and consider many models at once.
But again, there is a certain level of arbitrariness to this.
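To make the idea concrete, here is a rough Python/NumPy sketch of the two quantities one would plot for each candidate model. Everything in it is an assumption for illustration: the data are simulated, the helper name log_trace_and_penalty is hypothetical, and the error covariance matrix is taken as the residual SSCP matrix divided by n (another divisor could be used).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 6
size = rng.normal(size=n)
group = np.repeat([0.0, 1.0], n // 2)  # dummy variable for two groups

# Simulated "shape" data: a common slope on size plus a group shift.
Y = (np.outer(size, rng.normal(size=p))
     + np.outer(group, rng.normal(size=p))
     + rng.normal(scale=0.5, size=(n, p)))

one = np.ones(n)
designs = {
    "group": np.column_stack([one, group]),                      # no slope
    "size + group": np.column_stack([one, size, group]),         # common slope
    "size * group": np.column_stack([one, size, group, size * group]),  # unique slopes
}


def log_trace_and_penalty(X, Y):
    """Return (log trace of error covariance, log parameter penalty)."""
    n_obs, n_var = Y.shape
    k = np.linalg.matrix_rank(X)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    trace_cov = np.sum(E ** 2) / n_obs          # trace of (E'E / n)
    penalty = 2 * (n_var * k + 0.5 * n_var * (n_var + 1))
    return float(np.log(trace_cov)), float(np.log(penalty))


results = {name: log_trace_and_penalty(X, Y) for name, X in designs.items()}
for name, (log_trace, log_pen) in results.items():
    print(f"{name:14s} log(trace) = {log_trace:.3f}  log(penalty) = {log_pen:.3f}")
```

Plotting the first value against the second for many candidate models gives the kind of scan described above; the trace is used because it does not require a non-singular covariance matrix.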
*** End pedantic discussion
There are other issues that can be quite real with real data. For example, if
one wishes to consider if there are shape differences among groups but first
wishes to address if there is meaningful allometric shape variation, and
whether there might be different allometries among groups, a homogeneity of
slopes test might be done. But what if it is revealed that one group has all
small specimens and one group has all large specimens? The researcher knows
better than anyone else whether this is sampling error or a biological
phenomenon. How to proceed should not rest solely on an outcome from a
statistical test. For example, if the specimens are adult organisms and
represent large individuals within populations, one might want to discuss shape
differences without adjusting for allometry, as well as discuss size
differences. A discussion of allometries in this case might obscure what is
really most important, that maybe two populations evolved size and shape
differences because of some ecologically meaningful reason, for example.
So I agree with you, and more. “No significance” or “significance” is only
part of the evaluation. Effect sizes and assessment of sampling errors,
biases, or limitations should also be considered. And no matter what, careful
communication that reveals the researcher’s logic needs to be made in published
articles.
Just my opinion,
Mike
On Dec 12, 2016, at 2:40 AM, andrea cardini
<[email protected]> wrote:
Dear All,
if I can, I'd add a brief comment on the interpretation of non-significant
results. I'd appreciate this to be checked by those with a proper understanding
and background on stats (which I haven't!).
I use Mike's sentence on non-significant slopes as an example but the issue is
a general one, although I find it particularly tricky in the context of
comparing trajectories (allometries or other) across groups. Mike wisely said
"approximately" ("If not significant, then the slope vectors are APPROXIMATELY
parallel"). With permutations, one might be able to perform tests even when
sample sizes are small (and maybe, which is even more problematic,
heterogeneous across groups): then, non-significance could simply mean that
samples are not large enough to make strong statements (rejection of the null
hypothesis) with confidence (i.e., statistical power is low). Especially with short
trajectories (allometries or other), it might happen to find n.s. slopes with
very large angles between the vectors, a case where it is probably hard to
conclude that allometries really are parallel.
Small samples are the curse of many studies in taxonomy and evolution.
We've done a couple of exploratory (not very rigorous!) empirical analyses of
the effect of reducing sample sizes on means, variances, vector angles etc. in
geometric morphometrics (Cardini & Elton, 2007, Zoomorphol.; Cardini et al.,
2015, Zoomorphol.) and some, probably most, of these estimates literally blow
up when N goes down. That happened even when differences were relatively large
(species separated by several millions of years of independent evolution or
samples including domestic breeds hugely different from their wild counterparts).
Unless one has done power analyses and/or has very large samples, I'd be
careful with the interpretations. There's plenty on this in the difficult (for
me) statistical literature. Surely one can do sophisticated power analyses in R
and, although probably and unfortunately not used by many, one of the programs
of the TPS series (TPSPower) was written by Jim exactly for this aim (possibly
not for power analyses in the case of MANCOVAs/vector angles but certainly in
the simpler case of comparisons of means).
Cheers
Andrea
On 11/12/16 19:17, Mike Collyer wrote:
Dear Tsung,
The geomorph function, advanced.procD.lm, allows one to extract group slopes
and model coefficients. In fact, procD.allometry is a specialized function
that uses advanced.procD.lm to perform the HOS test and then uses procD.lm to
produce an ANOVA table, depending on the results of the HOS test. It also uses
the coefficients and fitted values from procD.lm to generate the various types
of regression scores. In essence, procD.allometry is a function that carries
out several analyses with geomorph base functions, procD.lm and
advanced.procD.lm, in a specified way. By comparison, the output is more
limited, but one can use the base functions to get much more output.
In advanced.procD.lm, if one specifies groups and a slope, one of the outputs
is a matrix of slope vectors. Also, one can perform pairwise tests to compare
either the correlation or angle between slope vectors.
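For readers who want to see what the correlation and angle between two slope vectors amount to, here is a minimal Python/NumPy sketch. It is not geomorph code; the function name vector_correlation_and_angle and the example vectors are made up for illustration.

```python
import numpy as np


def vector_correlation_and_angle(b1, b2):
    """Vector correlation (cosine) and angle in degrees between two slopes."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    r = float(b1 @ b2 / (np.linalg.norm(b1) * np.linalg.norm(b2)))
    r = min(1.0, max(-1.0, r))  # guard against rounding just outside [-1, 1]
    return r, float(np.degrees(np.arccos(r)))


# Two hypothetical group slope vectors (one coefficient per shape variable).
slope_a = [0.5, -0.3, 0.2, 0.1]
slope_b = [0.4, -0.1, 0.3, 0.0]
r, angle = vector_correlation_and_angle(slope_a, slope_b)
print(f"vector correlation = {r:.3f}, angle = {angle:.1f} degrees")
```

A correlation near 1 (angle near 0) means the vectors point in nearly the same direction in shape space; whether an observed angle is larger than expected by chance is what the pairwise permutation tests address.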
Regarding the operation of the HOS test, it is a permutational test that does
the following: calculate the sum of squared residuals for a “full” model, shape
~ size + group + size:group and the same for a “reduced” model, shape ~ size +
group. (The sum of squared residuals is the trace of the error SSCP matrix,
which is the same as the sum of the summed squared residuals for every shape
variable.) The difference between these two values is the sum of squares for
the size:group effect. If it is significantly large (i.e., it is found with low
probability in many random permutations), one can conclude that the
coefficients for this effect are collectively large enough to justify
retaining the effect, as the slope vectors are (at least in part) not
parallel. If not significant, then the slope vectors are approximately
parallel, and the effect can be removed from the model. A randomized residual
permutation procedure is used, which randomizes the residual vectors of the
reduced model in each random permutation to obtain random pseudo-values,
repeating the sum of squares calculations each time.
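The steps just described can be sketched in a few lines of Python/NumPy. This is only an illustration under stated assumptions, not the geomorph implementation: the function names rss and hos_test are hypothetical, groups are coded with dummy variables, the observed value counts as one permutation, and the data are simulated with deliberately different slopes.

```python
import numpy as np


def rss(X, Y):
    """Sum of squared residuals over all shape variables (trace of error SSCP)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    return float(np.sum(E ** 2))


def hos_test(Y, size, group, n_perm=999, seed=0):
    """Permutation test of size:group by randomizing reduced-model residuals."""
    rng = np.random.default_rng(seed)
    size = np.asarray(size, dtype=float)
    group = np.asarray(group)
    n = Y.shape[0]
    levels = np.unique(group)
    dummies = np.column_stack([(group == g).astype(float) for g in levels[1:]])
    X_red = np.column_stack([np.ones(n), size, dummies])        # size + group
    X_full = np.column_stack([X_red, dummies * size[:, None]])  # + size:group

    B_red, *_ = np.linalg.lstsq(X_red, Y, rcond=None)
    fitted = X_red @ B_red
    resid = Y - fitted

    ss_obs = rss(X_red, Y) - rss(X_full, Y)   # SS for the size:group effect
    ss_total = float(np.sum((Y - Y.mean(axis=0)) ** 2))

    count = 1  # the observed case is treated as one random outcome
    for _ in range(n_perm):
        Y_star = fitted + resid[rng.permutation(n)]  # shuffle residual vectors
        ss_rand = rss(X_red, Y_star) - rss(X_full, Y_star)
        if ss_rand >= ss_obs:
            count += 1
    return ss_obs, ss_obs / ss_total, count / (n_perm + 1)


# Simulated example: two groups with clearly different slopes.
rng = np.random.default_rng(42)
n = 40
size = rng.normal(size=n)
group = np.repeat(["a", "b"], n // 2)
is_b = (group == "b").astype(float)
Y = (np.outer(size, [1.0, 0.5, -0.5])
     + np.outer(size * is_b, [2.0, -2.0, 1.0])
     + rng.normal(scale=0.1, size=(n, 3)))

ss, r2, p_value = hos_test(Y, size, group, n_perm=199)
print(f"SS(size:group) = {ss:.3f}, R^2 = {r2:.3f}, P = {p_value:.3f}")
```

The function also returns the share of the total sum of squares due to the interaction, so the P-value can be read alongside an effect measure rather than alone.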
Regarding your final question, yes, you are correct. In a case like this, one
might conclude that logCS is not a significant source of shape variation, and
proceed with other analyses that do not include it as a covariate. In either
case - whether it is retained as a covariate or excluded - advanced.procD.lm
will allow one to perform pairwise comparison tests among groups.
Cheers!
Mike
On Dec 11, 2016, at 10:56 AM, Tsung Fei Khang
<[email protected]> wrote:
Dear Mike,
Many thanks for the reply!
When the procD.allometry function performs HOS test with multiple group labels
given, does it compute the regression vectors for each group, and then test
whether the coefficients of these vectors are equal, using some multivariate
statistical test? If so, is there an option that outputs the regression
vectors? Given the high frequency of the latter being discussed in the primary
GM literature, it seems important to be able to extract this result from the
function.
Finally, on the interpretation side - If group variation is significant, but
not logCS, then under the model shape~size+group, does this imply that shape
variation is mainly explained by variation in species, and allometry is absent?
Regards,
T.F.
On Thursday, December 8, 2016 at 6:08:17 PM UTC+8, Mike Collyer wrote:
Dear Tsung,
The procD.allometry function performs two basic processes when groups are
provided. First, it does a homogeneity of slopes (HOS) test. This test
ascertains whether two or more groups have parallel or unique slopes (the
latter meaning at least one group’s slope differs from the others). The
HOS test constructs two linear models: shape ~ size + group and shape ~ size +
group + size:group, and performs an analysis of variance to determine if the
size:group interaction significantly reduces the residual error produced.
(Note: log(size) is a possible and default choice in this analysis.)
After this test, procD.allometry then provides an analysis of variance on each
term in the resulting model from the HOS test.
Regarding your question, if the HOS test reveals there is significant
heterogeneity in slopes, the coefficients returned allow one to find the unique
linear equations, by group, which would be found from separate runs of
procD.allometry, one group at a time. If the HOS test reveals that there is
not significant heterogeneity in slopes, the coefficients constrain the slopes
for different groups to be the same (parallel).
Finally, and I think more to your point, the projected regression scores are
found by using, for a (in the Xa calculation you note), the coefficients that
represent a common or individual slope from the linear model produced. The
matrix of coefficients, B, is arranged as first row = intercept, second row =
common slope, next rows (if applicable) are coefficients for the group factor
(which essentially change the intercept by group), and finally, the last rows are
the coefficients for the size:group interaction (if applicable), which change
the common slope to match each group’s unique slope. Irrespective of the
complexity of this B matrix, a is found as the second row. If you run
procD.allometry group by group, it is the same as (1) asserting that group
slopes are unique and (2) changing a to match not the common slope, but the
summation of the common slope and the group-specific slope adjustment. One
could do that, but would lose the ability to compare the groups in the same
plot, as each group would be projected on a different axis.
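Here is a minimal Python/NumPy sketch of the B matrix layout and the projection s = Xa described above. It is not geomorph code. It assumes dummy (treatment) coding for two groups, in which case the second row of B is the slope of the reference group and the last row is the adjustment for the other group; the data and variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4
size = rng.normal(size=n)
g = np.repeat([0.0, 1.0], n // 2)  # dummy variable: 1 for the second species

# Simulated shape data with a different slope vector in each species.
slope1 = np.array([0.5, -0.3, 0.2, 0.1])
slope2 = np.array([0.1, 0.4, -0.2, 0.3])
Y = (np.outer(size * (1 - g), slope1)
     + np.outer(size * g, slope2)
     + rng.normal(scale=0.05, size=(n, p)))

# Rows of B: intercept, slope, group coefficient, size:group coefficient.
X = np.column_stack([np.ones(n), size, g, size * g])
B, *_ = np.linalg.lstsq(X, Y, rcond=None)

# a is the second row of B scaled to unit length; scores are s = Ya.
a = B[1] / np.linalg.norm(B[1])
scores = Y @ a

# Slope for the second species = slope row + its size:group adjustment.
slope_g2 = B[1] + B[3]

# Separate fit using only the second species, for comparison.
X2 = np.column_stack([np.ones(n // 2), size[g == 1]])
B2, *_ = np.linalg.lstsq(X2, Y[g == 1], rcond=None)

print("slope from joint model:  ", np.round(slope_g2, 3))
print("slope from separate fit: ", np.round(B2[1], 3))
```

The two printed slopes agree, which is the sense in which a model with the size:group term reproduces separate runs, while a single a keeps all specimens on one projection axis.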
Hope that helps.
Mike
On Dec 8, 2016, at 3:37 AM, Tsung Fei Khang
<[email protected]> wrote:
Hi all,
I would like to use procD.allometry to study allometry in two species.
I understand that the function returns the regression score for each specimen
as Reg.proj, and that the calculation is obtained as:
s = Xa, where X is the n x p matrix of Procrustes shape variables, and a is the
p x 1 vector of regression coefficients normalized to unit length. I am able to verify this
computation from first principles when all samples are presumed to come from
the same species.
However, what happens when we are interested in more than 1 species (say 2)? I
could run procD.allometry by including the species labels via f2=~gps, where
gps gives the species labels. Is there just 1 regression vector (which feels
weird, since this should be species-specific), or 2? If so, how can I recover
both vectors? What is the difference of including f2=~gps using all data,
compared to if we make two separate runs of procD.allometry, one for samples
from species 1, and another for samples from species 2?
Thanks for any help.
Rgds,
TF
" DISCLAIMER: This e-mail and any files transmitted with it ("Message") is
intended only for the use of the recipient(s) named above and may contain
confidential information. You are hereby notified that the taking of any action
in reliance upon, or any review, retransmission, dissemination, distribution,
printing or copying of this Message or any part thereof by anyone other than
the intended recipient(s) is strictly prohibited. If you have received this
Message in error, you should delete this Message immediately and advise the
sender by return e-mail. Opinions, conclusions and other information in this
Message that do not relate to the official business of University of Malaya
shall be understood as neither given nor endorsed by any of the forementioned. "
--
MORPHMET may be accessed via its webpage at
http://www.morphometrics.org
---
You received this message because you are subscribed to the Google Groups
"MORPHMET" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
--
Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena
e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472
Adjunct Associate Professor, School of Anatomy, Physiology and Human Biology,
The University of Western Australia, 35 Stirling Highway, Crawley WA 6009,
Australia
E-mail address: [email protected],
[email protected]
WEBPAGE: https://sites.google.com/site/alcardini/home/main
FREE Yellow BOOK on Geometric Morphometrics:
http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf
ESTIMATE YOUR GLOBAL FOOTPRINT:
http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/