Andrea,
I agree that one must consider both statistical significance and
biological meaningfulness in evaluating patterns. Considering one of
these without the other can often get one into trouble.
Your post concerned the inability to statistically detect differences
due to sample size limitations, and the possibility of concluding
homogeneity from this result when it may not be the case. But as Mike
mentioned, the opposite is also a concern. In fact, one might recall a
discussion some months ago on Morphmet on this very issue, where large
samples afforded the ability to discern allometric differences between
groups, but where those statistical differences may not be
biologically important. In both cases, critical thinking and a merger
of statistical result and biological knowledge of the system are
required to arrive at a well-reasoned understanding of the patterns in
the data.
Best,
Dean
Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
www.public.iastate.edu/~dcadams/
phone: 515-294-3834
*From:* Mike Collyer [mailto:[email protected]]
*Sent:* Monday, December 12, 2016 8:34 AM
*To:* andrea cardini <[email protected]>
*Cc:* [email protected]
*Subject:* Re: brief comment on non-significance Re: [MORPHMET]
procD.allometry with group inclusion
Andrea,
My opinion on this is that the researcher who has collected the data
must retain at all times a biological wisdom that supersedes a
suggested course of action based on results from a statistical test.
If the purpose of a study is to assess the allometric pattern of
shape variation within populations, then maybe the results of a
homogeneity of slopes test can be an unnecessary burden. If a
researcher wants to compare the mean shapes of different groups but is
concerned that allometric variation might differ among groups, then a
homogeneity of slopes test could be an important first step, but I
agree that a non-significant result should not spur the researcher to
immediately conclude a common allometry or no allometry is
appropriate. Sample size, variation in size among groups, and
appropriate distributions of specimen size within groups might all be
things to think about.
The point you make about a potential type II error is a real concern.
The opposite problem is also a real concern. One might have very
large sample sizes and sufficient statistical power to suggest that
allometric slopes are heterogeneous. However, the coefficient of
determination and/or effect size for size:group interaction might be
quite small. Just because there is a low probability of finding as
large an effect based on thousands of random permutations, is one
ready to accept that different groups have evolved unique allometric
trajectories? It is easy to forget that the choice of “significance
level” - the a priori acceptable rate of type I error - is arbitrary.
Making strong inferential decisions based on a binary decision for an
arbitrary criterion is probably not wise. I would argue that instead
of focusing on a P-value, one could just as arbitrarily, but perhaps
more justifiably, choose a coefficient of determination of R^2 = 0.10
or an effect size of 2 SD as a criterion for whether to retain or omit
the interaction coefficients that allow for heterogeneous slopes.
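To make these two alternative criteria concrete, a minimal Python
sketch follows. It is not geomorph code: the function names and the
numbers are hypothetical, and the effect size is assumed to be the
observed sum of squares standardized by the mean and standard
deviation of the permutation distribution.

```python
from statistics import mean, stdev

def term_r2(ss_term, ss_total):
    """Proportion of total shape variation attributed to one model term."""
    return ss_term / ss_total

def effect_size_sd(ss_obs, ss_perm):
    """Observed SS in standard deviations of the permutation distribution."""
    return (ss_obs - mean(ss_perm)) / stdev(ss_perm)

# Hypothetical values for the size:group term
ss_total = 1.00
ss_interaction = 0.02
ss_perm = [0.010, 0.012, 0.011, 0.009, 0.013]  # SS from random permutations

r2 = term_r2(ss_interaction, ss_total)
z = effect_size_sd(ss_interaction, ss_perm)

# Each criterion is applied on its own; both thresholds are arbitrary choices
keep_by_r2 = r2 >= 0.10
keep_by_z = z >= 2.0
print(r2, z, keep_by_r2, keep_by_z)
```

With these made-up numbers the interaction explains 2% of the shape
variation yet sits more than 2 SD above the permutation mean, so the
two criteria need not agree.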
*** Warning: pedantic discussion on model selection starts here. Skip
if unappealing.
One could also turn to model selection approaches. However, I think
multivariate generalization for indices like AIC is an area lacking
needed theoretical research for high-dimensional shape data. There
are two reasons for this. First, the oft-defined AIC is -2 times the
model log-likelihood + 2K, where K is the number of coefficients in a
linear model (rank of the model design matrix) + 1, where the 1 is the
dimension of the value for the variance of the error. This is a
simplification for univariate data. The second half of the equation
is actually 2[pk + 0.5p(p+1)], where p is the number of shape
variables and k is the rank of the design matrix. (One might define p
as the rank of the shape variable matrix - the number of actual
dimensions in the tangent space, also equal to the number of principal
components with eigenvalues greater than 0 from a PCA - if
using high-dimensional data or small samples.) Notice that
substituting 1 for p in this equation gets one back to the 2K, as
defined first. The pk part of the equation represents the dimensions
of linear model coefficients; the 0.5p(p+1) part represents the
dimensions of the error covariance matrix. The reason this is
important is that one might have picked up along the way that a delta
AIC of 1-2 means two models are comparable (as if with equal
likelihoods, they differ by around 1 parameter or less). This rule of
thumb would have to be augmented with highly multivariate data to 1*p
to 2*p, which makes it hard to have a good general sense of when
models are comparable, unless one takes into consideration how many
shape variables are in use.
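The penalty term described above is easy to tabulate. A minimal Python
sketch (the function name aic_penalty is hypothetical, not part of any
package) shows that p = 1 recovers 2K and that one extra row of
coefficients costs 2p rather than 2:

```python
def aic_penalty(p, k):
    """Multivariate parameter penalty 2[pk + 0.5p(p+1)].

    p: number of shape variables (or rank of the shape data)
    k: rank of the design matrix
    """
    return 2 * (p * k + 0.5 * p * (p + 1))

# Univariate check: p = 1 gives 2K with K = k + 1
assert aic_penalty(1, 3) == 2 * (3 + 1)

# Increasing k by 1 adds 2p to the penalty, so the familiar
# "delta AIC of about 2" threshold scales with p
for p in (1, 10, 50):
    print(p, aic_penalty(p, 4) - aic_penalty(p, 3))
```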
Second, the log-likelihood involves calculating the determinant of the
error covariance matrix, which is problematic for singular matrices,
as might be found with high-dimensional shape data. Recently,
colleagues and I have used plots of the log of the trace of error
covariance matrices versus the log of parameter penalties - the 2[pk +
0.5p(p+1)] part - as a way of scanning candidate models for the one or
two that have lower error relative to the number of parameters in the
model. Such an approach allows one to have no allometric slope, a
common allometric slope, and unique allometric slopes, in combination
with other important factors, and consider many models at once. But
again, there is a certain level of arbitrariness to this.
*** End pedantic discussion
There are other issues that can be quite real with real data. For
example, if one wishes to consider if there are shape differences
among groups but first wishes to address if there is meaningful
allometric shape variation, and whether there might be different
allometries among groups, a homogeneity of slopes test might be done.
But what if it is revealed that one group has all small specimens and
one group has all large specimens? The researcher knows better than
anyone else whether this is sampling error or a biological phenomenon.
How to proceed should not rest solely on an outcome from a
statistical test. For example, if the specimens are adult organisms
and represent large individuals within populations, one might want to
discuss shape differences without adjusting for allometry, as well as
discuss size differences. A discussion of allometries in this case
might obscure what is really most important, that maybe two
populations evolved size and shape differences because of some
ecologically meaningful reason, for example.
So I agree with you, and more. “No significance” or “significance” is
only part of the evaluation. Effect sizes and assessment of sampling
errors, biases, or limitations should also be considered. And no
matter what, careful communication that reveals the researcher’s logic
needs to be made in published articles.
Just my opinion,
Mike
On Dec 12, 2016, at 2:40 AM, andrea cardini <[email protected]> wrote:
Dear All,
if I can, I'd add a brief comment on the interpretation of
non-significant results. I'd appreciate this to be checked by
those with a proper understanding and background on stats (which I
haven't!).
I use Mike's sentence on non-significant slopes as an example but
the issue is a general one, although I find it particularly tricky
in the context of comparing trajectories (allometries or other)
across groups. Mike wisely said "approximately" ("If not
significant, then the slope vectors are APPROXIMATELY parallel").
With permutations, one might be able to perform tests even when
sample sizes are small (and maybe, which is even more problematic,
heterogeneous across groups): then, non-significance could simply
mean that samples are not large enough to make strong statements
(rejection of the null hypothesis) with confidence (i.e., statistical
power is low). Especially with short trajectories (allometries or
other), one might find non-significant differences in slopes despite
very large angles between the vectors, a case where it is probably
hard to conclude that allometries really are parallel.
Small samples are a curse of many studies in taxonomy and
evolution. We've done a couple of exploratory (not very rigorous!)
empirical analyses of the effect of reducing sample sizes on
means, variances, vector angles etc. in geometric morphometrics
(Cardini & Elton, 2007, Zoomorphol.; Cardini et al., 2015,
Zoomorphol.) and some, probably most, of these estimates literally
blow up when N goes down. That happened even when differences were
relatively large (species separated by several millions of years
of independent evolution or samples including domestic breeds
hugely different from their wild counterparts).
Unless one has done power analyses and/or has very large samples,
I'd be careful with the interpretations. There's plenty on this in
the difficult (for me) statistical literature. Surely one can do
sophisticated power analyses in R and, although probably and
unfortunately not used by many, one of the programs of the TPS
series (TPSPower) was written by Jim exactly for this aim
(possibly not for power analyses in the case of MANCOVAs/vector
angles but certainly in the simpler case of comparisons of means).
Cheers
Andrea
On 11/12/16 19:17, Mike Collyer wrote:
Dear Tsung,
The geomorph function, advanced.procD.lm, allows one to
extract group slopes and model coefficients. In fact,
procD.allometry is a specialized function that uses
advanced.procD.lm to perform the HOS test and then uses
procD.lm to produce an ANOVA table, depending on the results
of the HOS test. It also uses the coefficients and fitted
values from procD.lm to generate the various types of
regression scores. In essence, procD.allometry is a function
that carries out several analyses with geomorph base
functions, procD.lm and advanced.procD.lm, in a specified way.
By comparison, the output is more limited, but one can use
the base functions to get much more output.
In advanced.procD.lm, if one specifies groups and a slope, one
of the outputs is a matrix of slope vectors. Also, one can
perform pairwise tests to compare either the correlation or
angle between slope vectors.
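For reference, the angle between two slope vectors is the arccosine of
their vector correlation. A minimal Python sketch with hypothetical
slope vectors follows; it computes only the statistic, not the
pairwise test described above, and it is not geomorph code.

```python
import math

def vector_angle_deg(a, b):
    """Angle in degrees between two slope vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The vector correlation is the cosine of the angle;
    # clamp it to [-1, 1] to guard against rounding error
    r = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(r))

slope_g1 = [0.20, -0.10, 0.05, 0.00]  # hypothetical slope vector, group 1
slope_g2 = [0.18, -0.12, 0.02, 0.03]  # hypothetical slope vector, group 2
print(round(vector_angle_deg(slope_g1, slope_g2), 1))
```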
Regarding the operation of the HOS test, it is a permutational
test that does the following: calculate the sum of squared
residuals for a “full” model, shape ~ size + group +
size:group and the same for a “reduced” model, shape ~ size +
group. (The sum of squared residuals is the trace of the
error SSCP matrix, which is the same as the sum of the
squared residuals summed over every shape variable.) The difference
between these two values is the sum of squares for the
size:group effect. If significantly large (i.e., is found
with low probability in many random permutations), one can
conclude that the coefficients for this effect are
collectively large enough to justify retaining this effect,
as the slope vectors are (at least in part) not
parallel. If not significant, then the slope vectors are
approximately parallel, and the effect can be removed from the
model. A randomized residual permutation procedure is used,
which randomizes the residual vectors of the reduced model in
each random permutation to obtain random pseudo-values,
repeating the sum of squares calculations each time.
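The procedure above can be sketched in plain Python for the special
case of one size covariate and one grouping factor. This is an
illustration under stated assumptions, not the geomorph
implementation: the helper names and data are hypothetical, the two
models are fit by group-centering (equivalent to least squares for
these two models), every group must contain variation in size, and
the observed value is counted as one of the permutations.

```python
import random

def center_by_group(values, groups):
    """Subtract each group's mean from the values of its members."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def slope_fit(xc, yc, groups, common):
    """Fitted group-centered values: one common slope or one slope per group."""
    keys = ["all"] * len(groups) if common else list(groups)
    sxy, sxx = {}, {}
    for x, y, k in zip(xc, yc, keys):
        sxy[k] = sxy.get(k, 0.0) + x * y
        sxx[k] = sxx.get(k, 0.0) + x * x
    return [sxy[k] / sxx[k] * x for x, k in zip(xc, keys)]

def rss(size, Y, groups, common):
    """Residual SS summed over all shape variables (trace of the error SSCP)."""
    xc = center_by_group(size, groups)
    total = 0.0
    for j in range(len(Y[0])):
        yc = center_by_group([row[j] for row in Y], groups)
        fit = slope_fit(xc, yc, groups, common)
        total += sum((y - f) ** 2 for y, f in zip(yc, fit))
    return total

def interaction_ss(size, Y, groups):
    """SS for size:group = RSS(reduced model) - RSS(full model)."""
    return rss(size, Y, groups, common=True) - rss(size, Y, groups, common=False)

def hos_test(size, Y, groups, n_perm=999, seed=1):
    """Permutation test of slope homogeneity using reduced-model residuals."""
    rng = random.Random(seed)
    n, p = len(Y), len(Y[0])
    xc = center_by_group(size, groups)
    fitted = [[0.0] * p for _ in range(n)]
    resid = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in Y]
        yc = center_by_group(col, groups)
        fit_c = slope_fit(xc, yc, groups, common=True)
        for i in range(n):
            resid[i][j] = yc[i] - fit_c[i]
            fitted[i][j] = col[i] - resid[i][j]
    ss_obs = interaction_ss(size, Y, groups)
    count = 1  # the observed value counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # whole residual vectors (rows) are shuffled together
        Y_star = [[fitted[i][j] + resid[order[i]][j] for j in range(p)]
                  for i in range(n)]
        if interaction_ss(size, Y_star, groups) >= ss_obs:
            count += 1
    return ss_obs, count / (n_perm + 1)

# Hypothetical data: two groups, two shape variables, opposite slopes in variable 1
size = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
groups = ["A"] * 5 + ["B"] * 5
noise = [0.05, -0.03, 0.02, -0.04, 0.01, -0.02, 0.04, -0.05, 0.03, 0.01]
Y = [[(s if g == "A" else -s) + e, 0.5 * s - e]
     for s, g, e in zip(size, groups, noise)]
ss_obs, p_value = hos_test(size, Y, groups, n_perm=199)
print(ss_obs, p_value)
```

Shuffling row indices rather than individual values keeps each
specimen's residuals together across shape variables, which is what
"randomizes the residual vectors" is taken to mean here.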
Regarding your final question, yes, you are correct. In a
case like this, one might conclude that logCS is not a
significant source of shape variation, and proceed with other
analyses that do not include it as a covariate. In either
case - whether it is retained as a covariate or excluded -
advanced.procD.lm will allow one to perform pairwise
comparison tests among groups.
Cheers!
Mike
On Dec 11, 2016, at 10:56 AM, Tsung Fei Khang
<[email protected]> wrote:
Dear Mike,
Many thanks for the reply!
When the procD.allometry function performs HOS test with
multiple group labels given, does it compute the
regression vectors for each group, and then test whether
the coefficients of these vectors are equal, using some
multivariate statistical test? If so, is there an option
that outputs the regression vectors? Given the high
frequency of the latter being discussed in the primary GM
literature, it seems important to be able to extract this
result from the function.
Finally, on the interpretation side - If group variation
is significant, but not logCS, then under the model
shape~size+group, does this imply that shape variation is
mainly explained by variation in species, and allometry is
absent?
Regards,
T.F.
On Thursday, December 8, 2016 at 6:08:17 PM UTC+8, Mike
Collyer wrote:
Dear Tsung,
The procD.allometry function performs two basic
processes when groups are provided. First, it does a
homogeneity of slopes (HOS) test. This test
ascertains whether two or more groups have parallel or
unique slopes (the latter meaning at least one
group’s slope differs from the others). The HOS
test constructs two linear models: shape ~ size +
group and shape ~ size + group + size:group, and
performs an analysis of variance to determine if the
size:group interaction significantly reduces the
residual error produced. (Note: log(size) is a
possible and default choice in this analysis.)
After this test, procD.allometry then provides an
analysis of variance on each term in the resulting
model from the HOS test.
Regarding your question, if the HOS test reveals there
is significant heterogeneity in slopes, the
coefficients returned allow one to find the unique
linear equations, by group, which would be found from
separate runs of procD.allometry, one group at a time.
If the HOS test reveals that there is not significant
heterogeneity in slopes, the coefficients constrain
the slopes for different groups to be the same
(parallel).
Finally, and I think more to your point, the projected
regression scores are found by using, for a (in the Xa
calculation you note), the coefficients that represent
a common or individual slope from the linear model
produced. The matrix of coefficients, B, is arranged
as first row = intercept, second row = common slope,
next rows (if applicable) are coefficients for the
group factor (essentially change the intercept, by
group), and finally, the last rows are the
coefficients for the size:group interaction (if
applicable), which change the common slope to match
each group’s unique slope. Irrespective of the
complexity of this B matrix, a is found as the second
row. If you run procD.allometry group by group, it is
the same as (1) asserting that group slopes are unique
and (2) changing a to match not the common slope, but
the summation of the common slope and the
group-specific slope adjustment. One could do that,
but would lose the ability to compare the groups in
the same plot, as each group would be projected on a
different axis.
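The bookkeeping described above can be illustrated with a small Python
sketch. The matrix B and the shape values X below are made-up numbers
for two groups and three shape variables, and the helper names are
hypothetical; only the row layout follows the description.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(X, a):
    """Regression scores s = Xa, one score per specimen (row of X)."""
    return [sum(x * c for x, c in zip(row, a)) for row in X]

# Hypothetical coefficient matrix B, rows laid out as described:
# intercept, common slope, group effect, size:group interaction
B = [
    [0.10, 0.20, 0.30],   # intercept
    [0.30, 0.00, 0.40],   # common slope (the row used for a)
    [0.01, -0.02, 0.00],  # group effect (shifts the intercept for group 2)
    [0.00, 0.10, -0.10],  # size:group (adjusts the slope for group 2)
]

# One shared axis for all specimens: a is the second row, scaled to unit length
a_common = unit(B[1])

# A separate run on group 2 alone would instead use the adjusted slope
a_group2 = unit([s + d for s, d in zip(B[1], B[3])])

X = [[0.5, 0.1, 0.2], [0.4, 0.3, 0.1]]  # hypothetical shape variables
scores_same_axis = project(X, a_common)
scores_group2_axis = project(X, a_group2)
print(scores_same_axis, scores_group2_axis)
```

Because a_common and a_group2 point in different directions, scores
computed on the two axes are not directly comparable in one plot.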
Hope that helps.
Mike
On Dec 8, 2016, at 3:37 AM, Tsung Fei Khang
<[email protected]> wrote:
Hi all,
I would like to use procD.allometry to study
allometry in two species.
I understand that the function returns the
regression score for each specimen as Reg.proj,
and that the calculation is obtained as:
s = Xa, where X is the nxp matrix of Procrustes
shape variables, and a is the px1 vector of
regression coefficients normalized to 1. I am able
to verify this computation from first principles
when all samples are presumed to come from the
same species.
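For a single group, the calculation described here can be sketched
from first principles in Python. The assumption, labeled as such, is
that the regression coefficients are the ordinary least-squares slopes
of each shape variable on size; the function name and data are
hypothetical and this is not the procD.allometry source.

```python
import math

def regression_scores(size, X):
    """Return (a, s): unit-length slope vector a and scores s = Xa."""
    n, p = len(X), len(X[0])
    mean_size = sum(size) / n
    sxx = sum((s - mean_size) ** 2 for s in size)
    slopes = []
    for j in range(p):
        mean_j = sum(row[j] for row in X) / n
        sxy = sum((s - mean_size) * (row[j] - mean_j)
                  for s, row in zip(size, X))
        slopes.append(sxy / sxx)  # least-squares slope of variable j on size
    norm = math.sqrt(sum(b * b for b in slopes))
    a = [b / norm for b in slopes]  # coefficients normalized to length 1
    scores = [sum(x * c for x, c in zip(row, a)) for row in X]
    return a, scores

# Hypothetical data: 4 specimens, 2 shape variables that change linearly with size
size = [1.0, 2.0, 3.0, 4.0]
X = [[0.03 * s, 0.04 * s] for s in size]
a, scores = regression_scores(size, X)
print(a, scores)
```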
However, what happens when we are interested in
more than 1 species (say 2)? I could run
procD.allometry by including the species labels
via f2=~gps, where gps gives the species labels.
Is there just 1 regression vector (which feels
weird, since this should be species-specific), or
2? If so, how can I recover both vectors? What is
the difference between including f2=~gps using all
data and making two separate runs of
procD.allometry, one for samples from species 1,
and another for samples from species 2?
Thanks for any help.
Rgds,
TF
" DISCLAIMER: This e-mail and any files transmitted
with it ("Message") is intended only for the use
of the recipient(s) named above and may contain
confidential information. You are hereby notified
that the taking of any action in reliance upon, or
any review, retransmission, dissemination,
distribution, printing or copying of this Message
or any part thereof by anyone other than the
intended recipient(s) is strictly prohibited. If
you have received this Message in error, you
should delete this Message immediately and advise
the sender by return e-mail. Opinions, conclusions
and other information in this Message that do not
relate to the official business of University of
Malaya shall be understood as neither given nor
endorsed by any of the forementioned. "
--
MORPHMET may be accessed via its webpage at
http://www.morphometrics.org
---
You received this message because you are
subscribed to the Google Groups "MORPHMET" group.
To unsubscribe from this group and stop receiving
emails from it, send an email to
[email protected].
--
Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di
Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472
Adjunct Associate Professor, School of Anatomy, Physiology and Human
Biology, The University of Western Australia, 35 Stirling Highway, Crawley WA
6009, Australia
E-mail address: [email protected], [email protected]
WEBPAGE: https://sites.google.com/site/alcardini/home/main
FREE Yellow BOOK on Geometric Morphometrics:
http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf
ESTIMATE YOUR GLOBAL FOOTPRINT:
http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/