Thanks both. I fully agree. I did not want to mention the opposite case (detecting tiny effects with very large samples) so as not to raise too many issues at the same time. There's an example of that kind (high power in large samples) in an old paper of mine where, with a total N of more than 1000, we found significant slopes, but R^2 was about 43% with separate lines and 41% with parallel lines, and the angles between trajectories were on average relatively small.

If R^2 values are also reported, and assuming that samples are large and those R^2 values are accurate, readers can judge for themselves whether an effect is really important (and not just statistically significant). Unfortunately, R^2 values are still missing from many papers I read or review, and that concerns not only regressions but also simple pairwise comparisons of group means (pairs of taxa, sexes, etc.).


Thanks everyone for your feedback!

Cheers


Andrea





On 12/12/16 15:51, Adams, Dean [EEOBS] wrote:

Andrea,

I agree that one must consider both statistical significance and biological meaningfulness in evaluating patterns. Considering one of these without the other can often get one into trouble.

Your post concerned the inability to statistically detect differences due to sample size limitations, and the possibility of concluding homogeneity from this result when it may not be the case. But as Mike mentioned, the opposite is also a concern. In fact, one might recall a discussion some months ago on Morphmet on this very issue, where large samples afforded the ability to discern allometric differences between groups, but where those statistical differences may not have been biologically important. In both cases, critical thinking and a merger of statistical results and biological knowledge of the system are required to arrive at a well-reasoned understanding of the patterns in the data.

Best,

Dean

Dr. Dean C. Adams

Professor

Department of Ecology, Evolution, and Organismal Biology

       Department of Statistics

Iowa State University

www.public.iastate.edu/~dcadams/ <http://www.public.iastate.edu/%7Edcadams/>

phone: 515-294-3834

*From:*Mike Collyer [mailto:[email protected]]
*Sent:* Monday, December 12, 2016 8:34 AM
*To:* andrea cardini <[email protected]>
*Cc:* [email protected]
*Subject:* Re: brief comment on non-significance Re: [MORPHMET] procD.allometry with group inclusion

Andrea,

My opinion on this is that the researcher who has collected the data must retain at all times a biological wisdom that supersedes a suggested course of action based on results from a statistical test. If the purpose of a study is to assess the allometric pattern of shape variation within populations, then maybe the results of a homogeneity of slopes test can be an unnecessary burden. If a researcher wants to compare the mean shapes of different groups but is concerned that allometric variation might differ among groups, then a homogeneity of slopes test could be an important first step, but I agree that a non-significant result should not spur the researcher to immediately conclude a common allometry or no allometry is appropriate. Sample size, variation in size among groups, and appropriate distributions of specimen size within groups might all be things to think about.

The point you make about a potential type II error is a real concern. The opposite problem is also a real concern. One might have very large sample sizes and sufficient statistical power to suggest that allometric slopes are heterogeneous. However, the coefficient of determination and/or effect size for the size:group interaction might be quite small. Just because there is a low probability of finding as large an effect based on thousands of random permutations, is one ready to accept that different groups have evolved unique allometric trajectories? It is easy to forget that the choice of “significance level” - the a priori acceptable rate of type I error - is arbitrary. Making strong inferential decisions based on a binary decision for an arbitrary criterion is probably not wise. I would argue that instead of focusing on a P-value, one could just as arbitrarily, but perhaps more justifiably, choose a coefficient of determination of R^2 = 0.10 or an effect size of 2 SD as a criterion for whether to retain or omit the interaction coefficients that allow for heterogeneous slopes.

*** Warning: pedantic discussion on model selection starts here. Skip if unappealing.

One could also turn to model selection approaches. However, I think multivariate generalization for indices like AIC is an area lacking needed theoretical research for high-dimensional shape data. There are two reasons for this. First, the oft-defined AIC is -2 times the model log-likelihood + 2K, where K is the number of coefficients in a linear model (rank of the model design matrix) + 1, where the 1 is the dimension of the value for the variance of the error. This is a simplification for univariate data. The second half of the equation is actually 2[pk + 0.5p(p+1)], where p is the number of shape variables and k is the rank of the design matrix. (One might define p as the rank of the shape variable matrix - the number of actual dimensions in the tangent space, also equal to the number of principal components with eigenvalues greater than 0 from a PCA - if using high-dimensional data or small samples.) Notice that substituting 1 for p in this equation gets one back to the 2K, as defined first. The pk part of the equation represents the dimensions of linear model coefficients; the 0.5p(p+1) part represents the dimensions of the error covariance matrix. The reason this is important is that one might have picked up along the way that a delta AIC of 1-2 means two models are comparable (as if, with equal likelihoods, they differ by around 1 parameter or less). With highly multivariate data, this rule of thumb would have to be scaled up to 1*p to 2*p, which makes it hard to have a good general sense of when models are comparable, unless one takes into consideration how many shape variables are in use.
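
To make the penalty arithmetic above concrete, here is a minimal Python sketch. The function name aic_penalty and the example values of p and k are illustrative assumptions; nothing here comes from geomorph or any other package.

```python
def aic_penalty(p, k):
    """Parameter penalty 2[pk + 0.5p(p+1)] for a multivariate linear model.

    p: number of shape variables (or the rank of the shape data).
    k: rank of the model design matrix.
    """
    coefficient_dims = p * k             # dimensions of the coefficient matrix
    covariance_dims = p * (p + 1) // 2   # unique elements of the error covariance
    return 2 * (coefficient_dims + covariance_dims)


# Univariate case: p = 1 recovers the familiar 2K with K = k + 1.
for k in (2, 3, 4):
    assert aic_penalty(1, k) == 2 * (k + 1)

# Hypothetical comparison: a design matrix of rank 3 versus rank 4
# (one extra column), for increasing numbers of shape variables.
for p in (1, 10, 50):
    extra = aic_penalty(p, 4) - aic_penalty(p, 3)
    print(f"p = {p:2d}: extra penalty for one more design column = {extra}")
```

One extra column in the design matrix adds 2p to the penalty, so what counts as a small delta AIC depends on how many shape variables are in use.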

Second, the log-likelihood involves calculating the determinant of the error covariance matrix, which is problematic for singular matrices, as might be found with high-dimensional shape data. Recently, colleagues and I have used plots of the log of the trace of error covariance matrices versus the log of parameter penalties - the 2[pk + 0.5p(p+1)] part - as a way of scanning candidate models for the one or two that have lower error relative to the number of parameters in the model. Such an approach allows one to include models with no allometric slope, a common allometric slope, and unique allometric slopes, in combination with other important factors, and to consider many models at once. But again, there is a certain level of arbitrariness to this.

*** End pedantic discussion

There are other issues that can be quite real with real data. For example, if one wishes to consider if there are shape differences among groups but first wishes to address if there is meaningful allometric shape variation, and whether there might be different allometries among groups, a homogeneity of slopes test might be done. But what if it is revealed that one group has all small specimens and one group has all large specimens? The researcher knows better than anyone else whether this is sampling error or a biological phenomenon. How to proceed should not rest solely on an outcome from a statistical test. For example, if the specimens are adult organisms and represent large individuals within populations, one might want to discuss shape differences without adjusting for allometry, as well as discuss size differences. A discussion of allometries in this case might obscure what is really most important, that maybe two populations evolved size and shape differences because of some ecologically meaningful reason, for example.

So I agree with you, and more. “No significance” or “significance” is only part of the evaluation. Effect sizes and assessment of sampling errors, biases, or limitations should also be considered. And no matter what, careful communication that reveals the researcher’s logic needs to be made in published articles.

Just my opinion,

Mike

    On Dec 12, 2016, at 2:40 AM, andrea cardini <[email protected]
    <mailto:[email protected]>> wrote:

    Dear All,

    if I may, I'd add a brief comment on the interpretation of
    non-significant results. I'd appreciate it if this were checked
    by those with a proper understanding of and background in stats
    (which I haven't!).

    I use Mike's sentence on non-significant slopes as an example but
    the issue is a general one, although I find it particularly tricky
    in the context of comparing trajectories (allometries or other)
    across groups. Mike wisely said "approximately" ("If not
    significant, then the slope vectors are APPROXIMATELY parallel").
    With permutations, one might be able to perform tests even when
    sample sizes are small (and maybe, which is even more problematic,
    heterogeneous across groups): then, non-significance could simply
    mean that samples are not large enough to make strong statements
    (rejection of the null hypothesis) with confidence (i.e.,
    statistical power is low). Especially with short trajectories
    (allometries or other), one might find non-significant
    differences in slopes despite very large angles between the
    vectors, a case where it is probably hard to conclude that
    allometries really are parallel.

    Small samples are a curse of many studies in taxonomy and
    evolution. We've done a couple of exploratory (not very rigorous!)
    empirical analyses of the effect of reducing sample sizes on
    means, variances, vector angles etc. in geometric morphometrics
    (Cardini & Elton, 2007, Zoomorphol.; Cardini et al., 2015,
    Zoomorphol.) and some, probably most, of these estimates literally
    blow up when N goes down. That happened even when differences were
    relatively large (species separated by several millions of years
    of independent evolution or samples including domestic breeds
    hugely different from their wild counterparts).

    Unless one has done power analyses and/or has very large samples,
    I'd be careful with the interpretations. There's plenty on this in
    the difficult (for me) statistical literature. Surely one can do
    sophisticated power analyses in R and, although probably and
    unfortunately not used by many, one of the programs of the TPS
    series (TPSPower) was written by Jim exactly for this aim
    (possibly not for power analyses in the case of MANCOVAs/vector
    angles but certainly in the simpler case of comparisons of means).

    Cheers

    Andrea

    On 11/12/16 19:17, Mike Collyer wrote:

        Dear Tsung,

        The geomorph function, advanced.procD.lm, allows one to
        extract group slopes and model coefficients.  In fact,
        procD.allometry is a specialized function that uses
        advanced.procD.lm to perform the HOS test and then uses
        procD.lm to produce an ANOVA table, depending on the results
        of the HOS test.  It also uses the coefficients and fitted
        values from procD.lm to generate the various types of
        regression scores.  In essence, procD.allometry is a function
        that carries out several analyses with geomorph base
        functions, procD.lm and advanced.procD.lm, in a specified way.
         By comparison, the output is more limited, but one can use
        the base functions to get much more output.

        In advanced.procD.lm, if one specifies groups and a slope, one
        of the outputs is a matrix of slope vectors.  Also, one can
        perform pairwise tests to compare either the correlation or
        angle between slope vectors.

        Regarding the operation of the HOS test, it is a permutational
        test that does the following: calculate the sum of squared
        residuals for a “full” model, shape ~ size + group +
        size:group, and the same for a “reduced” model, shape ~ size +
        group.  (The sum of squared residuals is the trace of the
        error SSCP matrix, which is the same as the sum, over all
        shape variables, of the squared residuals.)  The difference
        between these two values is the sum of squares for the
        size:group effect.  If it is significantly large (i.e., a
        value this large is found with low probability in many random
        permutations), one can conclude that the coefficients for this
        effect are collectively large enough to justify retaining the
        effect, as the slope vectors are (at least in part) not
        parallel.  If not significant, then the slope vectors are
        approximately parallel, and the effect can be removed from the
        model.  A randomized residual permutation procedure is used,
        which randomizes the residual vectors of the reduced model in
        each random permutation to obtain random pseudo-values,
        repeating the sum of squares calculations each time.
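
The permutation procedure described above can be sketched in plain Python. This is only an illustration of the logic under stated assumptions, not the geomorph implementation: the helper names (solve, fitted, rss, hos_test) are hypothetical, exactly two groups are coded 0/1, and the data are simulated.

```python
import random

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def fitted(X, Y):
    """Least-squares fitted values for Y ~ X via the normal equations."""
    n, k, p = len(X), len(X[0]), len(Y[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    XtY = [[sum(X[i][a] * Y[i][j] for i in range(n)) for j in range(p)] for a in range(k)]
    B = solve(XtX, XtY)
    return [[sum(X[i][a] * B[a][j] for a in range(k)) for j in range(p)] for i in range(n)]

def rss(X, Y):
    """Trace of the error SSCP matrix: squared residuals summed over all variables."""
    F = fitted(X, Y)
    return sum((y - f) ** 2 for yr, fr in zip(Y, F) for y, f in zip(yr, fr))

def hos_test(size, group, Y, n_perm=199, seed=1):
    """Permutation test of size:group, shuffling reduced-model residual vectors."""
    rng = random.Random(seed)
    X_red = [[1.0, s, float(g)] for s, g in zip(size, group)]        # size + group
    X_full = [row + [s * g] for row, s, g in zip(X_red, size, group)]  # + size:group
    F_red = fitted(X_red, Y)
    E_red = [[y - f for y, f in zip(yr, fr)] for yr, fr in zip(Y, F_red)]
    ss_obs = rss(X_red, Y) - rss(X_full, Y)   # SS for the size:group effect
    count = 1                                 # the observed value counts once
    for _ in range(n_perm):
        E_perm = rng.sample(E_red, len(E_red))  # permute whole residual vectors
        Y_perm = [[f + e for f, e in zip(fr, er)] for fr, er in zip(F_red, E_perm)]
        if rss(X_red, Y_perm) - rss(X_full, Y_perm) >= ss_obs:
            count += 1
    return ss_obs, count / (n_perm + 1)

# Simulated data (an assumption): group 1 has a steeper slope on variable 1.
rng = random.Random(42)
size = [rng.uniform(1.0, 3.0) for _ in range(40)]
group = [0] * 20 + [1] * 20
Y = [[0.5 * s + 1.0 * s * g + rng.gauss(0, 0.1), 0.2 * s + rng.gauss(0, 0.1)]
     for s, g in zip(size, group)]
ss, p_value = hos_test(size, group, Y)
print(f"SS(size:group) = {ss:.3f}, P = {p_value:.3f}")
```

As discussed elsewhere in this thread, a small P-value from such a test says nothing by itself about how large the difference in slopes is; the sum of squares (or an R^2 derived from it) is needed for that.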

        Regarding your final question, yes, you are correct.  In a
        case like this, one might conclude that logCS is not a
        significant source of shape variation, and proceed with other
        analyses that do not include it as a covariate.  In either
        case - whether it is retained as a covariate or excluded -
        advanced.procD.lm will allow one to perform pairwise
        comparison tests among groups.

        Cheers!

        Mike

            On Dec 11, 2016, at 10:56 AM, Tsung Fei Khang
            <[email protected] <mailto:[email protected]>> wrote:

            Dear Mike,

            Many thanks for the reply!

            When the procD.allometry function performs the HOS test
            with multiple group labels given, does it compute the
            regression vectors for each group, and then test whether
            the coefficients of these vectors are equal, using some
            multivariate statistical test? If so, is there an option
            that outputs the regression vectors? Given the high
            frequency of the latter being discussed in the primary GM
            literature, it seems important to be able to extract this
            result from the function.

            Finally, on the interpretation side - If group variation
            is significant, but not logCS, then under the model
            shape~size+group, does this imply that shape variation is
            mainly explained by variation in species, and allometry is
            absent?

            Regards,

            T.F.


            On Thursday, December 8, 2016 at 6:08:17 PM UTC+8, Mike
            Collyer wrote:

                Dear Tsung,

                The procD.allometry function performs two basic
                processes when groups are provided.  First, it does a
                homogeneity of slopes (HOS) test.  This test
                ascertains whether two or more groups have parallel or
                unique slopes (the latter meaning at least one
                group’s slope is different from the others).  The HOS
                test constructs two linear models: shape ~ size +
                group and shape ~ size + group + size:group, and
                performs an analysis of variance to determine if the
                size:group interaction significantly reduces the
                residual error produced.  (Note: log(size) is a
                possible and default choice in this analysis.)

                After this test, procD.allometry then provides an
                analysis of variance on each term in the resulting
                model from the HOS test.

                Regarding your question, if the HOS test reveals there
                is significant heterogeneity in slopes, the
                coefficients returned allow one to find the unique
                linear equations, by group, which would be found from
                separate runs of procD.allometry, one group at a time.
                 If the HOS test reveals that there is not significant
                heterogeneity in slopes, the coefficients constrain
                the slopes for different groups to be the same
                (parallel).

                Finally, and I think more to your point, the projected
                regression scores are found by using, as the vector a
                (in the Xa calculation you note), the coefficients
                that represent a common or individual slope from the
                linear model produced.  The matrix of coefficients, B,
                is arranged as follows: first row = intercept; second
                row = common slope; next rows (if applicable) =
                coefficients for the group factor (which essentially
                change the intercept, by group); and finally, the last
                rows = coefficients for the size:group interaction (if
                applicable), which change the common slope to match
                each group’s unique slope.  Irrespective of the
                complexity of this B matrix, a is found as the second
                row.  If you run procD.allometry group by group, it is
                the same as (1) asserting that group slopes are unique
                and (2) changing a to match not the common slope, but
                the sum of the common slope and the group-specific
                slope adjustment.  One could do that, but would lose
                the ability to compare the groups in the same plot, as
                each group would be projected on a different axis.
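
The row layout of B and the projection s = Xa described above can be illustrated with a small Python sketch. All numbers are made up, and the helper names (unit, scores) are hypothetical; this shows only the arithmetic, not how procD.allometry stores or returns its coefficients.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def scores(X, a):
    """Regression scores s = Xa: project each specimen (row of X) onto a."""
    return [sum(x * w for x, w in zip(row, a)) for row in X]

# Hypothetical coefficient matrix B for two groups and p = 3 shape variables.
B = [
    [0.10, -0.20, 0.05],   # row 1: intercept
    [0.30, 0.40, 0.00],    # row 2: common slope
    [0.02, 0.01, -0.03],   # row 3: group effect (changes the intercept)
    [0.00, -0.10, 0.20],   # row 4: size:group (changes the slope for group 2)
]

# The vector a used for projection is the second row, scaled to unit length.
a_common = unit(B[1])

# A group-by-group run corresponds to the common slope plus that group's adjustment.
slope_group2 = [b + d for b, d in zip(B[1], B[3])]
a_group2 = unit(slope_group2)

# Hypothetical shape variables, one row per specimen.
X = [[0.3, 0.4, 0.0], [0.6, 0.8, 0.1]]
print("scores on the common axis:  ", scores(X, a_common))
print("scores on the group-2 axis: ", scores(X, a_group2))
```

The two sets of scores are measured along different axes, which is why projecting each group on its own slope vector makes the groups hard to compare in a single plot.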

                Hope that helps.

                Mike

                    On Dec 8, 2016, at 3:37 AM, Tsung Fei Khang
                    <[email protected] <mailto:[email protected]>> wrote:

                    Hi all,

                    I would like to use procD.allometry to study
                    allometry in two species.

                    I understand that the function returns the
                    regression score for each specimen as Reg.proj,
                    and that the calculation is obtained as:

                    s = Xa, where X is the n x p matrix of Procrustes
                    shape variables, and a is the p x 1 vector of
                    regression coefficients normalized to unit length.
                    I am able to verify this computation from first
                    principles when all samples are presumed to come
                    from the same species.

                    However, what happens when we are interested in
                    more than 1 species (say 2)? I could run
                    procD.allometry by including the species labels
                    via f2=~gps, where gps gives the species labels.
                    Is there just 1 regression vector (which feels
                    weird, since this should be species-specific), or
                    2? If so, how can I recover both vectors? What is
                    the difference between including f2=~gps using all
                    data and making two separate runs of
                    procD.allometry, one for samples from species 1,
                    and another for samples from species 2?

                    Thanks for any help.

                    Rgds,

                    TF

                    " DISCLAIMER: This e-mail and any files transmitted
                    with it ("Message") is intended only for the use
                    of the recipient(s) named above and may contain
                    confidential information. You are hereby notified
                    that the taking of any action in reliance upon, or
                    any review, retransmission, dissemination,
                    distribution, printing or copying of this Message
                    or any part thereof by anyone other than the
                    intended recipient(s) is strictly prohibited. If
                    you have received this Message in error, you
                    should delete this Message immediately and advise
                    the sender by return e-mail. Opinions, conclusions
                    and other information in this Message that do not
                    relate to the official business of University of
                    Malaya shall be understood as neither given nor
                    endorsed by any of the forementioned. "

                    --
                    MORPHMET may be accessed via its webpage
                    at http://www.morphometrics.org
                    ---
                    You received this message because you are
                    subscribed to the Google Groups "MORPHMET" group.
                    To unsubscribe from this group and stop receiving
                    emails from it, send an email to
                    [email protected].



--
    Dr. Andrea Cardini
    Researcher, Dipartimento di Scienze Chimiche e Geologiche,
    Università di Modena e Reggio Emilia, Via Campi, 103 - 41125
    Modena - Italy
    tel. 0039 059 2058472

    Adjunct Associate Professor, School of Anatomy, Physiology and
    Human Biology, The University of Western Australia, 35 Stirling
    Highway, Crawley WA 6009, Australia

    E-mail address: [email protected], [email protected]
    WEBPAGE: https://sites.google.com/site/alcardini/home/main

    FREE Yellow BOOK on Geometric Morphometrics:
    http://www.italian-journal-of-mammalogy.it/public/journals/3/issue_241_complete_100.pdf

    ESTIMATE YOUR GLOBAL FOOTPRINT:
    http://www.footprintnetwork.org/en/index.php/GFN/page/calculators/

