Good advice!

Cheers,
Ted

From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Luke Matthews [lmatth...@activatenetworks.net]
Sent: Tuesday, December 03, 2013 8:29 AM
To: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] best fit vs normality of residuals

Hi Agus,
If I understand your post correctly, you fitted the two models with exactly 
the same formula and phylogenetic tree, and varied only the transform applied 
to the variables.  In one case the transform was to 'scale and center' both 
the dependent and independent variables, while in the other case you 
rank-ordered all the variables.  Please clarify if I have this wrong.
In this case you would have to compare the normality of the residuals rather 
than the fit statistics.  When you transform the dependent variable in 
different ways, I think the likelihoods will necessarily shift just because the 
distribution of the data being explained has shifted.  I don't think you can 
compare likelihoods, AICs, or BICs in this way when you alter the dependent 
variable, just as you can't compare likelihood-based values for models that 
include mostly the same data points but also some different ones.  You can only 
compare likelihood-based values for models that have exactly the same dependent 
data but differ in the formulation of the model's independent variables, 
autocorrelation parameters, etc.
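To illustrate the point about likelihoods, here is a minimal base-R sketch on 
simulated data.  The variables x and y, and the use of plain lm() instead of a 
phylogenetic GLS, are assumptions made for illustration; this is not Agus's 
data or model.  Scaling the response by its standard deviation s leaves the fit 
unchanged but shifts the maximised log-likelihood by n*log(s):

```r
# Simulated data; x and y are hypothetical stand-ins, not the thread's variables.
set.seed(1)
n <- 87
x <- rnorm(n)
y <- 50 + 10 * x + rnorm(n, sd = 5)

fit_raw    <- lm(y ~ x)              # response on its original scale
y_scaled   <- as.numeric(scale(y))   # centered and scaled response
fit_scaled <- lm(y_scaled ~ x)

# Same fit: the slope's t-statistic does not depend on the scale of y.
t_raw    <- summary(fit_raw)$coefficients["x", "t value"]
t_scaled <- summary(fit_scaled)$coefficients["x", "t value"]

# Different likelihoods: dividing y by s = sd(y) multiplies the density by s,
# so the log-likelihood moves by n * log(s) (the Jacobian of the transform).
ll_shift <- as.numeric(logLik(fit_scaled)) - as.numeric(logLik(fit_raw))
jacobian <- n * log(sd(y))

print(c(t_raw = t_raw, t_scaled = t_scaled))
print(c(ll_shift = ll_shift, jacobian = jacobian))
print(c(AIC_raw = AIC(fit_raw), AIC_scaled = AIC(fit_scaled)))
```

The same logic applies to any transform of the response, so an AIC computed on 
ranks and an AIC computed on scaled values are not on a common scale.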
As another point, although ranking seemed to fix your normality problem in this 
case, it may be introducing some other issues.  Ranked variables are usually 
too flat (close to uniformly distributed) for linear regression analysis, and 
perhaps the test of residual normality you are using only detects some 
deviations, like skewness (I'm not familiar with the Lilliefors test).  I would 
recommend that instead of ranking you plot the distribution of your dependent 
and independent variables and try the transforms (log, arcsine, etc.) that a 
standard stats book would recommend given the appearance of each distribution.  
Some variables may not need transforming at all.  It may be just one or two 
that are causing the problem in the model residuals, in which case you should 
transform only those variables.  
Remember that since it's the normality of the residuals that matters, it may be 
more appropriately fixed by transforming an independent variable than a 
dependent one.  If the dependent itself is normal, then nonnormality might be 
introduced in the residuals because of the distribution of a particular 
independent variable.
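As a rough illustration of that last point, the base-R sketch below simulates a 
right-skewed predictor.  All names and the simulated relationship are 
assumptions, and shapiro.test() simply stands in for whichever normality test 
you prefer.  The response is linear in log(x) with normal errors, so regressing 
it on the untransformed x leaves structure in the residuals, while transforming 
only the predictor removes it:

```r
# Hypothetical simulated example; not the data discussed in this thread.
set.seed(42)
n <- 200
x <- rlnorm(n, meanlog = 0, sdlog = 1)   # right-skewed predictor
y <- 2 * log(x) + rnorm(n, sd = 0.5)     # y is linear in log(x), normal errors

# Inspect the distributions interactively before choosing a transform:
# hist(x); hist(log(x)); hist(y)

fit_raw <- lm(y ~ x)        # predictor left untransformed
fit_log <- lm(y ~ log(x))   # only the predictor is transformed

# Residual normality checks for each fit (small p-values suggest nonnormality).
p_raw <- shapiro.test(residuals(fit_raw))$p.value
p_log <- shapiro.test(residuals(fit_log))$p.value

print(c(p_raw = p_raw, p_log = p_log))
print(c(sigma_raw = summary(fit_raw)$sigma, sigma_log = summary(fit_log)$sigma))
```

Because y is untouched in both fits, their AICs could also be compared 
directly here.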
Best
Luke

Luke J. Matthews | Senior Scientific Director | Activate Networks


------------------------------

Message: 2
Date: Fri, 29 Nov 2013 12:29:06 -0200
From: Agus Camacho <agus.cama...@gmail.com>
To: "r-sig-phylo@r-project.org" <r-sig-phylo@r-project.org>
Subject: [R-sig-phylo] best fit vs normality of residuals

Dear colleagues,

I'm having difficulty deciding between a phylogenetic GLS model with a better 
fit (lower AIC and BIC) but whose residuals, after accounting for phylogenetic 
signal, deviate strongly from normality, and a model with normal residuals but 
a worse fit. The number of species is reasonably high (87), but I don't know 
whether that would justify allowing a highly significant deviation from 
normality.

When using scaled and centered data, I get:


          AIC         BIC       logLik
  255.2029505 269.5696455 -121.6014753



Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
        lambda
-0.03313647856

Coefficients:
                                  Value     Std.Error       t-value p-value
(Intercept)                0.0432999895 0.06632562733  0.6528395024  0.5157
X1                         0.0425258358 0.03552018760  1.1972300459  0.2347
X2                         0.4620358585 0.18471478739  2.5013474287  0.0144
X1:X2                     -0.1211398020 0.04969892007 -2.4374735269  0.0170



The Lilliefors test (thanks Liam for posting on this) gave me:


D = 0.1815, p-value = 2.558e-07

I ranked both the response variable and the factors. My variables had some 
zeros and, in some cases, negative values, so I thought that would be the 
simplest and most robust way. But I might be wrong.



When ranking all variables:


          AIC         BIC       logLik
  766.7826784 781.1493734 -377.3913392



Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
      lambda
0.1434096557

Coefficients:
                                   Value   Std.Error      t-value p-value
(Intercept)                  5.615576688 9.195445483  0.610691097  0.5431
X1                           0.477054032 0.200882571  2.374790556  0.0199
X2                           0.771914482 0.208616720  3.700156356  0.0004
X1:X2                       -0.007371999 0.004035148 -1.826946400  0.0714


        Lilliefors (Kolmogorov-Smirnov) normality test

data:  chol(solve(vcv(tree))) %*% residuals(M2)
D = 0.0545, p-value = 0.7709



Would anybody have a hint on this?
Gracias!
Agus


--
Agustín Camacho Guerrero.
Doutor em Zoologia.
Laboratório de Herpetologia, Departamento de Zoologia, Instituto de 
Biociências, USP.
Rua do Matão, trav. 14, nº 321, Cidade Universitária, São Paulo - SP, CEP: 
05508-090, Brasil.




------------------------------

_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


End of R-sig-phylo Digest, Vol 70, Issue 19

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
