Re: [R-sig-phylo] best fit vs normality of residuals

2013-12-06 Thread Agus Camacho
You got it perfectly, Luke.
Many thanks for your detailed answer!
Cheers,
Agus




Re: [R-sig-phylo] best fit vs normality of residuals
Luke Matthews
Tue, 03 Dec 2013 08:33:36 -0800






Re: [R-sig-phylo] best fit vs normality of residuals

2013-12-03 Thread Luke Matthews
Hi Agus,
If I understand your post correctly, you implemented the two models with
exactly the same formula and phylogenetic tree, and varied only the transform
applied to the variables. In one case you scaled and centered both the
dependent and independent variables, while in the other you rank-ordered all
the variables. Please correct me if I have this wrong.
In that case you would have to compare the normality of the residuals,
because when you transform the dependent variable in different ways, I think
the likelihoods will necessarily shift simply because the distribution of the
data being explained has shifted. I don't think you can compare likelihoods,
AICs, or BICs in this way when you alter the dependent variable, just as you
can't compare likelihood-based values for models that include mostly the same
data points but also some different ones. You can only compare
likelihood-based values for models that have exactly the same dependent data
but differ in the formulation of the model's independent variables,
autocorrelation parameters, etc.
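A small illustration of why the likelihoods are not comparable once the response changes. This is a minimal sketch in Python, with an ordinary least-squares fit standing in for PGLS; the function name `ols_loglik` and the toy data are hypothetical. Multiplying the response by a constant c leaves the fit itself unchanged, yet the maximised Gaussian log-likelihood drops by exactly n*log(c), so AIC rises by 2n*log(c). Ranks of 87 species run from 1 to 87, so the residual scale, and with it logLik, sits on a very different footing from centered and scaled data.

```python
import math


def ols_loglik(x, y):
    """Maximised Gaussian log-likelihood of the simple OLS fit y ~ a + b*x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = my - b * mx               # intercept
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / n              # ML estimate of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


# Hypothetical toy data; the second fit uses the same response in other units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]
c = 10.0
ll_raw = ols_loglik(x, y)
ll_scaled = ols_loglik(x, [c * v for v in y])

# Same fit, different units: the gap equals n*log(c), not a change in fit.
print(round(ll_raw - ll_scaled, 6), round(len(y) * math.log(c), 6))
```

The same reasoning should carry over to GLS with a given correlation structure, since rescaling the response rescales the residual variance without changing the estimated correlation parameters.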
As another point, although ranking seemed to fix your normality problem in
this case, it may be introducing other issues. Ranked variables are usually
too flat (uniformly distributed) for linear regression analysis, and perhaps
the test of residual normality you are using only detects some deviations,
like skewness (I'm not familiar with the Lilliefors test). Instead of ranking,
I would recommend that you plot the distribution of your dependent and
independent variables and try the transforms (log, arcsine, etc.) that a
standard stats book would recommend given the appearance of each distribution.
Some variables may not need transforming at all. It may be just one or two
that are causing the problem in the model residuals, in which case you should
transform only those variables.
Remember that since it's the normality of the residuals that matters, it may
be more appropriately fixed by transforming an independent variable than a
dependent one. If the dependent variable itself is normal, then nonnormality
might be introduced in the residuals by the distribution of a particular
independent variable.
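One way to follow the advice to inspect each variable before choosing a transform is to compute its sample skewness and compare it with the skewness after a candidate transform. This is a minimal sketch, assuming a strictly positive, right-skewed variable; the helper name and toy data are hypothetical, and a plot of each distribution remains a better guide than any single number.

```python
import math


def skewness(values):
    """Sample skewness m3 / m2**1.5 using population (1/n) moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed, strictly positive trait.
trait = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
logged = [math.log(v) for v in trait]  # log requires strictly positive values

print(f"raw skewness: {skewness(trait):.3f}")
print(f"log skewness: {skewness(logged):.3f}")
```

Variables whose skewness is already near zero can be left untransformed; the log transform only applies to strictly positive data.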
Best
Luke

Luke J. Matthews | Senior Scientific Director | Activate Networks


--

Message: 2
Date: Fri, 29 Nov 2013 12:29:06 -0200
From: Agus Camacho agus.cama...@gmail.com
To: r-sig-phylo@r-project.org r-sig-phylo@r-project.org
Subject: [R-sig-phylo] best fit vs normality of residuals
Message-ID:
calsj7pssmsg5hp7yiiesquejnuqad2zuprqs4ldx6wxagyx...@mail.gmail.com
Content-Type: text/plain


Re: [R-sig-phylo] best fit vs normality of residuals

2013-12-03 Thread Theodore Garland Jr
Good advice!

Cheers,
Ted

From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Luke Matthews [lmatth...@activatenetworks.net]
Sent: Tuesday, December 03, 2013 8:29 AM
To: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] best fit vs normality of residuals


[R-sig-phylo] best fit vs normality of residuals

2013-11-29 Thread Agus Camacho
Dear colleagues,

I'm having difficulty deciding between a phylogenetic GLS model with a
better fit (lower AIC and BIC) whose residuals, after accounting for
phylogenetic signal, deviate strongly from normality, and a model whose
residuals look normal but whose fit is worse. The number of species is
reasonably high (87), but I don't know whether that justifies accepting a
highly significant deviation from normality.

With scaled and centered data, I get:


          AIC         BIC       logLik
  255.2029505 269.5696455 -121.6014753
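The printed information criteria follow AIC = -2*logLik + 2k. Back-solving from the printed values gives k = 6 estimated parameters for both this fit and the ranked one (AIC 766.7826784, logLik -377.3913392). Presumably these are the four coefficients, the residual variance and lambda, though that breakdown is an assumption. Because k is identical, the whole AIC gap between the two models comes from logLik alone. The short check below is in Python; the helper names are hypothetical.

```python
def aic(loglik, k):
    """Akaike information criterion from a log-likelihood and k parameters."""
    return -2.0 * loglik + 2 * k


def implied_k(aic_value, loglik):
    """Number of parameters implied by a printed AIC and logLik pair."""
    return (aic_value + 2.0 * loglik) / 2.0


# Values copied from the two gls summaries in this message.
scaled = {"AIC": 255.2029505, "logLik": -121.6014753}
ranked = {"AIC": 766.7826784, "logLik": -377.3913392}

for name, fit in (("scaled", scaled), ("ranked", ranked)):
    k = round(implied_k(fit["AIC"], fit["logLik"]))
    print(name, "k =", k, "AIC recomputed =", round(aic(fit["logLik"], k), 4))
```

With equal k, the AIC difference is just -2 times the logLik difference, and that is exactly the quantity that moves when the response is re-expressed on another scale.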



Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
lambda
-0.03313647856
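With corPagel, lambda multiplies the off-diagonal (shared-history) elements of the phylogenetic correlation matrix and leaves the diagonal alone. Lambda = 1 gives the Brownian-motion structure implied by the tree, and lambda = 0 treats species as independent. An estimate of about -0.03 here (and 0.14 in the ranked fit) therefore suggests little phylogenetic correlation left in the residuals. This is a minimal sketch of the transform on a hypothetical three-species matrix; the function name and matrix are illustrative assumptions.

```python
def lambda_transform(corr, lam):
    """Pagel's lambda: scale off-diagonal elements, keep the diagonal."""
    size = len(corr)
    return [
        [corr[i][j] if i == j else lam * corr[i][j] for j in range(size)]
        for i in range(size)
    ]


# Hypothetical correlation matrix for three species: A and B are sisters.
corr = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

for lam in (1.0, 0.5, 0.0):
    print(lam, lambda_transform(corr, lam))
```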

Coefficients:
                    Value     Std.Error       t-value p-value
(Intercept)  0.0432999895 0.06632562733  0.6528395024  0.5157
X1           0.0425258358 0.03552018760  1.1972300459  0.2347
X2           0.4620358585 0.18471478739  2.5013474287  0.0144
X1:X2       -0.1211398020 0.04969892007 -2.4374735269  0.0170



The Lilliefors test (thanks, Liam, for posting on this) gave me:


D = 0.1815, p-value = 2.558e-07
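The Lilliefors statistic D is the Kolmogorov-Smirnov distance between the empirical distribution of the residuals and a normal distribution whose mean and standard deviation are estimated from those same residuals. Because it measures the largest vertical gap between two CDFs, it can in principle respond to any shape of departure, not only skewness. It is, however, generally regarded as less powerful than tests such as Shapiro-Wilk. The sketch below computes the statistic only, since p-values need Lilliefors' tables or an approximation. The function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_d(sample):
    """KS distance to a normal with mean and SD estimated from the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


residuals = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical residuals
print(round(lilliefors_d(residuals), 4))
```

Because the mean and SD are re-estimated, D does not change under shifting or rescaling of the residuals.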

I ranked both the response variable and the predictors. My variables had
some zeros and, in some cases, negative values, so I thought ranking would be
the simplest and most robust approach. But I might be wrong.
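Ranking does cope with zeros and negative values, but it replaces whatever shape a variable had with an evenly spaced, flat set of scores. For n values without ties, the ranks are 1, ..., n, and their excess kurtosis is about -1.2, the value for a uniform distribution. This is a sketch with average ranks for ties, the convention R's rank() uses by default. The function names and data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def excess_kurtosis(values):
    """Population excess kurtosis m4 / m2**2 - 3 (0 for a normal)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


print(average_ranks([0.0, -1.5, 3.2, 0.0]))  # zeros and negatives are fine
print(round(excess_kurtosis(average_ranks(list(range(87)))), 3))
```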



With all variables ranked, I get:


          AIC         BIC       logLik
  766.7826784 781.1493734 -377.3913392



Correlation Structure: corPagel
 Formula: ~1
 Parameter estimate(s):
  lambda
0.1434096557

Coefficients:
                   Value   Std.Error      t-value p-value
(Intercept)  5.615576688 9.195445483  0.610691097  0.5431
X1           0.477054032 0.200882571  2.374790556  0.0199
X2           0.771914482 0.208616720  3.700156356  0.0004
X1:X2       -0.007371999 0.004035148 -1.826946400  0.0714


Lilliefors (Kolmogorov-Smirnov) normality test

data:  chol(solve(vcv(tree))) %*% residuals(M2)
D = 0.0545, p-value = 0.7709
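The expression chol(solve(vcv(tree))) %*% residuals(M2) whitens the GLS residuals. It multiplies them by a matrix W with W V W^T = I, so that, if the covariance V is right, the transformed residuals are independent with equal variance and an ordinary normality test applies. Two points are worth checking, and both are offered as assumptions rather than established fact. First, vcv(tree) is the untransformed Brownian-motion matrix (lambda = 1), whereas both fits estimated lambda well below 1, so the lambda-scaled matrix is arguably the one to use; nlme's residuals(M2, type = "normalized") should whiten with the fitted correlation structure, though this is worth confirming in its documentation. Second, any W with W V W^T = I will do. The Python sketch below uses W = L^-1 from the Cholesky factor V = L L^T, which is a different but equally valid choice. The names and the 3x3 matrix are hypothetical.

```python
import math


def cholesky(v):
    """Lower-triangular L with v = L L^T (v symmetric positive definite)."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low


def whiten(v, resid):
    """Solve L z = resid by forward substitution, i.e. z = L^{-1} resid."""
    low = cholesky(v)
    z = []
    for i, r in enumerate(resid):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((r - s) / low[i][i])
    return z


# Hypothetical lambda-scaled covariance for three species and raw residuals.
v = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
resid = [0.8, 0.5, -1.2]
print([round(x, 4) for x in whiten(v, resid)])
```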



Does anybody have a hint on this?
Thanks!
Agus


-- 
Agustín Camacho Guerrero.
PhD in Zoology.
Laboratório de Herpetologia, Departamento de Zoologia, Instituto de
Biociências, USP.
Rua do Matão, trav. 14, nº 321, Cidade Universitária,
São Paulo - SP, CEP: 05508-090, Brasil.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/