Re: [R] decide between polynomial vs ordered factor model (lme)

2006-01-09 Thread Dimitris Rizopoulos
I think that these models are not nested and thus the LRT produced by 
anova.lme() will not be valid; AIC and BIC could be more relevant. In 
terms of interpretability, I'd say that a model treating 'zeitn' as a 
factor is much easier to explain than a model with 4th order 
polynomial.

I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm



- Original Message - 
From: Leo Gürtler [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Monday, January 09, 2006 2:59 PM
Subject: [R] decide between polynomial vs ordered factor model (lme)


 Dear alltogether,

 two lme's, the data are available at:

 http://www.anicca-vijja.de/lg/hlm3_nachw.Rdata

 explanations of the data:

 nachw = post hox knowledge tests over 6 measure time points (= 
 equally
 spaced)
 zeitn = time points (n = 6)
 subgr = small learning groups (n = 28)
 gru = 4 different groups = treatment factor

 levels: time (=zeitn) (n=6) within subject (n=4) within smallgroups
 (=gru) (n = 28), i.e. n = 4 * 28 = 112 persons and 112 * 6 = 672 
 data points

 library(nlme)
 fitlme7 - lme(nachw ~ I(zeitn-3.5) + I((zeitn-3.5)^2) +
 I((zeitn-3.5)^3) + I((zeitn-3.5)^4)*gru, random = list(subgr = ~ 1,
 subject = ~ zeitn), data = hlm3)

 fit5 - lme(nachw ~ ordered(I(zeitn-3.5))*gru, random = list(subgr =
 ~ 1, subject = ~ zeitn), data = hlm3)

 anova( update(fit5, method=ML), update(fitlme7, method=ML) )

  anova( update(fit5, method=ML), update(fitlme7, method=ML) )
Model df  AIC  BIClogLik 
 Test
 update(fit5, method = ML)1 29 2535.821 2666.619 -1238.911
 update(fitlme7, method = ML) 2 16 2529.719 2601.883 -1248.860 
 1 vs 2
 L.Ratio p-value
 update(fit5, method = ML)
 update(fitlme7, method = ML) 19.89766  0.0978
 

 shows that both are ~ equal, although I know about the uncertainty 
 of ML
 tests with lme(). Both models show that the ^2 and the ^4 terms are
 important parts of the model.

 My question is:

 - Is it legitim to choose a model based on these outputs according 
 to
 theoretical considerations instead of statistical tests that not 
 really
 show a superiority of one model over the other one?

 - Is there another criterium I've overseen to decide which model can 
 be
 clearly prefered?

 - The idea behind that is that in the one model (fit5) the second
 contrast of the factor (gru) is statistically significant, although 
 not
 the whole factor in the anova output.
 In the other model, this is not the case.
 Theoretically interesting is of course the significance of the 
 second
 contrast of gru, as it shows a tendency of one treatment being 
 slightly
 superior. I want to choose this model but I am not sure whether this 
 is
 proper action. Both models shows this trend, but only one model 
 clearly
 indicates that this trend bears some empirical meaning.

 Thanks for any suggestions,

 leo


 here are the outputs for each model:

 fitlme7 - lme(nachw ~ I(zeitn-3.5) + I((zeitn-3.5)^2) +
 I((zeitn-3.5)^3) + I((zeitn-3.5)^4)*gru, random = list(subgr = ~ 1,
 subject = ~ zeitn), data = hlm3)
 plot(augPred(fitlme7), layout=c(14,8))
 summary(fitlme7); anova(fitlme7); intervals(fitlme7)
 Linear mixed-effects model fit by REML
 Data: hlm3
   AIC  BIClogLik
  2582.934 2654.834 -1275.467

 Random effects:
 Formula: ~1 | subgr
(Intercept)
 StdDev:   0.5833797

 Formula: ~zeitn | subject %in% subgr
 Structure: General positive-definite, Log-Cholesky parametrization
StdDevCorr
 (Intercept) 0.6881908 (Intr)
 zeitn   0.1936087 -0.055
 Residual1.3495785

 Fixed effects: nachw ~ I(zeitn - 3.5) + I((zeitn - 3.5)^2) + 
 I((zeitn -
 3.5)^3) +  I((zeitn - 3.5)^4) * gru
Value  Std.Error  DF   t-value p-value
 (Intercept)  4.528757 0.17749012 553 25.515542  0.
 I(zeitn - 3.5)   0.010602 0.08754449 553  0.121100  0.9037
 I((zeitn - 3.5)^2)   0.815693 0.09765075 553  8.353171  0.
 I((zeitn - 3.5)^3)   0.001336 0.01584169 553  0.084329  0.9328
 I((zeitn - 3.5)^4)  -0.089655 0.01405811 553 -6.377486  0.
 gru1 0.187181 0.30805090  24  0.607630  0.5491
 gru2 0.532665 0.30805090  24  1.729147  0.0966
 gru3-0.046305 0.30805090  24 -0.150317  0.8818
 I((zeitn - 3.5)^4):gru1 -0.007860 0.00600928 553 -1.307993  0.1914
 I((zeitn - 3.5)^4):gru2 -0.001259 0.00600928 553 -0.209516  0.8341
 I((zeitn - 3.5)^4):gru3 -0.000224 0.00600928 553 -0.037225  0.9703
 Correlation:
(Intr) I(-3.5 I((-3.5)^2 I((-3.5)^3 
 I((z-3.5)^4)
 I(zeitn - 3.5)   0.071
 I((zeitn - 3.5)^2)  -0.465  0.000
 I((zeitn - 3.5)^3)  

Re: [R] decide between polynomial vs ordered factor model (lme)

2006-01-09 Thread Douglas Bates
On 1/9/06, Leo Gürtler [EMAIL PROTECTED] wrote:
 Dear alltogether,

 two lme's, the data are available at:

 http://www.anicca-vijja.de/lg/hlm3_nachw.Rdata

 explanations of the data:

 nachw = post hox knowledge tests over 6 measure time points (= equally
 spaced)
 zeitn = time points (n = 6)
 subgr = small learning groups (n = 28)
 gru = 4 different groups = treatment factor

 levels: time (=zeitn) (n=6) within subject (n=4) within smallgroups
 (=gru) (n = 28), i.e. n = 4 * 28 = 112 persons and 112 * 6 = 672 data points

 library(nlme)
 fitlme7 - lme(nachw ~ I(zeitn-3.5) + I((zeitn-3.5)^2) +
 I((zeitn-3.5)^3) + I((zeitn-3.5)^4)*gru, random = list(subgr = ~ 1,
 subject = ~ zeitn), data = hlm3)

 fit5 - lme(nachw ~ ordered(I(zeitn-3.5))*gru, random = list(subgr =
 ~ 1, subject = ~ zeitn), data = hlm3)

 anova( update(fit5, method=ML), update(fitlme7, method=ML) )

   anova( update(fit5, method=ML), update(fitlme7, method=ML) )
 Model df  AIC  BIClogLik   Test
 update(fit5, method = ML)1 29 2535.821 2666.619 -1238.911
 update(fitlme7, method = ML) 2 16 2529.719 2601.883 -1248.860 1 vs 2
  L.Ratio p-value
 update(fit5, method = ML)
 update(fitlme7, method = ML) 19.89766  0.0978
  

 shows that both are ~ equal, although I know about the uncertainty of ML
 tests with lme(). Both models show that the ^2 and the ^4 terms are
 important parts of the model.

 My question is:

 - Is it legitimate to choose a model based on these outputs according to
 theoretical considerations instead of statistical tests that not really
 show a superiority of one model over the other one?

 - Is there another criterium I've overlooked to decide which model can be
 clearly preferred?

 - The idea behind that is that in the one model (fit5) the second
 contrast of the factor (gru) is statistically significant, although not
 the whole factor in the anova output.
 In the other model, this is not the case.
 Theoretically interesting is of course the significance of the second
 contrast of gru, as it shows a tendency of one treatment being slightly
 superior. I want to choose this model but I am not sure whether this is
 proper action. Both models shows this trend, but only one model clearly
 indicates that this trend bears some empirical meaning.

 Thanks for any suggestions,

The comparisons may be more clearly shown if you create the ordered
factor and a second version of the ordered factor what has the
contrasts set so it produces a 4th order polynomial.  That is, set

hlm3$ozeit - ordered(hlm3$zeitn)
hlm3$ozeit4 - C(hlm3$ozeit, contr.poly, 4)

then define one model in terms of ozeit and a second model in terms of ozeit4.

I would go further and create a new binary factor from gru that
contrasted level 2 against the other three levels and use that instead
of gru.

For a model fit by lme I would use the one-argument form of anova to
assess the significance of terms in the fixed effects.  (That advice
doesn't hold for models fit by lmer - at least at present.)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html