Laura de Ruiter wrote:
Dear R-users and -experts,

I am performing a rather simple analysis on a small data set (pasted below this email) and keep getting a result that is inexplicable to me. Perhaps I am missing something here - it would be great if someone could point out what I am doing wrong.

I want to test whether the factor "Info" (which has three levels: "new", "given", "accessible") is a significant predictor for the binary variable "DeaccYN". The random factor is "Subject". The distribution of the data looks as follows:

-----------------------------------------------------------------------------
                    Info
DeaccYN    given   new   accessible
  no          25    42           21
  yes         11     0            1
-----------------------------------------------------------------------------
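
(For reference, a cross-tabulation like this can be produced directly in R,
e.g. with

  xtabs(~ DeaccYN + Info, data = dat)

using the data frame and column names from the model below.)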

This is the model:

-----------------------------------------------------------------------------
deacc.lmer = lmer(DeaccYN ~ Info + (1|Subject), data = dat, family = "binomial")
-----------------------------------------------------------------------------

However, given the distribution above, this outcome seems rather weird to me:

-----------------------------------------------------------------------------
> summary(deacc.lmer)
Generalized linear mixed model fit using Laplace
Formula: DeaccYN ~ Info + (1 | Subject)
   Data: dat
 Family: binomial(logit link)
   AIC   BIC logLik deviance
  60.4 70.82  -26.2     52.4
Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept) 0.18797  0.43356
Number of obs: 100, groups: Subject, 21

Estimated scale (compare to 1)  0.7316067

Fixed effects:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.8635     0.3795 -2.2754   0.0229 *
Infonew        -18.7451  2764.2445 -0.0068   0.9946
Infoaccessible  -2.2496     1.1186 -2.0110   0.0443 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> [...]

----------------------------------------------------------------------------------------------------

Why should the difference between 25/11 and 21/1 be significant, but the difference between 25/11 and 42/0 not? The standard error of 2764 also seems very odd to me!

> [...]

I was wondering: Is it perhaps a problem for the model that there are no cases in the DeaccYN == "yes" category for Info == "given"? And if this
                                                      ^^^^^
                                         I believe you mean "new" here.
is the case, why?
Am I overlooking something here?

Dear Laura,

Independently of the issue that Florian is raising... you are right that the lack of "yes" cases for Info == "new" is a problem for the model (to be precise, it's a problem for estimating the significance of the parameter estimate using the z value). The z value, which is the basis of the significance level reported in the lmer summary, is based on the Wald statistic, which is

  z = \hat{b} / StdError(\hat{b})

where \hat{b} is the parameter estimate. However, for parameter estimates of large magnitude (as is the case for "new" here), the standard error is inflated, which leads to a small Wald statistic. This problem is discussed in a number of places, including Agresti (1996, 2002); here's a decent mention of it online:

  http://userwww.sfsu.edu/~efc/classes/biol710/logistic/logisticreg.htm
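
To make this concrete, the p-values in your summary can be recovered from the
printed estimates and standard errors (a quick check using only the numbers
above):

  2 * pnorm(-abs(-0.8635 / 0.3795))      # (Intercept):    ~0.0229
  2 * pnorm(-abs(-18.7451 / 2764.2445))  # Infonew:        ~0.9946
  2 * pnorm(-abs(-2.2496 / 1.1186))      # Infoaccessible: ~0.0443

And the standard-error inflation itself shows up even in a plain glm on toy
data that mimics your given/new cells (a sketch; the counts come from your
table, everything else is made up for illustration):

  toy <- data.frame(y    = c(rep("no", 25), rep("yes", 11), rep("no", 42)),
                    Info = c(rep("given", 36), rep("new", 42)))
  summary(glm(factor(y) ~ Info, data = toy, family = binomial))
  # the Infonew estimate is a large negative number with an enormous standard
  # error, because no "yes" cases are observed for Info == "new"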

You can see this quite clearly if you add just one "yes/new" example at the end of your data frame, for example:

  101 132 new yes
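
A minimal sketch of how one might do that, assuming for illustration a data
frame with just the columns Subject, Info and DeaccYN used in the model
formula (the subject code is taken from the example row above):

  dat2 <- rbind(dat, data.frame(Subject = 132, Info = "new", DeaccYN = "yes"))
  deacc.lmer <- lmer(DeaccYN ~ Info + (1 | Subject), data = dat2, family = "binomial")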

Then the model gives (being sure to correctly specify the levels of "Info"):

> summary(deacc.lmer)
Generalized linear mixed model fit by the Laplace approximation
Formula: Deacc ~ Info + (1 | Subject)
   Data: dat
   AIC   BIC logLik deviance
 69.78 80.24 -30.89    61.78
Random effects:
 Groups  Name        Variance Std.Dev.
 Subject (Intercept) 0.34332  0.58594
Number of obs: 101, groups: Subject, 21

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.8996     0.3936  -2.286  0.02226 *
Iaccessible  -2.2808     1.1460  -1.990  0.04656 *
Inew         -3.0050     1.1395  -2.637  0.00836 **


The next question, of course, is how to deal with the dataset you actually have. If you weren't using a mixed-effects model, you could use a likelihood-ratio test by comparing your full model with a simpler model; the likelihood-ratio test isn't susceptible to problems with large parameter estimates the way the Wald test is. For example:

> deacc.glm = glm (Deacc ~ Info, data = dat, family = "binomial")
> deacc.glm1 = glm (Deacc ~ Info1, data = dat, family = "binomial")
> anova(deacc.glm,deacc.glm1,test="Chisq")
Analysis of Deviance Table

Model 1: Deacc ~ Info
Model 2: Deacc ~ Info1
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        97     52.452
2        98     71.600 -1  -19.148 1.210e-05
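
Here Info1 is a two-level recoding of Info that collapses "given" and "new"
into a single level, so that exactly the "new" contrast is dropped. One way to
construct such a factor (a sketch):

  dat$Info1 <- factor(ifelse(dat$Info == "accessible", "accessible", "other"))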

In principle, you could do this with your mixed-effects model, being sure to use ML instead of REML fitting:

> deacc.lmer = lmer (Deacc ~ Info + (1 | Subject), data = dat, family = "binomial", REML=F)
> deacc.lmer1 = lmer (Deacc ~ Info1 + (1 | Subject), data = dat, family = "binomial", REML=F)
> anova(deacc.lmer,deacc.lmer1)
Data: dat
Models:
deacc.lmer1: Deacc ~ Info1 + (1 | Subject)
deacc.lmer: Deacc ~ Info + (1 | Subject)
            Df     AIC     BIC  logLik Chisq Chi Df Pr(>Chisq)
deacc.lmer1  3  77.600  85.416 -35.800
deacc.lmer   4  60.400  70.821 -26.200  19.2      1  1.177e-05 ***
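
As a sanity check, the chi-square statistic and p-value in this table can be
reproduced by hand from the two log-likelihoods:

  2 * (-26.200 - (-35.800))                 # 19.2
  pchisq(19.2, df = 1, lower.tail = FALSE)  # ~1.18e-05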

There is an argument that the likelihood-ratio test is anti-conservative and hence inappropriate for comparing mixed-effects models differing only in fixed-effects structure. (See Pinheiro & Bates, 2000, around page 76 or so. The argument is made only for linear mixed-effects models and I'm not sure of the status for logit mixed-effects models.) That being said, it's not clear how strong the anti-conservativity is, and in your case it seems like you have such an exceedingly powerful effect that you might be safe in using the likelihood-ratio test here and just mentioning the potential anti-conservativity as a caveat.

So the summary is: believe it or not, how to assess the significance of the parameter estimate for "new" for your model & dataset is a bit of an open question, but it seems pretty clear that the estimate is significantly non-zero.

Best & hope this helps.

Roger



--

Roger Levy                      Email: rl...@ucsd.edu
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
