[R] Some coefficients are doubled when I use the step() function

2012-12-09 Thread Chris Beeley
Hello-

Such a strange problem, can't figure it out at all. Using binomial glm
models, and the step() function, so the call looks like this:

sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +
S1Q7_NUM.1 + S1Q8_NUM.1 + S1Q6_NUM.1 + S1Q10_NUM.1 + S1Q12_BURG.1 +
S1Q12_CD.1 + S1Q4.1 + S1Q12_OTHVIOL.1 + S1Q8.1 + S1Q12_GBH.1 +
S1Q11.1 + S1Q7.1 + S1Q12_THEFT.1 + S1Q12_DRIV.1 + S1Q5.1 +
S1Q9.1 + S1Q12_DRUG.1, family = binomial, data = moddata)

But when I run step() on the resulting model, some of the coefficients
are doubled when it comes back, with a 2 at the end, e.g. like this:

mymodel = step(sectionmodel, direction="backward", test=F)

summary(mymodel) returns this:

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
S1Q4.12           0.56893    0.40281   1.412   0.1578
S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
S1Q7.11          -1.27330    1.12897  -1.128   0.2594
S1Q7.12          -1.83927    1.16909  -1.573   0.1157
S1Q5.11           0.91742    1.19489   0.768   0.4426
S1Q5.12           2.16861    1.19864   1.809   0.0704 .
S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055

As you can see S1Q7.1 and S1Q5.1 are duplicated as S1Q7.11 and S1Q7.12 etc.

I've googled and read and re-read the step() and stepAIC()
documentation and I just can't figure out what it could mean. Removing
the test=F bit also generates the same behaviour.

Any help greatly appreciated.

Chris Beeley
Institute of Mental Health, UK

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Some coefficients are doubled when I use the step() function

2012-12-09 Thread Ben Bolker
Chris Beeley <chris.beeley at gmail.com> writes:

> Such a strange problem, can't figure it out at all. Using binomial glm
> models, and the step() function, so the call looks like this:
>
> sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +

  [snip]

> But when I run step() on the resulting model, some of the coefficients
> are doubled when it comes back, with a 2 at the end, e.g. like this:
>
> mymodel = step(sectionmodel, direction="backward", test=F)
>
> summary(mymodel) returns this:
>
> Coefficients:
>                  Estimate Std. Error z value Pr(>|z|)
> (Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
> S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
> S1Q4.12           0.56893    0.40281   1.412   0.1578
> S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
> S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
> S1Q7.11          -1.27330    1.12897  -1.128   0.2594
> S1Q7.12          -1.83927    1.16909  -1.573   0.1157
> S1Q5.11           0.91742    1.19489   0.768   0.4426
> S1Q5.12           2.16861    1.19864   1.809   0.0704 .
> S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055
>
> As you can see S1Q7.1 and S1Q5.1 are duplicated as S1Q7.11 and
> S1Q7.12 etc.  I've googled and read and re-read the step() and
> stepAIC() documentation and I just can't figure out what it could
> mean. Removing the test=F bit also generates the same behaviour.
> Any help greatly appreciated.  Chris Beeley Institute of Mental
> Health, UK

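  My guess is that S1Q7.1 and S1Q5.1 are (possibly accidentally)
categorical variables (factors). R names each coefficient of a
factor by pasting a suffix onto the variable name, so S1Q7.11 and
S1Q7.12 are two contrasts of the single variable S1Q7.1, not
duplicates. Either the second and third levels of the factors are
labelled 1 and 2, or you have set sum-to-zero contrasts somewhere
along the line (in which case the columns are simply numbered).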
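
  To illustrate both possibilities, here is a minimal sketch on made-up
data. The data frame toy, the outcome y and the level labels 0/1/2 are
assumptions for illustration only; just the variable name S1Q7.1 is
taken from the post.

```r
# Hypothetical data: a three-level factor whose name happens to end in ".1"
set.seed(1)
toy <- data.frame(
  y      = rbinom(200, 1, 0.5),
  S1Q7.1 = factor(rep(0:2, length.out = 200))
)

# Default treatment contrasts: one coefficient per non-reference level,
# named <variable name><level label>
fit_treatment <- glm(y ~ S1Q7.1, family = binomial, data = toy)
names(coef(fit_treatment))
# "(Intercept)" "S1Q7.11" "S1Q7.12"

# Sum-to-zero contrasts: the columns are numbered 1, 2 whatever the labels are
fit_sum <- glm(y ~ S1Q7.1, family = binomial, data = toy,
               contrasts = list(S1Q7.1 = "contr.sum"))
names(coef(fit_sum))
# "(Intercept)" "S1Q7.11" "S1Q7.12"
```

  The names come out the same in both cases, so the names alone cannot
tell you which situation you are in; levels(moddata$S1Q7.1) and
getOption("contrasts") would.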

  Note that other variables have numeric values appended to
their names, which indicates that they are also being treated
as categorical variables, and that their levels are coded
numerically ... (e.g. S1Q4.1, which shows up as S1Q4.12)
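
  One way to check which columns are being treated as factors is
sketched below. The data frame moddata_toy and its values are a
hypothetical stand-in for moddata; only the column names are borrowed
from the post.

```r
# Hypothetical stand-in for the poster's moddata
moddata_toy <- data.frame(
  S1Q12_NUM.1 = c(0, 3, 1, 2),
  S1Q4.1      = factor(c("1", "2", "2", "1")),
  S1Q7.1      = factor(c("0", "1", "2", "1"))
)

# Which columns will glm() expand into one coefficient per level?
is_fac <- sapply(moddata_toy, is.factor)
names(moddata_toy)[is_fac]

# The level labels are what gets appended to the coefficient names
lapply(moddata_toy[is_fac], levels)

# If a column was meant to be numeric, convert via the labels;
# as.numeric() directly on a factor returns the internal codes instead
as.numeric(as.character(moddata_toy$S1Q7.1))
```

  Whether a given column should be numeric or categorical is a modelling
decision; the conversion is only appropriate if the factor was created
by accident.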

  My prediction is that this doubling is independent of
the use of step(), and that you would see these parameters
reflected in the summary() of the full model ...
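
  A quick way to test that prediction is to compare the coefficient
names before and after step(). The sketch below uses simulated data;
sim, x_num, noise and the effect sizes are invented for illustration
and are not from the original post.

```r
# Simulated data (hypothetical): one numeric predictor, one pure-noise
# predictor, and a three-level factor named like one of the poster's
set.seed(42)
n <- 300
sim <- data.frame(
  x_num  = rnorm(n),
  noise  = rnorm(n),
  S1Q5.1 = factor(rep(0:2, length.out = n))
)
# The outcome depends on x_num and the factor, not on noise
eta <- -0.5 + 0.8 * sim$x_num + c(0, 0.7, 1.4)[as.integer(sim$S1Q5.1)]
sim$y <- rbinom(n, 1, plogis(eta))

full    <- glm(y ~ x_num + noise + S1Q5.1, family = binomial, data = sim)
reduced <- step(full, direction = "backward", trace = 0)

names(coef(full))     # the full model already has S1Q5.11 and S1Q5.12
names(coef(reduced))  # terms that survive keep exactly the same names

# Backward selection only drops terms; it never renames coefficients
all(names(coef(reduced)) %in% names(coef(full)))
```

  On the real data the equivalent check would be names(coef(sectionmodel))
against names(coef(mymodel)).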
