[R] question about glm vs. loglin()

Yana Kane-Esrig Thu, 15 Sep 2011 14:43:46 -0700

Dear R gurus,

I am looking for a way to fit a predictive model for a contingency table which 
has counts. I found that glm( family=poisson) is very good for figuring out 
which of several alternative models I should select. But once I select a model 
it is hard to present and interpret it, especially when it has interactions, 
because everything is done "relative to reference cell". This makes it 
confusing for the audience.



I found that loglin() gives output that might be much easier to interpret as 
far as coefficient estimates are concerned, because they are laid out in a nice 
table and are provided for all the values of all the factors. But I need to be 
able to explain what the coefficients really mean. For that, I need to 
understand how they are used in a formula to compute a fitted value. 

If loglin() has fitted a model (see example below), what formula would it use 
to compute the predicted count for, say, the cell with S = H, E = H, P = No in 
a sample that has a total of 4991 observations? In other words, how did it 
arrive at the number 270.01843 in the upper left hand corner of $fit? 


I see that loglin() computes exactly the same predictions (fitted values) as 
glm(counts ~ S + E + P + S:E + S:P + E:P, data=wisconsin, family=poisson) (see 
below), but it gives different values of the estimates for parameters. So I 
figure the formula it uses to compute the fitted values is not the same as what 
is used in Poisson regression. 

If there is a better way to fit this type of model and provide easy to 
understand and interpret / present coefficient summary, please let me know. 

Just in case, I provided the original data at the very bottom.


YZK



#################### use loglin() ###################################


loglin.3 = loglin(wisconsin.table, 
                  margin = list(c(1,2), c(1,3), c(2,3)), fit=TRUE, param=TRUE)
> loglin.3
$lrt
[1] 1.575469

$pearson
[1] 1.572796

$df
[1] 3

$margin
$margin[[1]]
[1] "S" "E"

$margin[[2]]
[1] "S" "P"

$margin[[3]]
[1] "E" "P"


$fit
, , P = No

    E
S            H         L
  H  270.01843 148.98226
  L  228.85782 753.14127
  LM 331.04036 625.95942
  UM 373.08339 420.91704

, , P = Yes

    E
S            H         L
  H  795.97572  30.02330
  L  137.14648  30.85410
  LM 301.96657  39.03387
  UM 467.91123  36.08873


$param
$param$`(Intercept)`
[1] 5.275394

$param$S
         H          L         LM         UM 
-0.1044289 -0.1734756  0.1286741  0.1492304 
# I think this says that we had a lot of S = LM and S = UM kids in our sample
# and relatively few S = L kids

$param$E
        H         L 
 0.501462 -0.501462 
# I think this says that more kids had E=H than E=L
# sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085
# sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906

$param$P
        No        Yes 
 0.5827855 -0.5827855 

$param$S.E
    E
S             H          L
  H   0.4666025 -0.4666025  # kids in S=H were more likely to get E=H than E=L
  L  -0.4263050  0.4263050  # kids in S=L were more likely to get E=L than E=H
  LM -0.1492516  0.1492516
  UM  0.1089541 -0.1089541

$param$S.P
    P
S             No         Yes
  H  -0.45259177  0.45259177
  L   0.34397315 -0.34397315
  LM  0.13390947 -0.13390947
  UM -0.02529085  0.02529085

$param$E.P
   P
E          No       Yes
  H -0.670733  0.670733  # kids with E=H were more likely to have P=Yes than kids with E=L
  L  0.670733 -0.670733
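
One possible reading of these tables, offered as a sketch rather than a definitive answer: assuming the usual log-linear form, the log of a fitted count is the intercept plus the matching entry from each $param table. The snippet below adds up the values printed above for the cell S = H, E = H, P = No; the variable names are only for illustration.

```r
# Entries copied from the $param output above for the cell S = H, E = H, P = No.
# Assumption: log(fitted count) = intercept + one entry from each table.
terms <- c(intercept = 5.275394,
           S_H       = -0.1044289,
           E_H       =  0.501462,
           P_No      =  0.5827855,
           SE_H_H    =  0.4666025,
           SP_H_No   = -0.45259177,
           EP_H_No   = -0.670733)

log_fit <- sum(terms)   # linear predictor on the log scale
fit     <- exp(log_fit) # back-transform to a count

print(log_fit)
print(fit)
```

The sum is about 5.5985 and its exponential is about 270.02, which agrees with the upper left entry of $fit. The same 5.5985 appears as the (Intercept) in the glm() summary further down, which would be consistent with S = H, E = H, P = No being glm()'s reference cell.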


############### use glm () ########################################

summary(glm2)

Call:
glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson, 
    data = wisconsin)

Deviance Residuals: 
       1         2         3         4         5         6         7         8  
-0.15119   0.27320   0.04135  -0.05691  -0.04446   0.04719   0.32807  -0.24539  
       9        10        11        12        13        14        15        16  
 0.73044  -0.35578  -0.16639   0.05952   0.15116  -0.04217  -0.75147   0.14245  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.59850    0.05886  95.116  < 2e-16 ***
SL          -0.16542    0.08573  -1.930  0.05366 .  
SLM          0.20372    0.07841   2.598  0.00937 ** 
SUM          0.32331    0.07664   4.219 2.46e-05 ***
EL          -0.59471    0.09234  -6.441 1.19e-10 ***
PYes         1.08107    0.06731  16.060  < 2e-16 ***
SL:EL        1.78588    0.11444  15.606  < 2e-16 ***
SLM:EL       1.23178    0.10987  11.211  < 2e-16 ***
SUM:EL       0.71532    0.11136   6.424 1.33e-10 ***
SL:PYes     -1.59311    0.11527 -13.820  < 2e-16 ***
SLM:PYes    -1.17298    0.09803 -11.965  < 2e-16 ***
SUM:PYes    -0.85460    0.09259  -9.230  < 2e-16 ***
EL:PYes     -2.68292    0.09867 -27.191  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 3211.0014  on 15  degrees of freedom
Residual deviance:    1.5755  on  3  degrees of freedom
AIC: 141.39
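
The glm() estimates above use R's default treatment contrasts, which is why everything is expressed relative to a reference cell. A hedged sketch of one alternative: refit the same model with sum-to-zero contrasts (contr.sum), on the assumption that this is the parametrization behind loglin()'s $param tables. The object name glm.sum is made up for this example, and the data are repeated so the snippet runs on its own.

```r
# Data as given at the bottom of the post, with the classifiers stored as factors
S <- factor(rep(c("L", "L", "LM", "LM", "UM", "UM", "H", "H"), 2))
E <- factor(rep(c("L", "H"), 8))
P <- factor(c(rep("No", 8), rep("Yes", 8)))
counts <- c(749, 233, 627, 330, 420, 374, 153, 266,
            35, 133, 38, 303, 37, 467, 26, 800)
wisconsin <- data.frame(S, E, P, counts)

# Same model formula as glm2, but with sum-to-zero contrasts for every factor
# (assumption: this corresponds to the constraints behind loglin()'s $param)
glm.sum <- glm(counts ~ S + E + P + S:E + S:P + E:P,
               family = poisson, data = wisconsin,
               contrasts = list(S = "contr.sum",
                                E = "contr.sum",
                                P = "contr.sum"))

# S1, S2, S3 refer to levels H, L, LM; the UM effect is minus their sum.
# E1 is level H and P1 is level No.
print(round(coef(glm.sum), 4))
```

If the assumption holds, the intercept and the S1, E1 and P1 coefficients should be close to the first entries of the corresponding loglin() tables, with the omitted last level of each factor recovered as minus the sum of the others. This route would also give standard errors, which loglin() does not report.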

################ Original data ############################


# data from Wisconsin that classifies 4991 high school seniors according to
# socio-economic status S = (low, lower middle, upper middle, and high),
# the degree of parental encouragement they receive E = (low and high)
# and whether or not they have plans to attend college P = (no, yes).

# S = social stratum, E = parental encouragement, P = college plans

S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H")
S=c(S,S)

E = rep ( c("L", "H"), 8)

P = c(rep("No", 8), rep("Yes", 8))

counts = c(749, 233, 627, 330, 420, 374, 153, 266,
35,133,38,303,37,467,26,800)




wisconsin = data.frame(S, E, P, counts)
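
The objects wisconsin.table and glm2 used in the output above are not defined in this post. A minimal sketch of how they might be built, assuming xtabs() for the three-way table (dimension order S, E, P) and the glm() call shown in the summary; glm.fit is a helper name introduced only for the comparison.

```r
# Data repeated here so the snippet runs on its own
S <- rep(c("L", "L", "LM", "LM", "UM", "UM", "H", "H"), 2)
E <- rep(c("L", "H"), 8)
P <- c(rep("No", 8), rep("Yes", 8))
counts <- c(749, 233, 627, 330, 420, 374, 153, 266,
            35, 133, 38, 303, 37, 467, 26, 800)
wisconsin <- data.frame(S, E, P, counts)

# Assumption: the table was built with dimensions in the order S, E, P,
# so that margins 1, 2, 3 in loglin() refer to S, E, P respectively
wisconsin.table <- xtabs(counts ~ S + E + P, data = wisconsin)

# All two-way interactions, no three-way interaction
loglin.3 <- loglin(wisconsin.table,
                   margin = list(c(1, 2), c(1, 3), c(2, 3)),
                   fit = TRUE, param = TRUE, print = FALSE)

glm2 <- glm(counts ~ S + E + P + S:E + S:P + E:P,
            family = poisson, data = wisconsin)

# Arrange the glm() fitted values in the same S x E x P layout and
# report the largest cell-wise difference from loglin()'s $fit
glm.fit <- xtabs(fitted(glm2) ~ S + E + P, data = wisconsin)
print(max(abs(glm.fit - loglin.3$fit)))
```

The two sets of fitted values may differ slightly in the later decimals, since loglin() stops its iterative fitting at a default tolerance (eps = 0.1) that can be tightened if closer agreement is wanted.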

> wisconsin
    S E   P counts
1   L L  No    749
2   L H  No    233
3  LM L  No    627
4  LM H  No    330
5  UM L  No    420
6  UM H  No    374
7   H L  No    153
8   H H  No    266
9   L L Yes     35
10  L H Yes    133
11 LM L Yes     38
12 LM H Yes    303
13 UM L Yes     37
14 UM H Yes    467
15  H L Yes     26
16  H H Yes    800

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
