Dear R gurus,

I am looking for a way to fit a predictive model for a contingency table which 
has counts. I found that glm( family=poisson) is very good for figuring out 
which of several alternative models I should select. But once I select a model 
it is hard to present and interpret it, especially when it has interactions, 
because everything is done "relative to reference cell". This makes it 
confusing for the audience. 


I found that loglin() gives what might be much easier to interpret output as 
far as coefficients estimates are concerned because they are laid out in a nice 
table and are provided for all the values of all the factors. But I need to be 
able to explain what the coefficients really mean. For that, I need to 
understand how they are used in a formula to compute a fitted value. 

If loglin() has fitted a model (see example below) what would be a formula that 
it would use to computer predicted count for, 
say, the cell with S = H, E=H, P=No in a sample that has a total of 4991 
observations?  In other words, how did it arrive at the number 270.01843 in 
the upper left hand corner of $fit? 


I see that loglin() computes exactly the same predictions (fitted values) as 
glm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin, family=poisson) see 
below)  but it gives different values of the estimates for parameters. So I 
figure the formula it uses to compute 
the fitted values is not the same as what is used in Poisson 
regression. 

If there is a better way to fit this type of model and provide easy to 
understand and interpret / present coefficient summary, please let me know. 

Just in case, I provided the original data at the very bottom.  
 


YZK



#################### use loglin() ###################################


loglin.3 = loglin(wisconsin.table, 
margin = list( c(1,2), c(1,3), c(2,3) ), fit=T, param=T)
loglin.3
> loglin.3
$lrt
[1] 1.575469

$pearson
[1] 1.572796

$df
[1]
 3

$margin
$margin[[1]]
[1] "S" "E"

$margin[[2]]
[1] "S" "P"

$margin[[3]]
[1] "E" "P"


$fit
, , P = No

    E
S            H         L
  H  270.01843 148.98226
  L  228.85782 753.14127
  LM 331.04036 625.95942
  UM 373.08339 420.91704

, , P = Yes

    E
S            H         L
  H  795.97572  30.02330
  L  137.14648  30.85410
  LM 301.96657  39.03387
  UM 467.91123  36.08873


$param
$param$`(Intercept)`
[1] 5.275394

$param$S
         H         
 L         LM         UM 
-0.1044289 -0.1734756  0.1286741  0.1492304 
#I think this says that we had a lot of S = LM and S= UM kids in our sample and 
relatively few S= L kids

$param$E
        H         L 
 0.501462 -0.501462 
#I think this says that more kids had E=H than E=L
# sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085
# sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906

$param$P
        No        Yes 
 0.5827855 -0.5827855 

$param$S.E
    E
S             H          L
  H   0.4666025 -0.4666025  #kids in S=H were
 more likely to get E=H than E=L
  L  -0.4263050  0.4263050  #kids in S=L were more likely to get E=L than 
E=H 
  LM -0.1492516  0.1492516
  UM  0.1089541 -0.1089541

$param$S.P
    P
S             No         Yes
  H  -0.45259177  0.45259177
  L   0.34397315 -0.34397315
  LM  0.13390947 -0.13390947
  UM -0.02529085  0.02529085

$param$E.P
   P
E          No       Yes
  H -0.670733  0.670733  #kids with E=H were more likely to have P=Yes than 
kids with E=L
  L  0.670733 -0.670733


############### use glm () ########################################

summary(glm2)

Call:
glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson, 
    data = wisconsin)

Deviance Residuals: 
       1         2         3         
4         5         6         7         8  
-0.15119   0.27320   0.04135  -0.05691  -0.04446   0.04719   
0.32807  -0.24539  
       9        10        11        12        
13        14        15        16  
 0.73044  -0.35578  -0.16639   0.05952   0.15116  -0.04217  
-0.75147   0.14245  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.59850    0.05886  95.116  < 2e-16 ***
SL          -0.16542    0.08573  -1.930  0.05366 .  
SLM          0.20372    0.07841   2.598  0.00937 ** 
SUM          0.32331    0.07664   4.219 2.46e-05 ***
EL          -0.59471    0.09234  -6.441 1.19e-10 ***
PYes         1.08107    0.06731  16.060  < 2e-16 ***
SL:EL        1.78588    0.11444  15.606  < 2e-16 ***
SLM:EL       1.23178    0.10987  11.211  < 2e-16 ***
SUM:EL       0.71532    0.11136   6.424 1.33e-10 ***
SL:PYes     -1.59311    0.11527 -13.820  < 2e-16 ***
SLM:PYes    -1.17298    0.09803 -11.965  < 2e-16 ***
SUM:PYes    -0.85460    0.09259  -9.230  < 2e-16 ***
EL:PYes     -2.68292    0.09867 -27.191  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ 
’ 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 3211.0014  on 15  degrees of freedom
Residual deviance:    1.5755  on  3  degrees of freedom
AIC: 141.39

################ Original data ############################


#data from Wisconsin that classifies 4991 high school seniors according to
socio-economic status S= (low, lower middle, upper middle, and high), 
# the degree of parental encouragement they receive E= (low and high) 
# and whether or not they have plans to attend college P(no, yes).

#s= social stratum, E=parental encouragement P= college plans

#S= social stratum, E=parental encouragement P= college plans

S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H")
S=c(S,S)

E = rep ( c("L", "H"), 8)

P=  c (rep("No", 8), rep("Yes",8))

counts = c(749, 233, 627, 330, 420, 374, 153, 266,
35,133,38,303,37,467,26,800)




wisconsin = data.frame(S, E, P, counts)

> wisconsin
    S E   P counts
1   L L  No    749
2   L H  No    233
3  LM L  No    627
4  LM H  No    330
5  UM L  No    420
6  UM H  No    374
7   H L  No    153
8   H H  No    266
9   L L Yes     35
10  L H Yes    133
11 LM L Yes     38
12 LM H Yes    303
13 UM L Yes     37
14 UM H Yes    467
15  H L Yes     26
16  H H Yes    800
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to