Dear R gurus, I am looking for a way to fit a predictive model for a contingency table which has counts. I found that glm( family=poisson) is very good for figuring out which of several alternative models I should select. But once I select a model it is hard to present and interpret it, especially when it has interactions, because everything is done "relative to reference cell". This makes it confusing for the audience.
I found that loglin() gives what might be much easier to interpret output as far as coefficients estimates are concerned because they are laid out in a nice table and are provided for all the values of all the factors. But I need to be able to explain what the coefficients really mean. For that, I need to understand how they are used in a formula to compute a fitted value. If loglin() has fitted a model (see example below) what would be a formula that it would use to computer predicted count for, say, the cell with S = H, E=H, P=No in a sample that has a total of 4991 observations? In other words, how did it arrive at the number 270.01843 in the upper left hand corner of $fit? I see that loglin() computes exactly the same predictions (fitted values) as glm( counts ~ S + E +P + S:E + S:P + E:P, data=wisconsin, family=poisson) see below) but it gives different values of the estimates for parameters. So I figure the formula it uses to compute the fitted values is not the same as what is used in Poisson regression. If there is a better way to fit this type of model and provide easy to understand and interpret / present coefficient summary, please let me know. Just in case, I provided the original data at the very bottom.  YZK #################### use loglin() ################################### loglin.3 = loglin(wisconsin.table, margin = list( c(1,2), c(1,3), c(2,3) ), fit=T, param=T) loglin.3 > loglin.3 $lrt [1] 1.575469 $pearson [1] 1.572796 $df [1] 3 $margin $margin[[1]] [1] "S" "E" $margin[[2]] [1] "S" "P" $margin[[3]] [1] "E" "P" $fit , , P = No    E S           H        L  H 270.01843 148.98226  L 228.85782 753.14127  LM 331.04036 625.95942  UM 373.08339 420.91704 , , P = Yes    E S           H        L  H 795.97572 30.02330  L 137.14648 30.85410  LM 301.96657 39.03387  UM 467.91123 36.08873 $param $param$`(Intercept)` [1] 5.275394 $param$S         H         L        LM        UM -0.1044289 -0.1734756 0.1286741 0.1492304 #I think this says that we had a lot of S = LM and S= UM kids in our sample and relatively few S= L kids $param$E        H        L  0.501462 -0.501462 #I think this says that more kids had E=H than E=L # sum(wisconsin$counts[wisconsin$E=="L"]) [1] 2085 # sum(wisconsin$counts[wisconsin$E=="H"]) [1] 2906 $param$P        No       Yes  0.5827855 -0.5827855 $param$S.E    E S            H         L  H  0.4666025 -0.4666025 #kids in S=H were more likely to get E=H than E=L  L -0.4263050 0.4263050 #kids in S=L were more likely to get E=L than E=H  LM -0.1492516 0.1492516  UM 0.1089541 -0.1089541 $param$S.P    P S            No        Yes  H -0.45259177 0.45259177  L  0.34397315 -0.34397315  LM 0.13390947 -0.13390947  UM -0.02529085 0.02529085 $param$E.P   P E         No      Yes  H -0.670733 0.670733 #kids with E=H were more likely to have P=Yes than kids with E=L  L 0.670733 -0.670733 ############### use glm () ######################################## summary(glm2) Call: glm(formula = counts ~ S + E + P + S:E + S:P + E:P, family = poisson,    data = wisconsin) Deviance Residuals:       1        2        3        4        5        6        7        8 -0.15119  0.27320  0.04135 -0.05691 -0.04446  0.04719  0.32807 -0.24539       9       10       11       12       13       14       15       16  0.73044 -0.35578 -0.16639  0.05952  0.15116 -0.04217 -0.75147  0.14245 Coefficients:            Estimate Std. Error z value Pr(>|z|)   (Intercept) 5.59850   0.05886 95.116 < 2e-16 *** SL         -0.16542   0.08573 -1.930 0.05366 . SLM         0.20372   0.07841  2.598 0.00937 ** SUM         0.32331   0.07664  4.219 2.46e-05 *** EL         -0.59471   0.09234 -6.441 1.19e-10 *** PYes        1.08107   0.06731 16.060 < 2e-16 *** SL:EL       1.78588   0.11444 15.606 < 2e-16 *** SLM:EL      1.23178   0.10987 11.211 < 2e-16 *** SUM:EL      0.71532   0.11136  6.424 1.33e-10 *** SL:PYes    -1.59311   0.11527 -13.820 < 2e-16 *** SLM:PYes   -1.17298   0.09803 -11.965 < 2e-16 *** SUM:PYes   -0.85460   0.09259 -9.230 < 2e-16 *** EL:PYes    -2.68292   0.09867 -27.191 < 2e-16 *** --- Signif. codes: 0 â***â 0.001 â**â 0.01 â*â 0.05 â.â 0.1 â â 1 (Dispersion parameter for poisson family taken to be 1)    Null deviance: 3211.0014 on 15 degrees of freedom Residual deviance:   1.5755 on 3 degrees of freedom AIC: 141.39 ################ Original data ############################ #data from Wisconsin that classifies 4991 high school seniors according to socio-economic status S= (low, lower middle, upper middle, and high), # the degree of parental encouragement they receive E= (low and high) # and whether or not they have plans to attend college P(no, yes). #s= social stratum, E=parental encouragement P= college plans #S= social stratum, E=parental encouragement P= college plans S=c("L", "L", "LM", "LM", "UM", "UM", "H", "H") S=c(S,S) E = rep ( c("L", "H"), 8) P= c (rep("No", 8), rep("Yes",8)) counts = c(749, 233, 627, 330, 420, 374, 153, 266, 35,133,38,303,37,467,26,800) wisconsin = data.frame(S, E, P, counts) > wisconsin    S E  P counts 1  L L No   749 2  L H No   233 3 LM L No   627 4 LM H No   330 5 UM L No   420 6 UM H No   374 7  H L No   153 8  H H No   266 9  L L Yes    35 10 L H Yes   133 11 LM L Yes    38 12 LM H Yes   303 13 UM L Yes    37 14 UM H Yes   467 15 H L Yes    26 16 H H Yes   800 [[alternative HTML version deleted]]
______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.