[R] glm with variance = mu+theta*mu^2?
You can fit negative binomial using the 'zicounts' package library(zicounts) data(teeth) names(teeth) ## c) fit negative binomial regression model nb.zc <- zicounts(resp = dmft~.,x =~gender + age,data=teeth, distr = "NB") nb.zc Even, library(zicounts) library(Fahrmeir) # use cells data data(cells) nb.cells <- zicounts(parm=c(2,0,0,0,1),resp = y~.,x =~TNF+IFN+TNF:IFN,data=cells, distr = "NB") nb.cells Samuel. Kjetil Brinchmann Halvorsen <[EMAIL PROTECTED]> wrote: Spencer Graves wrote: > How might you fit a generalized linear model (glm) with variance > = mu+theta*mu^2 (where mu = mean of the exponential family random > variable and theta is a parameter to be estimated)? > > This appears in Table 2.7 of Fahrmeir and Tutz (2001) > Multivariate Statisticial Modeling Based on Generalized Linear Models, > 2nd ed. (Springer, p. 60), where they compare "log-linear model fits > to cellular differentiation data based on quasi-likelihoods" between > variance = phi*mu (quasi-Poisson), variance = phi*mu^2 > (quasi-exponential), and variance = mu+theta*mu^2. The "quasi" > function accepted for the family argument in "glm" generates functions > "variance", "validmu", and "dev.resids". I can probably write > functions to mimic the "quasi" function. However, I have two > questions in regard to this: > > (1) I don't know what to use for "dev.resids". This may not > matter for fitting. I can try a couple of different things to see if > it matters. > > (2) Might someone else suggest something different, e.g., using > something like optim to solve an appropriate quasi-score function? > > Thanks, > spencer graves > Since nobody has answerd this I will try. The variance function mu+theta*mu^2 is the variance function of the negative binomial family. If this variance function is used to construct a quasi-likelihood, the resulting quasi- likelihood is identical to the negative binomial likelihood, so for fitting we can simly use glm.nb from MASS, which will give the correct estimated values. However, in a quasi-likelihood setting the (co)varince estimation from glm.nb is not appropriate, and from the book (fahrmeir ..) it seems that the estimation method used is a sandwich estimator, so we can try the sandwich package. This works but the numerical results are somewhat different from the book. Any comments on this? my code follows: > library(Fahrmeir) > library(help=Fahrmeir) > library(MASS) > cells.negbin <- glm(y~TNF+IFN+TNF:IFN, data=cells, family=negative.binomial(1/0.215)) > summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 > confint(cells.negbin) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) 3.0383197319 3.7890206510 TNF 0.0091335087 0.0238915483 IFN 0.0023292566 0.0170195707 TNF:IFN -0.0001996824 0.960427 > library(sandwich) Loading required package: zoo > vcovHC( cells.negbin ) (Intercept) TNF IFN TNF:IFN (Intercept) 0.01176249372 -0.0001279740135 -0.0001488223001 0.0212541999 TNF -0.00012797401 0.039017282 0.021242875 -0.0019793137 IFN -0.00014882230 0.021242875 0.054314079 -0.0013277626 TNF:IFN 0.0212542 -0.001979314 -0.001327763 0.0002370104 > cov2cor(vcovHC( cells.negbin )) (Intercept) TNF IFN TNF:IFN (Intercept) 1.000 -0.5973702 -0.5887923 0.1272950 TNF -0.5973702 1.000 0.4614542 -0.6508822 IFN -0.5887923 0.4614542 1.000 -0.3700671 TNF:IFN 0.1272950 -0.6508822 -0.3700671 1.000 > cells.negbin2 <- glm.nb( y~TNF+IFN+TNF:IFN, data=cells) > summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 > confint( cells.negbin2 ) Waiting for prof
Re: [R] glm with variance = mu+theta*mu^2?
Spencer Graves wrote: > How might you fit a generalized linear model (glm) with variance > = mu+theta*mu^2 (where mu = mean of the exponential family random > variable and theta is a parameter to be estimated)? > > This appears in Table 2.7 of Fahrmeir and Tutz (2001) > Multivariate Statisticial Modeling Based on Generalized Linear Models, > 2nd ed. (Springer, p. 60), where they compare "log-linear model fits > to cellular differentiation data based on quasi-likelihoods" between > variance = phi*mu (quasi-Poisson), variance = phi*mu^2 > (quasi-exponential), and variance = mu+theta*mu^2. The "quasi" > function accepted for the family argument in "glm" generates functions > "variance", "validmu", and "dev.resids". I can probably write > functions to mimic the "quasi" function. However, I have two > questions in regard to this: > > (1) I don't know what to use for "dev.resids". This may not > matter for fitting. I can try a couple of different things to see if > it matters. > > (2) Might someone else suggest something different, e.g., using > something like optim to solve an appropriate quasi-score function? > > Thanks, > spencer graves > Since nobody has answerd this I will try. The variance function mu+theta*mu^2 is the variance function of the negative binomial family. If this variance function is used to construct a quasi-likelihood, the resulting quasi- likelihood is identical to the negative binomial likelihood, so for fitting we can simly use glm.nb from MASS, which will give the correct estimated values. However, in a quasi-likelihood setting the (co)varince estimation from glm.nb is not appropriate, and from the book (fahrmeir ..) it seems that the estimation method used is a sandwich estimator, so we can try the sandwich package. This works but the numerical results are somewhat different from the book. Any comments on this? my code follows: > library(Fahrmeir) > library(help=Fahrmeir) > library(MASS) > cells.negbin <- glm(y~TNF+IFN+TNF:IFN, data=cells, family=negative.binomial(1/0.215)) > summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 > confint(cells.negbin) Waiting for profiling to be done... 2.5 % 97.5 % (Intercept) 3.0383197319 3.7890206510 TNF 0.0091335087 0.0238915483 IFN 0.0023292566 0.0170195707 TNF:IFN -0.0001996824 0.960427 > library(sandwich) Loading required package: zoo > vcovHC( cells.negbin ) (Intercept) TNF IFN TNF:IFN (Intercept) 0.01176249372 -0.0001279740135 -0.0001488223001 0.0212541999 TNF -0.00012797401 0.039017282 0.021242875 -0.0019793137 IFN -0.00014882230 0.021242875 0.054314079 -0.0013277626 TNF:IFN 0.0212542 -0.001979314 -0.001327763 0.0002370104 > cov2cor(vcovHC( cells.negbin )) (Intercept)TNFIFNTNF:IFN (Intercept) 1.000 -0.5973702 -0.5887923 0.1272950 TNF -0.5973702 1.000 0.4614542 -0.6508822 IFN -0.5887923 0.4614542 1.000 -0.3700671 TNF:IFN 0.1272950 -0.6508822 -0.3700671 1.000 > cells.negbin2 <- glm.nb( y~TNF+IFN+TNF:IFN, data=cells) > summary(cells.negbin) Call: glm(formula = y ~ TNF + IFN + TNF:IFN, family = negative.binomial(1/0.215), data = cells) Deviance Residuals: Min 1Q Median 3Q Max -1.6714 -0.8301 -0.2153 0.4802 1.4282 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 3.39874495 0.18791125 18.087 4.5e-10 *** TNF 0.01616136 0.00360569 4.482 0.00075 *** IFN 0.00935690 0.00359010 2.606 0.02296 * TNF:IFN -0.5910 0.7002 -0.844 0.41515 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for Negative Binomial(4.6512) family taken to be 1.012271) Null deviance: 46.156 on 15 degrees of freedom Residual deviance: 12.661 on 12 degrees of freedom AIC: 155.49 Number of Fisher Scoring iterations: 5 > confint( cells.negbin2 ) Waiting for profiling to be done...
[R] glm with variance = mu+theta*mu^2?
How might you fit a generalized linear model (glm) with variance = mu+theta*mu^2 (where mu = mean of the exponential family random variable and theta is a parameter to be estimated)? This appears in Table 2.7 of Fahrmeir and Tutz (2001) Multivariate Statisticial Modeling Based on Generalized Linear Models, 2nd ed. (Springer, p. 60), where they compare "log-linear model fits to cellular differentiation data based on quasi-likelihoods" between variance = phi*mu (quasi-Poisson), variance = phi*mu^2 (quasi-exponential), and variance = mu+theta*mu^2. The "quasi" function accepted for the family argument in "glm" generates functions "variance", "validmu", and "dev.resids". I can probably write functions to mimic the "quasi" function. However, I have two questions in regard to this: (1) I don't know what to use for "dev.resids". This may not matter for fitting. I can try a couple of different things to see if it matters. (2) Might someone else suggest something different, e.g., using something like optim to solve an appropriate quasi-score function? Thanks, spencer graves __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html