This is a multi-part message in MIME format. --------------030304040002000407020206 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit
Bug PR#9316 noted an inconsistency between the Cook's distance contours on plot.lm(x, which = 5) and the values given by cooks.distance(x) -- as shown in plot.lm(x, which = 4) -- for glms: http://bugs.r-project.org/cgi-bin/R/Analyses-fixed?id=9316;user=guest;selectid=9316 The suggested fix was to modify the contour levels by a dispersion factor, implemented as follows: dispersion <- if (isGlm) summary(x)$dispersion else 1 cl.h <- sqrt(crit * p * (1 - hh)/hh * dispersion) [1] where crit is the value of Cook's distance along the contour, p is the number of parameters and hh is a vector of hat values. It is clear that in the case of a normal linear model, the cl.h values will depend on whether the model was fitted by lm or glm. The following example illustrates this: par(mfrow = c(2, 2)) plot(lm(y ~ ., data = freeny), 4:5) plot(glm(y ~ ., data = freeny), 4:5) For the lm fit we have crit = (cl.h^2 * hh)/(p * (1 - hh)) where cl.h is on the scale of the standardized residuals. This is the Cook's distance as defined e.g. in the Cook & Weisberg reference on ?cooks.distance. For the glm fit we have crit = (cl.h^2 * hh)/(p * (1 - hh) * summary(x)$dispersion) which is incorrect. For glms, as defined in the William's reference on ?cooks.distance, the Cook's distance is given by C = (rs^2 * hh)/(p * (1 - hh)) where rs is the standardized *Pearson* residual, i.e. rs = residuals(x, "pearson")/sqrt(summary(x)$dispersion * (1 - hh)) However plot 5 of plot.lm plots the standardized *deviance* residuals. Therefore, for non-Gaussian models, contours based on [1], even with "dispersion" = 1, will be on the wrong scale. Since the deviance residuals are usually quite similar to the Pearson residuals, the contours may not be far off, but the difference accounts for the inconsistency reported in PR#9316. The solution is simply to plot the standardised Pearson residuals instead, as in the attached patch. Best regards, Heather -- Dr H Turner Research Fellow Dept. of Statistics The University of Warwick Coventry CV4 7AL Tel: 024 76575870 Fax: 024 76524532 Url: www.warwick.ac.uk/go/heatherturner --------------030304040002000407020206 Content-Type: text/x-patch; name="plot.lm.diff" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="plot.lm.diff" Index: src/library/stats/R/plot.lm.R =================================================================== --- src/library/stats/R/plot.lm.R (revision 45649) +++ src/library/stats/R/plot.lm.R (working copy) @@ -55,7 +55,7 @@ else cooks.distance(x, sd = s, res = r) } } - if (any(show[c(2:3,5)])) { + if (any(show[2:3])) { ylab23 <- if (isGlm) "Std. deviance resid." else "Standardized residuals" @@ -167,7 +167,11 @@ text.id(show.r, cook[show.r], show.r, adj.x=FALSE) } if (show[5]) { - ylim <- range(rs, na.rm = TRUE) + rps <- residuals(x, "pearson")/(s * sqrt(1 - hii)) + ylab5 <- if (isGlm) + "Std. Pearson resid." + else "Standardized residuals" + ylim <- range(rps, na.rm = TRUE) if (id.n > 0) { ylim <- extendrange(r= ylim, f = 0.08) show.r <- order(-cook)[iid] @@ -197,15 +201,15 @@ facval[ord] <- facval xx <- facval # for use in do.plot section. - plot(facval, rs, xlim = c(-1/2, sum((nlev-1) * ff) + 1/2), + plot(facval, rps, xlim = c(-1/2, sum((nlev-1) * ff) + 1/2), ylim = ylim, xaxt = "n", main = main, xlab = "Factor Level Combinations", - ylab = ylab23, type = "n", ...) + ylab = ylab5, type = "n", ...) axis(1, at = ff[1]*(1:nlev[1] - 1/2) - 1/2, labels= x$xlevels[[1]][order(sapply(split(yh,mf[,1]), mean))]) mtext(paste(facvars[1],":"), side = 1, line = 0.25, adj=-.05) abline(v = ff[1]*(0:nlev[1]) - 1/2, col="gray", lty="F4") - panel(facval, rs, ...) + panel(facval, rps, ...) abline(h = 0, lty = 3, col = "gray") } else { # no factors @@ -221,21 +225,20 @@ ## omit hatvalues of 1. xx[xx >= 1] <- NA - plot(xx, rs, xlim = c(0, max(xx, na.rm = TRUE)), ylim = ylim, - main = main, xlab = "Leverage", ylab = ylab23, type = "n", + plot(xx, rps, xlim = c(0, max(xx, na.rm = TRUE)), ylim = ylim, + main = main, xlab = "Leverage", ylab = ylab5, type = "n", ...) - panel(xx, rs, ...) + panel(xx, rps, ...) abline(h = 0, v = 0, lty = 3, col = "gray") if (one.fig) title(sub = sub.caption, ...) if(length(cook.levels)) { - dispersion <- if(isGlm) summary(x)$dispersion else 1 p <- length(coef(x)) usr <- par("usr") hh <- seq.int(min(r.hat[1], r.hat[2]/100), usr[2], length.out = 101) for(crit in cook.levels) { - cl.h <- sqrt(crit*p*(1-hh)/hh * dispersion) + cl.h <- sqrt(crit*p*(1-hh)/hh) lines(hh, cl.h, lty = 2, col = 2) lines(hh,-cl.h, lty = 2, col = 2) } @@ -254,7 +257,7 @@ if (do.plot) { mtext(caption[5], 3, 0.25, cex = cex.caption) if (id.n > 0) { - y.id <- rs[show.r] + y.id <- rps[show.r] y.id[y.id < 0] <- y.id[y.id < 0] - strheight(" ")/3 text.id(xx[show.r], y.id, show.r) } --------------030304040002000407020206-- ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel