Thank you, John, for Dear >>>>> "JohnF" == John Fox <[EMAIL PROTECTED]> >>>>> on Tue, 13 Sep 2005 16:41:28 -0400 writes:
JohnF> A couple of comments on the new plots (numbers 5 and 6): JohnF> Perhaps some more thought could be given to the JohnF> plotted contours for Cook's D (which are 0.5 and 1.0 JohnF> in the example -- large Cook's Ds). A rule-of-thumb JohnF> cut-off for this example is 4/(n - p) = 4/(50 - 5) = JohnF> 0.089, and the discrepancy will grow with n. That's an interesting suggestion. Where does the 4/(n-p) come from? or put differently, should I better read in of your books? ;-) Honestly, I'm so much a fan of R_i / h_ii that I didn't even know that. JohnF> I'm not terribly fond of number 6, since it seems JohnF> natural to me to think of the relationship among JohnF> these quantities as influence on coefficients = JohnF> leverage * outlyingness (which corresponds to 5); JohnF> also note how in the example, the labels for large JohnF> residuals overplot. I think John mainly proposed '6' because other proposed it as another good alternative. From the few examples I've looked at, I haven't got fond at all either. JohnF> Finally, your remarks about balanced data are cogent JohnF> and suggest going with 1:3 in this case (since R_i JohnF> vs. i is pretty redundant with the QQ plot). Ah, that's another, maybe better alternative to my proposal. One drawback of it is for situations where people do something like par(mfrow=c(2,2)) before calling plot(<lm>) for several fitted lm models, assuming to fill one page for each of the plots. and I think that's something I would have done always in such situations where several different models are fitted and compared. Maybe plot.lm() should "advance an empty frame" as soon as prod(par("mfrow")) >= 4 in that case? Martin JohnF> -------------------------------- JohnF> John Fox JohnF> Department of Sociology JohnF> McMaster University JohnF> Hamilton, Ontario JohnF> Canada L8S 4M4 JohnF> 905-525-9140x23604 JohnF> http://socserv.mcmaster.ca/jfox JohnF> -------------------------------- >> -----Original Message----- >> From: Martin Maechler [mailto:[EMAIL PROTECTED] >> Sent: Tuesday, September 13, 2005 9:18 AM >> To: R-devel@stat.math.ethz.ch >> Cc: John Maindonald; Werner Stahel; John Fox >> Subject: plot(<lm>): new behavior in R-2.2.0 alpha >> >> As some of you R-devel readers may know, the plot() method >> for "lm" objects is based in large parts on contributions by >> John Maindonald, subsequently "massaged" by me and other >> R-core members. >> >> In the statistics litterature on applied regression, people >> have had diverse oppinions on what (and how many!) plots >> should be used for goodness-of-fit / residual diagnostics, >> and to my knowledge most people have agreed to want to see >> one (or more) version of a Tukey-Anscombe plot {Residuals ~ >> Fitted} and a QQ normal plot. >> Another consideration was to be somewhat close to what S >> (S-plus) was doing. So we have two versions of residuals vs >> fitted, one for checking E[error] = 0, the other for >> checking Var[error] = constant. So we got to the first three plots of >> plot.lm() about which I don't want to debate at the moment >> {though, there's room for improvement even there: e.g., I >> know of at least one case where plot(<lm>) wasn't used >> because the user was missing the qqline() she was so used to >> in the QQ plot} >> >> The topic of this e-mail is the (default) 4th plot which I >> had changed; really prompted by the following: >> More than three months ago, John wrote >> http://tolstoy.newcastle.edu.au/R/devel/05/04/0594.html >> (which became a thread of about 20 messages, from Apr.23 >> -- 29, 2005) >> >> and currently, >> NEWS for R 2.2.0 alpha contains >> >> >> USER-VISIBLE CHANGES >> >> >> >> o plot(<lm object>) uses a new default for the fourth panel when >> >> 'which' is not specified. >> >> ___ may change before release ___ >> >> and the header is >> >> plot.lm <- >> function (x, which = c(1:3, 5), >> caption = c("Residuals vs Fitted", >> "Normal Q-Q", "Scale-Location", >> "Cook's distance", "Residuals vs Leverage", >> "Cook's distance vs Leverage"), >> ......... ) {..............} >> >> So we now have 6 possible plots, where 1,2,3 and 5 are the >> defaults (and 1,2,3,4 where the old defaults). >> >> For the influential points and combination of 'influential' >> and 'outlier' >> there have been quite a few more proposals in the past. R <= >> 2.1.x has been plotting the Cook's distances vs. observation >> number, whereas quite a few people in the past have noted >> that all influence measures being more or less complicated >> functions of residuals and "hat values" aka "leverages", >> (R_i, h_{ii}), it would really make sense and fit more to the >> other plots to plot residuals vs. Leverages --- with the >> additional idea of adding *contours* of (equal) Cook's >> distances to that plot, in case one would really want to seem them. >> >> In the mean time, this has been *active* in R-devel for quite >> a while, and we haven't received any new comments. >> >> One remaining problem I'd like to address is the "balanced AOV" >> situation, something probably pretty rare nowadays in real >> practice, but common of course in teaching ANOVA. >> As you may remember, in a balanced design, all observations >> have the same leverages h_{ii}, and the plot R_i vs h_ii >> is really not so useful. In that case, the cook distances >> CD_i = c * R_i ^2 and so CD_i vs i {the old "4-th plot in >> plot.lm"} is >> graphically identical to R_i^2 vs i. >> Now in that case (of identical h_ii's), I think one would >> really want "R_i vs i". >> >> Question to the interested parties: >> >> Should there be an automatism >> ``when h_ii == const'' {"==" with a bit of numerical fuzz} >> plot a) R_i vs i >> or b) CD_i vs i >> >> or should users have to manually use >> plot(<lm>, which=1:4, ...) >> in such a case? >> >> Feedback very welcome, >> particularly, you first look at the examples in help(plot.lm) >> in *R-devel* aka R-2.2.0 alpha. >> >> Martin Maechler, ETH Zurich ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel