Dear Peter,

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Peter Muhlberger
> Sent: Wednesday, January 04, 2006 2:43 PM
> To: rhelp
> Subject: [R] A comment about R:
>
> . . .
>
> Ex. 1) Wald tests of linear hypotheses after max. likelihood or even
> after a regression.  "Wald" does not even appear in my standard R
> package on a search.  There's no comment in the lm help or optim help
> about what function to use for hypothesis tests.  I know that
> statisticians prefer likelihood ratio tests, but Wald tests are still
> useful and indeed crucial for first-pass analysis.  After searching
> with Google for some time, I found several Wald functions in various
> contributed R packages I did not have installed.  One confusion was
> which one would be relevant to my needs.  This took some time to
> resolve.  I concluded, perhaps on insufficient evidence, that package
> car's Wald test would be most helpful.  To use it, however, one has
> to put together a matrix for the hypotheses, which can be arduous for
> a many-term regression or a complex hypothesis.  In comparison, in
> Stata one simply states the hypothesis in symbolic terms.  I also
> don't know for certain that this function in car will work, or work
> properly, w/ various kinds of output, say from lm or from optim.  To
> be sure, I'd need to run time-consuming tests comparing it with Stata
> output, or examine the function's code.  In Stata the test is easy to
> find, and there's no uncertainty about where it can be run or its
> accuracy.  Simply having a comment or "see also" in the lm, mle, or
> optim help pointing the user to the right Wald function would be of
> enormous help.

The reference, I believe, is to the linear.hypothesis() function,
which has methods for lm and glm objects.  [To see what kinds of
objects linear.hypothesis() is suitable for, use the command
methods(linear.hypothesis).]  For lm objects, you get an F-test by
default.  Note that the Anova() function, also in car, can more
conveniently compute Wald tests for certain kinds of hypotheses.  More
generally, however, I'd be interested in your suggestions for an
alternative method of specifying linear hypotheses.
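For concreteness, here is a minimal sketch of the matrix-style
hypothesis specification discussed above, using the Prestige data
frame that ships with car.  (The particular model and hypothesis are
invented for illustration; in later versions of car the function was
renamed linearHypothesis() and also accepts symbolic hypotheses such
as "income = education".)

    library(car)   # provides linear.hypothesis() and the Prestige data

    mod <- lm(prestige ~ income + education + women, data = Prestige)

    ## H0: the income and education coefficients are equal, i.e.
    ## 0*(Intercept) + 1*income - 1*education + 0*women = 0.
    ## Each row of the hypothesis matrix is one hypothesis; columns
    ## follow the order of coef(mod).
    hyp <- matrix(c(0, 1, -1, 0), nrow = 1)
    linear.hypothesis(mod, hyp)   # F-test by default for an lm object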
There is currently no method for mle objects, but adding one is a good
idea, and I'll do that when I have a chance.  (In the meantime, it's
very easy to compute Wald tests from the coefficients and the
hypothesis and coefficient-covariance matrices.  Writing a small
function to do so, without the bells and whistles of something like
linear.hypothesis(), should not be hard.  Indeed, the ability to do
this kind of thing easily is what I see as the primary advantage of
working in a statistical computing environment like R -- or Stata.)

> Ex. 2) Getting neat output of a regression with Huberized variance
> matrix.
>
> I frequently have to run regressions w/ robust variances.  In Stata,
> one simply adds the word "robust" to the end of the command, or
> "cluster(cluster.variable)" for a cluster-robust error.  In R, there
> are two functions, robcov and hccm.  I had to run tests to figure
> out what the relationship is between them, and between them and
> Stata (robcov w/o cluster gives hccm's hc0; hccm's hc1 is equivalent
> to Stata's 'robust' w/o cluster; etc.).  A single sentence in hccm's
> help saying something to the effect that statisticians prefer hc3
> for most types of data might save me from having to scramble through
> the statistical literature to try to figure out which of these I
> should be using.  A few sentences on what the differences are
> between these methods would be even better.  Then, there's the
> problem of output.  Given that hc1 or hc3 are preferred for
> non-clustered data, I'd need to be able to get regression output of
> the form summary(lm) out of hccm, for any practical use.  Getting
> this, however, would require programming my own function.  Huberized
> t-stats for regressions are a commonplace need; an R oriented a
> little more toward everyday needs would not require programming for
> them.  Also, I'm not sure yet how well any of the existing functions
> handle missing data.
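Returning to Ex. 1 for a moment, the small Wald-test function alluded
to above might be sketched as follows.  It assumes only that coef()
and vcov() methods exist for the fitted object (true for lm and glm
objects, and for mle objects from the stats4 package); the names
wald.test, L, and rhs are invented for this sketch.

    ## Wald test of H0: L %*% beta == rhs, where L is the hypothesis
    ## matrix (one row per hypothesis, one column per coefficient) and
    ## rhs the right-hand-side vector (zero by default).
    wald.test <- function(model, L, rhs = NULL) {
        b <- coef(model)
        V <- vcov(model)
        if (!is.matrix(L)) L <- matrix(L, nrow = 1)  # allow a vector
                                                     # for a 1-df test
        if (is.null(rhs)) rhs <- rep(0, nrow(L))
        discrep <- L %*% b - rhs
        W <- as.vector(t(discrep) %*% solve(L %*% V %*% t(L)) %*% discrep)
        df <- nrow(L)
        c(chisq = W, df = df, p = pchisq(W, df, lower.tail = FALSE))
    }

For example, wald.test(mod, c(0, 1, -1, 0)) would give a 1-df
chi-square test that the second and third coefficients of a four-
coefficient model are equal.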
I think that we have a philosophical difference here: I don't like
giving advice in documentation.  An egregious extended example of
this, in my opinion, is the SPSS documentation.  The hccm() function
uses hc3 as the default, which is an implicit recommendation, but,
more usefully in my view, its help page points to Long and Ervin's
American Statistician paper on the subject, which does give advice and
which is quite accessible.  As well, and more generally, the car
package is associated with a book (my R and S-PLUS Companion to
Applied Regression), which gives advice, though, admittedly, tersely
in this case.

The Anova() function with the argument white=TRUE will give you
F-tests corresponding to the t-tests to which you refer (though it
will combine df for multiple-df terms in the model).  To get the kind
of summary you describe, you could use something like

    mysummary <- function(model){
        coef <- coef(model)
        se <- sqrt(diag(hccm(model)))
        t <- coef/se
        p <- 2*pt(abs(t), df=model$df.residual, lower.tail=FALSE)
        table <- cbind(coef, se, t, p)
        rownames(table) <- names(coef)
        colnames(table) <- c("Estimate", "Std. Error", "t value",
            "Pr(>|t|)")
        table
    }

Again, it's not time-consuming to write simple functions like this for
one's own use, and the ability to do so is a strength of R, in my
view.  I'm not sure what you mean about handling missing data:
functions like hccm(), linear.hypothesis(), and Anova() start with a
model object for which missing data have already been handled.

Regards,
 John

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html