Dear Bob,

I think that you make a valid point -- and I've included "Type-III"
tests in the Anova() function in the car package, for example, though
not entirely for consistency with SAS -- but my object in writing the
effects package was different. I wanted to provide a general approach
to visualizing terms in complex models with linear predictors. If I
can, with reasonable effort, provide a solution that matches
"least-squares means" for models in which "least-squares means" are
defined, then I'll do so at some point; but duplicating what SAS does
was not my goal.
Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Muenchen, Robert A (Bob)
> Sent: Thursday, March 22, 2007 9:36 AM
> To: R-help@stat.math.ethz.ch
> Subject: Re: [R] how to get "lsmeans"?
>
> Hi All,
>
> Perhaps I'm stating the obvious, but to increase the use of R in
> places where SAS & SPSS dominate, it's important to make getting the
> same answers as easy as possible. That includes things like lsmeans
> and type III sums of squares. I've read lots of discussions here on
> sums of squares & I'm not advocating type III use, just looking at it
> from a marketing perspective. Too many people look for excuses not to
> change. The fewer excuses, the better.
>
> Of course this is easy for me to say, as I'm not the one who does the
> work! Much thanks to those who do.
>
> Cheers,
> Bob
>
> =========================================================
> Bob Muenchen (pronounced Min'-chen), Manager
> Statistical Consulting Center
> U of TN Office of Information Technology
> 200 Stokely Management Center, Knoxville, TN 37996-0520
> Voice: (865) 974-5230
> FAX: (865) 974-4810
> Email: [EMAIL PROTECTED]
> Web: http://oit.utk.edu/scc
> News: http://listserv.utk.edu/archives/statnews.html
> =========================================================
>
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:r-help-
> > [EMAIL PROTECTED] On Behalf Of John Fox
> > Sent: Wednesday, March 21, 2007 8:59 PM
> > To: 'Prof Brian Ripley'
> > Cc: 'r-help'; 'Chuck Cleland'
> > Subject: Re: [R] how to get "lsmeans"?
> >
> > Dear Brian et al.,
> >
> > My apologies for chiming in late: It's been a busy day.
> >
> > First some general comments on "least-squares means" and "effect
> > displays."
> > The general idea behind the two is similar -- to examine fitted
> > values corresponding to a term in a model while holding other terms
> > to typical values -- but the implementation is not identical. There
> > are other similar ideas floating around as well. My formulation is
> > more general in the sense that it applies to a wider variety of
> > models, both linear and otherwise.
> >
> > "Least-squares means" (a horrible term, by the way: in a 1980 paper
> > in The American Statistician, Searle, Speed, and Milliken suggested
> > the more descriptive term "population marginal means") apply to
> > factors and combinations of factors; covariates are set to mean
> > values and the levels of other factors are averaged over, in effect
> > applying equal weight to each level. (This is from memory, so it's
> > possible that I'm not getting it quite right, but I believe that I
> > am.) In my effect displays, each level of a factor is weighted by
> > its proportion in the data. In models in which least-squares means
> > can be computed, they should differ from the corresponding effect
> > display by a constant (if there are different numbers of
> > observations in the different levels of the factors that are held
> > constant).
> >
> > The obstacle to computing either least-squares means or effect
> > displays in R via predict() is that predict() wants factors in the
> > "new data" to be set to particular levels. The effect() function in
> > the effects package bypasses predict() and works directly with the
> > model matrix, averaging over the columns that pertain to a factor
> > (and reconstructing interactions as necessary). As mentioned, this
> > has the effect of setting the factor to its proportional
> > distribution in the data. This approach also has the advantage of
> > being invariant with respect to the choice of contrasts for a
> > factor.
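To make the weighting difference concrete, here is a small sketch (not from the original thread; the data, factor names, and seed are invented for illustration) contrasting equal-weight averaging over a factor, as in "least-squares means," with the proportional averaging of model-matrix columns described above:

```r
## Toy unbalanced two-factor model; invented data for illustration only.
set.seed(1)
n <- 60
dat <- data.frame(
  y = rnorm(n),
  A = factor(sample(c("a1", "a2"), n, replace = TRUE)),
  B = factor(sample(c("b1", "b2"), n, replace = TRUE, prob = c(0.7, 0.3)))
)
fm <- lm(y ~ A + B, dat)

## "Least-squares means" style: predict at each level of B and average
## with EQUAL weights.
ls_a1 <- mean(predict(fm, expand.grid(A = "a1", B = levels(dat$B))))
ls_a2 <- mean(predict(fm, expand.grid(A = "a2", B = levels(dat$B))))

## effect()-style: average the model-matrix columns, so each level of B
## is weighted by its observed PROPORTION; then fix A at the level wanted.
m <- colMeans(model.matrix(fm))   # proportional averages of all columns
m1 <- m; m1[["Aa2"]] <- 0         # A at its reference level a1
m2 <- m; m2[["Aa2"]] <- 1         # A at level a2
eff_a1 <- sum(m1 * coef(fm))
eff_a2 <- sum(m2 * coef(fm))

## As claimed above, for an additive model the two versions differ by
## the same constant at every level of A:
all.equal(ls_a1 - eff_a1, ls_a2 - eff_a2)
```

The constant is `(0.5 - prop(b2)) * coef(Bb2)` here, which vanishes exactly when B is balanced.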
> >
> > The only convenient way that I can think of to implement
> > least-squares means in R would be to use deviation-coded regressors
> > for a factor (that is, contr.sum) and then to set the columns of the
> > model matrix for the factor(s) to be averaged over to 0. It may just
> > be that I'm having a failure of imagination and that there's a
> > better way to proceed. I've not implemented this solution because it
> > is dependent upon the choice of contrasts and because I don't see a
> > general advantage to it, but since the issue has come up several
> > times now, maybe I should take a crack at it. Remember that I want
> > this to work more generally, not just for levels of factors, and not
> > just for linear models.
> >
> > Brian is quite right in mentioning that he suggested some time ago
> > that I use critical values of t rather than of the standard normal
> > distribution for producing confidence intervals, and I agree that it
> > makes sense to do so in models in which the dispersion is estimated.
> > My only excuse for not yet doing this is that I want to undertake a
> > more general revision of the effects package, and haven't had time
> > to do it. There are several changes that I'd like to make to the
> > package. For example, I have results for multinomial and
> > proportional odds logit models (described in a paper by me and Bob
> > Andersen in the 2006 issue of Sociological Methodology) that I want
> > to incorporate, and I'd like to improve the appearance of the
> > default graphs. But Brian's suggestion is very straightforward, and
> > I guess that I shouldn't wait to implement it; I'll do so very soon.
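The contr.sum idea sketched above can be illustrated as follows (invented toy data, not code from the effects package or any other): with deviation coding, a factor's columns sum to zero across its levels, so zeroing those columns in a model-matrix row is the same as averaging the prediction over that factor's levels with equal weights.

```r
## Invented toy data; deviation (sum-to-zero) coding for both factors.
set.seed(2)
n <- 40
dat <- data.frame(
  y = rnorm(n),
  A = factor(sample(c("a1", "a2"), n, replace = TRUE)),
  B = factor(sample(c("b1", "b2", "b3"), n, replace = TRUE))
)
fm <- lm(y ~ A + B, dat, contrasts = list(A = contr.sum, B = contr.sum))

## "Least-squares mean" for A = a1: a model-matrix row for a1 with the
## columns belonging to B set to 0 (= equal-weight average over B).
## Under contr.sum, level a1 of A is coded A1 = 1.
x <- c("(Intercept)" = 1, A1 = 1, B1 = 0, B2 = 0)
lsm_a1 <- sum(x * coef(fm))

## Brute-force check via predict(): average the prediction over the
## levels of B with equal weights.
nd <- expand.grid(A = "a1", B = levels(dat$B))
all.equal(lsm_a1, mean(predict(fm, nd)))
```

This also shows why the approach is contrast-dependent: with the default treatment coding, zeroing B's columns would fix B at its first level rather than averaging over it.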
> >
> > Regards,
> >  John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Prof Brian Ripley
> > > Sent: Wednesday, March 21, 2007 12:03 PM
> > > To: Chuck Cleland
> > > Cc: r-help
> > > Subject: Re: [R] how to get "lsmeans"?
> > >
> > > On Wed, 21 Mar 2007, Chuck Cleland wrote:
> > >
> > > > Liaw, Andy wrote:
> > > >> I verified the result from the following with output from JMP 6
> > > >> on the same data (don't have SAS: don't need it):
> > > >>
> > > >> set.seed(631)
> > > >> n <- 100
> > > >> dat <- data.frame(y=rnorm(n),
> > > >>                   A=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   B=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   C=factor(sample(1:2, n, replace=TRUE)),
> > > >>                   d=rnorm(n))
> > > >> fm <- lm(y ~ A + B + C + d, dat)
> > > >> ## Form a data frame of points to predict: all combinations of
> > > >> ## the three factors and the mean of the covariate.
> > > >> p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
> > > >> p[] <- lapply(p, factor)
> > > >> p <- cbind(p, d=mean(dat$d))
> > > >> p <- cbind(yhat=predict(fm, p), p)
> > > >> ## lsmeans for the three factors:
> > > >> with(p, tapply(yhat, A, mean))
> > > >> with(p, tapply(yhat, B, mean))
> > > >> with(p, tapply(yhat, C, mean))
> > > >
> > > > Using Andy's example data, these are the LSMEANS and intervals I
> > > > get from SAS:
> > > >
> > > > A    y LSMEAN     95% Confidence Limits
> > > > 1    -0.071847    -0.387507    0.243813
> > > > 2    -0.029621    -0.342358    0.283117
> > > >
> > > > B    y LSMEAN     95% Confidence Limits
> > > > 1    -0.104859    -0.397935    0.188216
> > > > 2     0.003391    -0.333476    0.340258
> > > >
> > > > C    y LSMEAN     95% Confidence Limits
> > > > 1    -0.084679    -0.392343    0.222986
> > > > 2    -0.016789    -0.336374    0.302795
> > > >
> > > > One way of reproducing the LSMEANS and intervals from SAS using
> > > > predict() seems to be the following:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> > > >> newdat <- expand.grid(A=factor(c(1,2)), B=1.5, C=1.5, d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >   A   B   C          d         fit        lwr       upr
> > > > 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
> > > > 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
> > > >
> > > > However, another possibility seems to be:
> > > >
> > > >> dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
> > > >> newdat <- expand.grid(A=factor(c(1,2)), B=mean(as.numeric(dat$B)),
> > > >>                       C=mean(as.numeric(dat$C)), d=mean(dat$d))
> > > >> cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
> > > >   A    B    C          d         fit        lwr       upr
> > > > 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
> > > > 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
> > > >
> > > > The predictions directly above match what effect() in the effects
> > > > package by John Fox returns:
> > > > library(effects)
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))
> > > >
> > > > A effect
> > > > A
> > > >           1           2
> > > > -0.08078243 -0.03855619
> > > >
> > > > But for some reason the predict() and effect() intervals are a
> > > > little different:
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$lower
> > > >           [,1]
> > > > 101 -0.3924451
> > > > 102 -0.3440179
> > > >
> > > >> effect("A", fm, xlevels=list(d = mean(dat$d)))$upper
> > > >          [,1]
> > > > 101 0.2308802
> > > > 102 0.2669055
> > > >
> > > > I would be interested in any comments on these different
> > > > approaches and on the difference in intervals returned by
> > > > predict() and effect().
> > >
> > > AFAIR, the effects package uses normal-based confidence intervals
> > > and predict.lm uses t-based ones, and I have suggested to John Fox
> > > that t-based intervals would be preferable, at least as an option.
> > >
> > > --
> > > Brian D. Ripley, [EMAIL PROTECTED]
> > > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> > > University of Oxford,     Tel: +44 1865 272861 (self)
> > > 1 South Parks Road,            +44 1865 272866 (PA)
> > > Oxford OX1 3TG, UK        Fax: +44 1865 272595
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
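The normal-versus-t distinction Prof. Ripley describes is easy to see directly. A minimal sketch (invented data, not code from the thread): both intervals are built from the same fit and standard error and differ only in the critical value used.

```r
## Invented toy regression; compare normal- and t-based 95% intervals
## for a prediction at x = 0.
set.seed(3)
d <- data.frame(y = rnorm(30), x = rnorm(30))
fm <- lm(y ~ x, d)
pr <- predict(fm, data.frame(x = 0), se.fit = TRUE)

zcrit <- qnorm(0.975)                    # normal critical value, ~1.96
tcrit <- qt(0.975, df = fm$df.residual)  # t critical value, larger here

norm_ci <- pr$fit + c(-1, 1) * zcrit * pr$se.fit
t_ci    <- pr$fit + c(-1, 1) * tcrit * pr$se.fit

## The t-based interval is always the wider of the two; the difference
## shrinks as the residual degrees of freedom grow, which is why the
## predict() and effect() limits above are close but not identical.
```

With the roughly 95 residual degrees of freedom in Andy's example, the two critical values differ by only about 1%, matching the small discrepancies in the quoted output.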