Re: [R] How to get LSMEANS from linear mixed model?
Yannan Jiang wrote:

> Hi there, I am trying to run simulations in R using a linear mixed model (lme). There are two factors in my fixed-effects model: educ (treatment and control) and mth (visits 1, 2, and 3). What I want to obtain is the estimated treatment difference (treatment - control) at visit 3, plus its standard error and p-value. This is easily obtained in SAS using LSMEANS or ESTIMATE statements, but I am not sure how to do it in R. The fixed effects I obtained are as follows:
>
> Fixed effects: ymth ~ educ * mth - 1
>                          Value   Std.Error  DF    t-value p-value
> educcont            0.14814308 0.006232419  93  23.769758  0.0000
> eductreat           0.13696952 0.006255672  93  21.895254  0.0000
> mthymth2            0.00003759 0.006333043 165   0.005936  0.9953
> mthymth3            0.01075489 0.006251328 165   1.720416  0.0872
> eductreat:mthymth2  0.00323847 0.008947291 165   0.361950  0.7179
> eductreat:mthymth3 -0.01246565 0.008941306 165  -1.394164  0.1651
>
> The estimated treatment difference I am interested in is:
>
> a <- 0.14814308 + 0.01075489
> b <- 0.13696952 + 0.01075489 - 0.01246565
> b - a
> [1] -0.02363921
>
> (the treatment effect at visit 3, the same as the SAS lsmeans output). But I don't know how to get the standard error and corresponding p-value for this estimate. Any help with this would be greatly appreciated! Thanks, Yannan

How about fitting the model this way?

df$mth <- relevel(df$mth, ref = "ymth3")
lme(ymth ~ educ * mth, random = ~ 1 | id, data = df)

The coefficient for educ will then contain the simple effect of educ at mth = ymth3, along with its standard error and p-value.

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
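For completeness, the standard error and p-value Yannan asks about can also be computed directly from the covariance matrix of the fixed effects, without refitting. This is only a sketch: it assumes a fitted lme object named `fit` (a hypothetical name) whose six coefficients appear in the order of the table above, and it reuses the denominator df of 165 from the quoted output.

```r
## Sketch: treatment difference at visit 3 as a linear contrast of the
## fixed effects, with its standard error and p-value.  `fit` is an
## assumed name for the fitted lme object; coefficient order as above.
library(nlme)

cf <- fixef(fit)   # fixed-effect estimates
V  <- vcov(fit)    # covariance matrix of the fixed effects

## b - a = eductreat - educcont + eductreat:mthymth3
k <- c(-1, 1, 0, 0, 0, 1)

est  <- sum(k * cf)
se   <- sqrt(as.numeric(t(k) %*% V %*% k))
tval <- est / se
pval <- 2 * pt(-abs(tval), df = 165)   # denominator df from the model output

c(estimate = est, std.error = se, t.value = tval, p.value = pval)
```

This is the same calculation SAS performs for an ESTIMATE statement: a contrast vector applied to the coefficient vector and its covariance matrix.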
Re: [R] how to get lsmeans?
The Exegeses paper gave me a great look at the history of all this. I had not been aware that S-PLUS had gone that route. There is much to be said for knowing you might be more successful but sticking to your perspective instead. And in the long run, that may be the more successful route anyway.

Thanks,
Bob

=
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center, U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230  FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
=

-Original Message-
From: Liaw, Andy [mailto:[EMAIL PROTECTED]]
Sent: Thursday, March 22, 2007 5:27 PM
To: Douglas Bates; Muenchen, Robert A (Bob)
Cc: R-help@stat.math.ethz.ch
Subject: RE: [R] how to get lsmeans?

From: Douglas Bates

> On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
>> Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares; I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses not to change. The fewer excuses, the better.
>
> You may get strong reactions to such a suggestion. I recommend reading Bill Venables' famous unpublished paper "Exegeses on linear models" (google for the title - very few people use "Exegeses" and "linear models" in the same sentence; in fact I would not be surprised if Bill were the only one who has ever done so). It's on the MASS page: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf I believe it's based on a talk Bill gave at an S-PLUS User's Conference. I think it deserves to be required reading for every graduate-level linear models course.
You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape - when fitting a single linear model might take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code - still reflect the approach of "give me every possible statistic that could be calculated from this model, whether or not it makes sense". The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it. The fact that it is so difficult to explain what lsmeans are, and why they would be of interest, is an indication of why they aren't implemented in any of the required packages.

Perhaps I should have made it clear in my original post: I gave the example and code more to show what the mysterious "least squares means" are (which John explained lucidly) than how to replicate what SAS (or JMP) outputs. I do not understand how people can feel comfortable reporting things like lsmeans and p-values from type-"insert your favorite Roman numeral here" tests when they do not know how such things arise or, at the very least, what they _really_ mean. (Given how simply lsmeans are computed, not knowing how to compute them is pretty much the same as not knowing what they are.) One of the dangers of wholesale output such as SAS or SPSS gives is that the user simply picks an answer and runs with it, without understanding what that answer is, or whether it corresponds to the question of interest.

As to whether to weight the levels of the factors being held constant, my suggestion to John would be to offer both choices (unweighted, and weighted by observed frequencies).
I can see why one would want to weight by observed frequencies (if the data are sampled from a population), but there are certainly situations - perhaps more often than not in the cases I've encountered - where the observed frequencies do not come close to approximating what they are in the population. In such cases the unweighted average would make more sense to me.

Cheers,
Andy

-Original Message-
From: [EMAIL PROTECTED] [mailto:r-help-[EMAIL PROTECTED]] On Behalf Of John Fox
Sent: Wednesday, March 21, 2007 8:59 PM
To: 'Prof Brian Ripley'
Cc: 'r-help'; 'Chuck Cleland'
Subject: Re: [R] how to get lsmeans?
Re: [R] how to get lsmeans?
Dear Brian et al.,

My apologies for chiming in late: it's been a busy day.

First some general comments on least-squares means and effect displays. The general idea behind the two is similar - to examine fitted values corresponding to a term in a model while holding other terms to typical values - but the implementation is not identical. There are also other similar ideas floating around as well. My formulation is more general in the sense that it applies to a wider variety of models, both linear and otherwise.

Least-squares means (a horrible term, by the way: in a 1980 paper in The American Statistician, Searle, Speed, and Milliken suggested the more descriptive term "population marginal means") apply to factors and combinations of factors; covariates are set to mean values and the levels of other factors are averaged over, in effect applying equal weight to each level. (This is from memory, so it's possible that I'm not getting it quite right, but I believe that I am.) In my effect displays, each level of a factor is weighted by its proportion in the data. In models in which least-squares means can be computed, they should differ from the corresponding effect display by a constant (if there are different numbers of observations in the different levels of the factors that are held constant).

The obstacle to computing either least-squares means or effect displays in R via predict() is that predict() wants factors in the new data to be set to particular levels. The effect() function in the effects package bypasses predict() and works directly with the model matrix, averaging over the columns that pertain to a factor (and reconstructing interactions as necessary). As mentioned, this has the effect of setting the factor to its proportional distribution in the data. This approach also has the advantage of being invariant with respect to the choice of contrasts for a factor.

The only convenient way that I can think of to implement least-squares means in R would be to use deviation-coded regressors for a factor (that is, contr.sum) and then to set the columns of the model matrix for the factor(s) to be averaged over to 0. It may just be that I'm having a failure of imagination and that there's a better way to proceed. I've not implemented this solution because it is dependent upon the choice of contrasts and because I don't see a general advantage to it, but since the issue has come up several times now, maybe I should take a crack at it. Remember that I want this to work more generally, not just for levels of factors, and not just for linear models.

Brian is quite right in mentioning that he suggested some time ago that I use critical values of t rather than of the standard normal distribution for producing confidence intervals, and I agree that it makes sense to do so in models in which the dispersion is estimated. My only excuse for not yet doing this is that I want to undertake a more general revision of the effects package, and haven't had time to do it. There are several changes that I'd like to make to the package. For example, I have results for multinomial and proportional-odds logit models (described in a paper by me and Bob Andersen in the 2006 issue of Sociological Methodology) that I want to incorporate, and I'd like to improve the appearance of the default graphs. But Brian's suggestion is very straightforward, and I guess that I shouldn't wait to implement it; I'll do so very soon.

Regards,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Prof Brian Ripley
Sent: Wednesday, March 21, 2007 12:03 PM
To: Chuck Cleland
Cc: r-help
Subject: Re: [R] how to get lsmeans?
On Wed, 21 Mar 2007, Chuck Cleland wrote:

Liaw, Andy wrote:

I verified the result from the following with output from JMP 6 on the same data (don't have SAS; don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y = rnorm(n),
                  A = factor(sample(1:2, n, replace = TRUE)),
                  B = factor(sample(1:2, n, replace = TRUE)),
                  C = factor(sample(1:2, n, replace = TRUE)),
                  d = rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points at which to predict: all combinations
## of the three factors and the mean of the covariate.
p <- data.frame(expand.grid(A = 1:2, B = 1:2, C = 1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d = mean(dat$d))
p <- cbind(yhat = predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))

Using Andy's example data, these are the LSMEANS and intervals I get from SAS:

A     y LSMEAN    95% Confidence Limits
1    -0.071847    -0.387507   0.243813
2    -0.029621
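The SAS-style confidence limits can be reproduced from Andy's prediction grid as well. A sketch, reusing the `fm` and `p` objects from the code above; the contrast-matrix construction here is mine, not part of the original post.

```r
## Sketch: lsmeans for factor A with 95% limits, built by averaging the
## model-matrix rows of the prediction grid `p` within each level of A.
X <- model.matrix(~ A + B + C + d, data = p)   # design rows for the 8 grid points
L <- rbind(A1 = colMeans(X[p$A == 1, ]),       # equal-weight average within A = 1
           A2 = colMeans(X[p$A == 2, ]))       # ... and within A = 2

est   <- drop(L %*% coef(fm))
se    <- sqrt(diag(L %*% vcov(fm) %*% t(L)))
tcrit <- qt(0.975, df.residual(fm))

cbind(lsmean = est, lower = est - tcrit * se, upper = est + tcrit * se)
```

Averaging the model-matrix rows and applying the result as a contrast gives both the point estimates (identical to the tapply() averages above) and their standard errors in one step.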
Re: [R] how to get lsmeans?
Hi All,

Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares; I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses not to change. The fewer excuses, the better. Of course this is easy for me to say, as I'm not the one who does the work! Much thanks to those who do.

Cheers,
Bob

=
Bob Muenchen (pronounced Min'-chen), Manager
Statistical Consulting Center, U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230  FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, News: http://listserv.utk.edu/archives/statnews.html
=
Re: [R] how to get lsmeans?
I believe there is an easier way, and you don't have to do a thing. A student of mine just purchased a brand-new laptop loaded with Vista. Guess what? SPSS v14 or v15 does not work with Vista. In a couple of days we will install Ubuntu on it, and R.

Enjoy,
Ioannis

On Thu, March 22, 2007 15:35, Muenchen, Robert A (Bob) wrote:

> Hi All, Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. ...
Re: [R] how to get lsmeans?
Dear Bob,

I think that you make a valid point - and I've included Type-III tests in the Anova() function in the car package, for example, though not entirely for consistency with SAS - but my object in writing the effects package was different. I wanted to provide a general approach to visualizing terms in complex models with linear predictors. If I can with reasonable effort provide a solution that's the same as least-squares means for models for which least-squares means are defined, then I'll do so at some point, but duplicating what SAS does was not my goal.

Regards,
John

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Muenchen, Robert A (Bob)
Sent: Thursday, March 22, 2007 9:36 AM
To: R-help@stat.math.ethz.ch
Subject: Re: [R] how to get lsmeans?

> Hi All, Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. ...
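The deviation-coding device John describes - fit with contr.sum, then zero the factor's columns in the model matrix - can be sketched concretely. This is an illustration with made-up data and names, not code from the thread.

```r
## Sketch of the contr.sum idea: with deviation coding, setting a factor's
## model-matrix columns to 0 averages over its levels with equal weight.
## Toy data; all names are illustrative.
set.seed(1)
dat <- data.frame(y = rnorm(60), g = gl(3, 20), x = rnorm(60))
fm  <- lm(y ~ g + x, data = dat, contrasts = list(g = "contr.sum"))

X <- model.matrix(fm)
X[, grep("^g", colnames(X))] <- 0   # zero the factor's columns: equal-weight average over g
X[, "x"] <- mean(dat$x)             # hold the covariate at its mean

unique(round(drop(X %*% coef(fm)), 10))   # every row now gives the same averaged fitted value
```

As John notes, this depends on the contr.sum parameterization: with the default treatment contrasts, zeroed columns would instead pin the factor to its baseline level.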
Re: [R] how to get lsmeans?
Dear John,

As a newcomer to R, I feel that if the results are identical to those of the software the user is already familiar with, such as SAS, SPSS, or Stata, the user will feel confident about the results from R. Why? Because many R beginners have no strong statistical background - just like me - so they chose to trust SAS or SPSS first. The best solution is for R not only to be able to reproduce the results of SAS or SPSS, but also to offer many more powerful, more professional functions (packages). Then more and more people who are not specialists in statistics will enjoy R, which makes R popular; and more and more statisticians will like it too, which keeps R professional. Obviously, R is on the road to success.

Best wishes,
Yours sincerely,
Xingwang Ye
PhD candidate
Research Group of Nutrition Related Cancers and Other Chronic Diseases
Institute for Nutritional Sciences, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences

> Dear Bob, I think that you make a valid point - and I've included Type-III tests in the Anova() function in the car package, for example, though not entirely for consistency with SAS - but my object in writing the effects package was different. ... duplicating what SAS does was not my goal. Regards, John
Re: [R] how to get lsmeans?
Dear Frank,

-----Original Message-----
From: Frank E Harrell Jr
Sent: Thursday, March 22, 2007 10:51 AM
To: John Fox
Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
Subject: Re: [R] how to get lsmeans?

John Fox wrote:
Dear Frank,
The object is to focus on a high-order term of the model, holding other terms constant at typical values; in the case of a factor, a typical value is ambiguous, so an average is taken over the levels of the factor. If the factor is, e.g., gender, one could produce an estimate of the average fitted value for a population composed equally of men and women, or of a population composed of men and women in proportion to their distribution in the data. Otherwise, one would have to produce separate sets of fitted values for men and women, with the number of such sets increasing as the number of levels of the factors held constant increases. On the scale of the linear predictor, these sets would differ only by constants.
Regards,
John

Makes sense. I just set other factors to the mode, and if it is important to see estimates for other categories, I give estimates for each factor level. If I want to uncondition on a variable (not often) I take the proper weighted average of predicted values.

The mode seems arbitrary to me, but I don't think that there's a unique right answer here. The weighted average (using sample proportions) is what the effect() function does.
Regards,
John

Cheers
Frank

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
905-525-9140 x23604
http://socserv.mcmaster.ca/jfox

-----Original Message-----
From: Frank E Harrell Jr
Sent: Thursday, March 22, 2007 8:41 AM
To: John Fox
Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
Subject: Re: [R] how to get lsmeans?

John Fox wrote:
Dear Brian et al.,
My apologies for chiming in late: It's been a busy day. First some general comments on least-squares means and effect displays. ... much good stuff deleted ...

John - the one thing I didn't get from your post is a motivation to do all that as opposed to easy-to-explain predicted values. I would appreciate your thoughts.

Thanks
Frank

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
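Frank's "easy-to-explain predicted values" approach can be sketched directly with predict(). A minimal toy example follows; the data, variable names, and the modal-level choice are all mine for illustration, not code from the thread:

```r
## Toy sketch of "predicted values with other factors set to the mode":
## fit an additive model, then predict for each level of the focal
## factor while holding the other factor at its most frequent level.
set.seed(3)
d <- data.frame(y   = rnorm(50),
                grp = factor(sample(1:2, 50, replace = TRUE)),
                sex = factor(sample(c("F", "M"), 50, replace = TRUE,
                                    prob = c(0.7, 0.3))))
fit <- lm(y ~ grp + sex, data = d)

mode_sex <- names(which.max(table(d$sex)))      # modal level of sex
nd <- data.frame(grp = factor(1:2, levels = levels(d$grp)),
                 sex = factor(mode_sex, levels = levels(d$sex)))
pred <- predict(fit, nd)  # one easy-to-explain fitted value per grp level
```

In an additive model the two predictions differ by exactly the grp coefficient, whichever level sex is held at; that is why Frank considers the choice of conditioning level unimportant for comparisons.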
Re: [R] how to get lsmeans?
On 3/22/07, Muenchen, Robert A (Bob) wrote:
Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares; I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses not to change. The fewer excuses, the better.

You may get strong reactions to such a suggestion. I recommend reading Bill Venables' famous unpublished paper "Exegeses on Linear Models" (google for the title - very few people use "Exegeses" and "linear models" in the same sentence - in fact I would not be surprised if Bill was the only one who has ever done so). You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. Some things done in such packages, which trace their roots back to the days of punched cards and magnetic tape, when fitting a single linear model might take several days because your first 5 attempts failed due to syntax errors in the JCL or the SAS code, still reflect the approach of "give me every possible statistic that could be calculated from this model, whether or not it makes sense". The approach taken in R is different. The underlying assumption is that the useR is thinking about the analysis while doing it. The fact that it is so difficult to explain what lsmeans are and why they would be of interest is an indication of why they aren't implemented in any of the required packages.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of John Fox
Sent: Wednesday, March 21, 2007 8:59 PM
To: 'Prof Brian Ripley'
Cc: 'r-help'; 'Chuck Cleland'
Subject: Re: [R] how to get lsmeans?

Dear Brian et al.,
My apologies for chiming in late: It's been a busy day.
First some general comments on least-squares means and effect displays. The general idea behind the two is similar -- to examine fitted values corresponding to a term in a model while holding other terms to typical values -- but the implementation is not identical, and there are other similar ideas floating around as well. My formulation is more general in the sense that it applies to a wider variety of models, both linear and otherwise.

Least-squares means (a horrible term, by the way: in a 1980 paper in The American Statistician, Searle, Speed, and Milliken suggested the more descriptive term "population marginal means") apply to factors and combinations of factors; covariates are set to mean values and the levels of other factors are averaged over, in effect applying equal weight to each level. (This is from memory, so it's possible that I'm not getting it quite right, but I believe that I am.) In my effect displays, each level of a factor is weighted by its proportion in the data. In models in which least-squares means can be computed, they should differ from the corresponding effect display by a constant (if there are different numbers of observations in the different levels of the factors that are held constant).

The obstacle to computing either least-squares means or effect displays in R via predict() is that predict() wants factors in the new data to be set to particular levels. The effect() function in the effects package bypasses predict() and works directly with the model matrix, averaging over the columns that pertain to a factor (and reconstructing interactions as necessary). As mentioned, this has the effect of setting the factor to its proportional distribution in the data. This approach also has the advantage of being invariant with respect to the choice of contrasts for a factor.
The only convenient way that I can think of to implement least-squares means in R would be to use deviation-coded regressors for a factor (that is, contr.sum) and then to set the columns of the model matrix for the factor(s) to be averaged over to 0. It may just be that I'm having a failure of imagination and that there's a better way to proceed. I've not implemented this solution because it is dependent upon the choice of contrasts and because I don't see a general advantage to it, but since the issue has come up several times now, maybe I should take a crack at it. Remember that I want this to work more generally, not just for levels of factors, and not just for linear models. Brian is quite right in mentioning that he suggested some time ago that I use critical values of t rather than of the standard normal distribution for producing confidence intervals, and I agree that it makes sense to do so in models in which the dispersion is estimated. My only excuse
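The contr.sum idea John describes can be sketched in a few lines. This is only a toy illustration of the approach (the data and object names are mine), not code from the effects package:

```r
## With deviation (sum-to-zero) coding, zeroing a factor's columns in
## the model matrix is the same as averaging over its levels with
## equal weight -- which is exactly the lsmeans convention.
set.seed(1)
d <- data.frame(y = rnorm(40),
                A = factor(rep(1:2, 20)),
                B = factor(rep(1:2, each = 20)))
fit <- lm(y ~ A + B, data = d,
          contrasts = list(A = "contr.sum", B = "contr.sum"))

nd <- expand.grid(A = factor(1:2), B = factor(1, levels = 1:2))
X <- model.matrix(~ A + B, nd,
                  contrasts.arg = list(A = "contr.sum", B = "contr.sum"))
X[, "B1"] <- 0                  # zero B's column => equal-weight average over B
lsm_A <- drop(X %*% coef(fit))  # least-squares means for factor A
```

As a check, this reproduces the "average the predictions over B" computation that Andy's example uses, which is why the result depends on the contrasts only through the zeroing trick.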
Re: [R] how to get lsmeans?
From: Douglas Bates
On 3/22/07, Muenchen, Robert A (Bob) wrote:
Perhaps I'm stating the obvious, but to increase the use of R in places where SAS and SPSS dominate, it's important to make getting the same answers as easy as possible. That includes things like lsmeans and type III sums of squares. I've read lots of discussions here on sums of squares; I'm not advocating type III use, just looking at it from a marketing perspective. Too many people look for excuses not to change. The fewer excuses, the better.

You may get strong reactions to such a suggestion. I recommend reading Bill Venables' famous unpublished paper "Exegeses on Linear Models" (google for the title - very few people use "Exegeses" and "linear models" in the same sentence - in fact I would not be surprised if Bill was the only one who has ever done so).

It's on the MASS page: http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf

I believe it's based on a talk Bill gave at an S-PLUS User's Conference. I think it deserves to be required reading for every graduate-level linear models course.

You must realize that R is written by experts in statistics and statistical computing who, despite popular opinion, do not believe that everything in SAS and SPSS is worth copying. ... much good stuff deleted ...
Perhaps I should have made it clear in my original post: I gave the example and code more to show what the mysterious "least squares means" are (which John explained lucidly) than how to replicate what SAS (or JMP) outputs. I do not understand how people can feel comfortable reporting things like lsmeans and p-values from type-"insert your favorite Roman numeral here" tests when they do not know how such things arise or, at the very least, what they _really_ mean. (Given how simply lsmeans are computed, not knowing how to compute them is pretty much the same as not knowing what they are.) One of the dangers of the wholesale output that SAS or SPSS gives is for the user to simply pick an answer and run with it, without understanding what that answer is, or whether it corresponds to the question of interest.

As to whether to weight the levels of the factors being held constant, my suggestion to John would be to offer both choices (unweighted and weighted by observed frequencies). I can see why one would want to weight by observed frequencies (if the data are sampled from a population), but there are certainly situations (perhaps more often than not in the cases I've encountered) where the observed frequencies do not come close to approximating what they are in the population. In such cases the unweighted average would make more sense to me.

Cheers,
Andy

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of John Fox
Sent: Wednesday, March 21, 2007 8:59 PM
To: 'Prof Brian Ripley'
Cc: 'r-help'; 'Chuck Cleland'
Subject: Re: [R] how to get lsmeans?

Dear Brian et al.,
My apologies for chiming in late: It's been a busy day. First some general comments on least-squares means and effect displays. ... much good stuff deleted ...
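The two choices Andy describes can be put side by side in a short sketch (toy data and names of my own). Both average the same grid of predictions; only the weights applied to the levels of B differ:

```r
## Unweighted vs. frequency-weighted averaging over a held-constant
## factor B when summarizing the effect of A.
set.seed(2)
d <- data.frame(y = rnorm(60),
                A = factor(sample(1:2, 60, replace = TRUE)),
                B = factor(sample(1:2, 60, replace = TRUE,
                                  prob = c(0.8, 0.2))))
fit <- lm(y ~ A + B, data = d)

grid <- expand.grid(A = factor(levels(d$A)), B = factor(levels(d$B)))
grid$yhat <- predict(fit, grid)

## lsmeans-style: each level of B gets equal weight
unweighted <- tapply(grid$yhat, grid$A, mean)

## effect()-style: B's levels weighted by their sample proportions
w <- prop.table(table(d$B))
weighted <- tapply(grid$yhat * w[grid$B], grid$A, sum)
```

In an additive model the two A-level summaries differ only by a constant shift, so the estimated A contrast is identical under either weighting; the choice matters for the reported levels, not the comparison.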
Re: [R] how to get lsmeans?
I verified the result from the following with output from JMP 6 on the same data (don't have SAS; don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y = rnorm(n),
                  A = factor(sample(1:2, n, replace = TRUE)),
                  B = factor(sample(1:2, n, replace = TRUE)),
                  C = factor(sample(1:2, n, replace = TRUE)),
                  d = rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points to predict: all combinations of the
## three factors and the mean of the covariate.
p <- data.frame(expand.grid(A = 1:2, B = 1:2, C = 1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d = mean(dat$d))
p <- cbind(yhat = predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))

Andy

From: Xingwang Ye
Dear all,
I searched the mailing list about this topic and learned that no simple way is available to get lsmeans in R as in SAS. Dr. John Fox and Dr. Frank E Harrell have given very useful information about the lsmeans topic. Dr. Frank E Harrell suggests not thinking about lsmeans, but about what predicted values are wanted, and then using the predict function. However, after reading the R help file for a whole day, I am still unclear how to do it. Could someone give me a hand?

For example: A, B and C are binomial variables (factors); d is a continuous variable; the response variable Y is a continuous variable too. To get lsmeans of Y according to A, B and C, respectively, in SAS I tried:

proc glm data=a;
  class A B C;
  model Y = A B C d;
  lsmeans A B C / cl;
run;

In R, I tried this:

library(Design)
ddist <- datadist(a)
options(datadist = "ddist")
f <- ols(Y ~ A + B + C + d, data = a, x = TRUE, y = TRUE, se.fit = TRUE)

Then how do I get the lsmeans for A, B, and C, respectively, with the predict function?
Best wishes,
yours sincerely,
Xingwang Ye
PhD candidate
Research Group of Nutrition Related Cancers and Other Chronic Diseases
Institute for Nutritional Sciences, Shanghai Institutes of Biological Sciences, Chinese Academy of Sciences
P.O. Box 32, 294 Taiyuan Road, Shanghai 200031, P.R. China
Re: [R] how to get lsmeans?
Liaw, Andy wrote:
I verified the result from the following with output from JMP 6 on the same data (don't have SAS; don't need it): ... much good stuff deleted ...

Using Andy's example data, these are the LSMEANS and intervals I get from SAS:

A    y LSMEAN    95% Confidence Limits
1    -0.071847   -0.387507   0.243813
2    -0.029621   -0.342358   0.283117

B    y LSMEAN    95% Confidence Limits
1    -0.104859   -0.397935   0.188216
2     0.003391   -0.333476   0.340258

C    y LSMEAN    95% Confidence Limits
1    -0.084679   -0.392343   0.222986
2    -0.016789   -0.336374   0.302795

One way of reproducing the LSMEANS and intervals from SAS using predict() seems to be the following:

dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
newdat <- expand.grid(A = factor(c(1, 2)), B = 1.5, C = 1.5, d = mean(dat$d))
cbind(newdat, predict(dat.lm, newdat, interval = "confidence"))

  A   B   C          d         fit        lwr       upr
1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165

However, another possibility seems to be:

dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
newdat <- expand.grid(A = factor(c(1, 2)),
                      B = mean(as.numeric(dat$B)),
                      C = mean(as.numeric(dat$C)),
                      d = mean(dat$d))
cbind(newdat, predict(dat.lm, newdat, interval = "confidence"))

  A    B    C          d         fit        lwr       upr
1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465

The predictions directly above match what effect() in the effects package by John Fox returns:

library(effects)
effect("A", fm, xlevels = list(d = mean(dat$d)))

A effect
A
          1           2
-0.08078243 -0.03855619

But for some reason the predict() and effect() intervals are a little different:

effect("A", fm, xlevels = list(d = mean(dat$d)))$lower
          [,1]
101 -0.3924451
102 -0.3440179
effect("A", fm, xlevels = list(d = mean(dat$d)))$upper
         [,1]
101 0.2308802
102 0.2669055

I would be interested in any comments on these different approaches and on the difference in intervals returned by predict() and effect().

hope this helps,
Chuck

Andy
From: Xingwang Ye
Dear all, I searched the mailing list about this topic ... much good stuff deleted ...

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
Re: [R] how to get lsmeans?
On Wed, 21 Mar 2007, Chuck Cleland wrote:
Liaw, Andy wrote: I verified the result from the following with output from JMP 6 on the same data ... much good stuff deleted ... I would be interested in any comments on these different approaches and on the difference in intervals returned by predict() and effect().

AFAIR, the effects package uses normal-based confidence intervals and predict.lm uses t-based ones, and I have suggested to John Fox that t-based intervals would be preferable, at least as an option.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
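The normal-vs-t distinction behind the differing intervals is easy to see numerically. A small sketch with made-up numbers (est, se, and df are illustrative, not taken from the thread):

```r
## Normal-based vs t-based 95% confidence intervals for the same
## estimate and standard error: with an estimated dispersion, qt()
## gives a slightly wider interval, converging to qnorm() as the
## residual degrees of freedom grow.
est <- -0.07   # made-up point estimate
se  <- 0.16    # made-up standard error
df  <- 95      # made-up residual degrees of freedom

norm_ci <- est + c(-1, 1) * qnorm(0.975) * se
t_ci    <- est + c(-1, 1) * qt(0.975, df) * se
```

For large residual degrees of freedom the difference is small, but the t interval is always the wider of the two, which matches the direction of the predict() vs effect() discrepancy Chuck observed.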
Re: [R] how to get lsmeans?
Chuck Cleland wrote:
Liaw, Andy wrote: I verified the result from the following with output from JMP 6 on the same data (don't have SAS; don't need it): ... much good stuff deleted ...

And with the Design package you can do:

f <- ols(y ~ ...)
ds <- gendata(f, A = c('1', '2'), B = c('1', '2'), C = c('1', '2'))
predict(f, ds)

But this sets the covariate to the median (nothing unreasonable about that, it just will not agree with SAS). To set it to the mean, add d = mean(dat$d) in gendata.

Frank

... much good stuff deleted ...