Re: [R] How to get LSMEANS from linear mixed model?

2007-04-24 Thread Chuck Cleland
Yannan Jiang wrote:
 Hi there,
 
  
 
 I am trying to run simulations in R using a linear mixed model (lme). There
 are two factors in my fixed-effects model, educ (treatment and control) and
 mth (visit 1, 2, and 3). What I want to obtain is the estimated treatment
 difference (treatment - control) at visit 3, plus the standard error and
 p-value. This can be easily obtained in SAS using the lsmeans or estimate
 statements, but I am not sure how to do this in R. 
 
  
 
 The fixed effects I obtained are as follows:
 
  
 
 Fixed effects: ymth ~ educ * mth - 1 
 
                          Value        Std.Error    DF   t-value    p-value
 educcont                 0.14814308   0.006232419   93  23.769758   0.0000
 eductreat                0.13696952   0.006255672   93  21.895254   0.0000
 mthymth2                 0.00003759   0.006333043  165   0.005936   0.9953
 mthymth3                 0.01075489   0.006251328  165   1.720416   0.0872
 eductreat:mthymth2       0.00323847   0.008947291  165   0.361950   0.7179
 eductreat:mthymth3      -0.01246565   0.008941306  165  -1.394164   0.1651
 
  
 
  
 
 The estimated treatment difference I am interested in is:
 
  
 
 > a <- 0.14814308 + 0.01075489                 # control at visit 3
 
 > b <- 0.13696952 + 0.01075489 - 0.01246565    # treatment at visit 3
 
 > b - a
 
 [1] -0.02363921   (treatment effect at visit 3, same as SAS lsmeans output)
 
  
 
 But I don't know how to get the standard error and corresponding p-value for
 this estimate. Any help with that would be greatly appreciated! 

  How about fitting the model this way?

df$mth <- relevel(df$mth, ref = "ymth3")
lme(ymth ~ educ * mth, random = ~ 1 | id, data = df)

  The coefficient for educ will contain the simple effect of educ at
mth=ymth3, along with a standard error and p-value.
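  More generally, the standard error of any linear combination of the fixed
effects can be computed from the coefficient vector and its covariance matrix
(for an lme fit, fixef(fit) and vcov(fit)). A minimal base-R sketch using the
coefficients from the output above; the covariance matrix here is made up
(diagonal, built from the reported SEs only), so the SE and p-value it prints
are illustrative, not the real ones:

```r
## Estimate and (illustrative) SE for the treatment - control contrast
## at visit 3.  In practice: beta <- fixef(fit); V <- vcov(fit).
beta <- c(educcont   =  0.14814308, eductreat  =  0.13696952,
          mthymth2   =  0.00003759, mthymth3   =  0.01075489,
          treat.mth2 =  0.00323847, treat.mth3 = -0.01246565)
L <- c(-1, 1, 0, 0, 0, 1)           # eductreat - educcont + treat:mth3
est <- sum(L * beta)                # -0.02363921, matching the SAS lsmeans
V <- diag(c(0.006232419, 0.006255672, 0.006333043,
            0.006251328, 0.008947291, 0.008941306)^2)  # made up: SEs^2 only
se <- sqrt(drop(t(L) %*% V %*% L))  # real vcov(fit) has off-diagonal terms
t.stat <- est / se
p <- 2 * pt(-abs(t.stat), df = 165) # df taken from the interaction row
round(c(estimate = est, se = se, t = t.stat, p = p), 5)
```

The contrast vector L is the same whichever parameterization you fit, which is
why releveling and reading off a single coefficient gives the same estimate.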

 Thanks,
 
 Yannan
 
  
 
 
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code. 

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894



Re: [R] how to get lsmeans?

2007-03-23 Thread Muenchen, Robert A (Bob)
The Exegeses paper gave me a great look at the history of all this. I
had not been aware that S-PLUS had gone that route. There is much to be
said for knowing you might be more successful but sticking to your
perspective instead. And in the long run, that may be the more
successful route anyway. 

Thanks,
Bob

=
Bob Muenchen (pronounced Min'-chen), Manager 
Statistical Consulting Center
U of TN Office of Information Technology
200 Stokely Management Center, Knoxville, TN 37996-0520
Voice: (865) 974-5230 
FAX: (865) 974-4810
Email: [EMAIL PROTECTED]
Web: http://oit.utk.edu/scc, 
News: http://listserv.utk.edu/archives/statnews.html
=

 -Original Message-
 From: Liaw, Andy [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 22, 2007 5:27 PM
 To: Douglas Bates; Muenchen, Robert A (Bob)
 Cc: R-help@stat.math.ethz.ch
 Subject: RE: [R] how to get lsmeans?
 
 From: Douglas Bates
 
  On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
 
   Perhaps I'm stating the obvious, but to increase the use of R in
   places where SAS & SPSS dominate, it's important to make getting the
   same answers as easy as possible. That includes things like lsmeans
   and type III sums of squares. I've read lots of discussions here on
   sums of squares & I'm not advocating type III use, just looking at it
   from a marketing perspective. Too many people look for excuses to not
   change. The fewer excuses, the better.
 
  You may get strong reactions to such a suggestion.  I
  recommend reading Bill Venables' famous unpublished paper
  "Exegeses on linear models" (google for the title - very few
  people use "Exegeses" and "linear models" in the same
  sentence - in fact I would not be surprised if Bill was the
  only one who has ever done so).
 
 It's on the MASS page:
 http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
 I believe it's based on a talk Bill gave at an S-PLUS User's Conference.
 I think it deserves to be required reading for all graduate-level
 linear models courses.
 
  You must realize that R is written by experts in statistics
  and statistical computing who, despite popular opinion, do
  not believe that everything in SAS and SPSS is worth copying.
   Some things done in such packages, which trace their roots
  back to the days of punched cards and magnetic tape when
  fitting a single linear model may take several days because
  your first 5 attempts failed due to syntax errors in the JCL
  or the SAS code, still reflect the approach of "give me every
  possible statistic that could be calculated from this model,
  whether or not it makes sense."  The approach taken in R is
  different.  The underlying assumption is that the useR is
  thinking about the analysis while doing it.
 
  The fact that it is so difficult to explain what lsmeans are
  and why they would be of interest is an indication of why
  they aren't implemented in any of the required packages.
 
 Perhaps I should have made it clear in my original post:  I gave the
 example and code more to show what the mysterious "least squares means"
 are (which John explained lucidly), than how to replicate what SAS (or
 JMP) outputs.  I do not understand how people can feel comfortable
 reporting things like lsmeans and p-values from type "insert your
 favorite Roman numeral here" tests when they do not know how such
 things arise or, at the very least, what they _really_ mean.  (Given
 how simply lsmeans are computed, not knowing how to compute them is
 pretty much the same as not knowing what they are.)  One of the dangers
 of wholesale output as SAS or SPSS gives is for the user to simply pick
 an answer and run with it, without understanding what that answer is,
 or if it corresponds to the question of interest.
 
 As to whether to weight the levels of the factors being held constant,
 my suggestion to John would be to offer both choices (unweighted and
 weighted by observed frequencies).  I can see why one would want to
 weight by observed frequencies (if the data are sampled from a
 population), but there are certainly situations (perhaps more often
 than not in the cases I've encountered) where the observed frequencies
 do not come close to approximating what they are in the population.  In
 such cases the unweighted average would make more sense to me.
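 The two choices can be seen in a toy calculation (per-level means and
 counts made up for illustration):

```r
## Averaging a factor's level means: equal weights vs observed frequencies.
f <- factor(c("a", "a", "a", "b"))           # unbalanced: 3 "a" vs 1 "b"
m <- c(a = 10, b = 20)                       # made-up per-level fitted means
unweighted <- mean(m)                        # equal weight per level: 15
weighted   <- sum(m * prop.table(table(f)))  # frequency-weighted: 12.5
c(unweighted = unweighted, weighted = weighted)
```

 The two agree only when the design is balanced, which is why the choice
 matters for unbalanced data.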
 
 Cheers,
 Andy
 
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:r-help-
[EMAIL PROTECTED] On Behalf Of John Fox
Sent: Wednesday, March 21, 2007 8:59 PM
To: 'Prof Brian Ripley'
Cc: 'r-help'; 'Chuck Cleland'
Subject: Re: [R] how to get lsmeans?
   
   [...]
Re: [R] how to get lsmeans?

2007-03-22 Thread John Fox
Dear Brian et al.,

My apologies for chiming in late: It's been a busy day.

First some general comments on least-squares means and effect displays.
The general idea behind the two is similar -- to examine fitted values
corresponding to a term in a model while holding other terms to typical
values -- but the implementation is not identical. There are also other
similar ideas floating around as well. My formulation is more general in the
sense that it applies to a wider variety of models, both linear and
otherwise.

Least-squares means (a horrible term, by the way: in a 1980 paper in the
American Statistician, Searle, Speed, and Milliken suggested the more
descriptive term "population marginal means") apply to factors and
combinations of factors; covariates are set to mean values and the levels of
other factors are averaged over, in effect applying equal weight to each
level. (This is from memory, so it's possible that I'm not getting it quite
right, but I believe that I am.) In my effect displays, each level of a
factor is weighted by its proportion in the data. In models in which
least-squares means can be computed, they should differ from the
corresponding effect display by a constant (if there are different numbers
of observations in the different levels of the factors that are held
constant).

The obstacle to computing either least-squares means or effect displays in R
via predict() is that predict() wants factors in the new data to be set to
particular levels. The effect() function in the effects package bypasses
predict() and works directly with the model matrix, averaging over the
columns that pertain to a factor (and reconstructing interactions as
necessary). As mentioned, this has the effect of setting the factor to its
proportional distribution in the data. This approach also has the advantage
of being invariant with respect to the choice of contrasts for a factor.
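A rough illustration of that model-matrix averaging (a simplified sketch, not
the actual effect() code; the data, factor names, and column name "Aa2" are
made up and assume default treatment contrasts):

```r
## Fitted value for each level of A, with B held at its observed
## distribution, by averaging the columns of the model matrix.
set.seed(42)
d <- data.frame(A = factor(sample(c("a1", "a2"), 60, replace = TRUE)),
                B = factor(sample(c("b1", "b2", "b3"), 60, replace = TRUE)))
d$y <- rnorm(60)
fm <- lm(y ~ A + B, data = d)
X  <- model.matrix(fm)
vals <- sapply(levels(d$A), function(lev) {
  x <- colMeans(X)                     # B dummies become observed proportions
  x["Aa2"] <- as.numeric(lev == "a2")  # pin the A dummy to this level
  sum(x * coef(fm))                    # fitted value at this level of A
})
vals
```

By construction the two values differ exactly by the Aa2 coefficient, i.e.
only by a constant, as described above.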

The only convenient way that I can think of to implement least-squares means
in R would be to use deviation-coded regressors for a factor (that is,
contr.sum) and then to set the columns of the model matrix for the factor(s)
to be averaged over to 0. It may just be that I'm having a failure of
imagination and that there's a better way to proceed. I've not implemented
this solution because it is dependent upon the choice of contrasts and
because I don't see a general advantage to it, but since the issue has come
up several times now, maybe I should take a crack at it. Remember that I
want this to work more generally, not just for levels of factors, and not
just for linear models.
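The deviation-coding idea can be sketched as follows (made-up data; with
contr.sum, setting a factor's columns to zero in a prediction row is the same
as averaging its levels with equal weight):

```r
## Equal-weight averaging over B via contr.sum columns set to zero.
set.seed(7)
d <- data.frame(A = factor(sample(c("a1", "a2"), 60, replace = TRUE)),
                B = factor(sample(c("b1", "b2", "b3"), 60, replace = TRUE)))
d$y <- rnorm(60)
fm <- lm(y ~ A + B, data = d, contrasts = list(B = "contr.sum"))
## Prediction row for A = "a2" with the two contr.sum columns for B zeroed:
x <- c("(Intercept)" = 1, Aa2 = 1, B1 = 0, B2 = 0)
ls.mean <- sum(x * coef(fm))
## Same as the unweighted mean of predictions over B's levels:
nd <- data.frame(A = factor("a2", levels = c("a1", "a2")),
                 B = factor(c("b1", "b2", "b3")))
c(ls.mean = ls.mean, check = mean(predict(fm, nd)))
```

This also shows why the approach is contrast-dependent: zeroing the columns
only performs an equal-weight average under deviation coding.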

Brian is quite right in mentioning that he suggested some time ago that I
use critical values of t rather than of the standard normal distribution for
producing confidence intervals, and I agree that it makes sense to do so in
models in which the dispersion is estimated. My only excuse for not yet
doing this is that I want to undertake a more general revision of the
effects package, and haven't had time to do it. There are several changes
that I'd like to make to the package. For example, I have results for
multinomial and proportional odds logit models (described in a paper by me
and Bob Andersen in the 2006 issue of Sociological Methodology) that I want
to incorporate, and I'd like to improve the appearance of the default
graphs. But Brian's suggestion is very straightforward, and I guess that I
shouldn't wait to implement it; I'll do so very soon.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Prof 
 Brian Ripley
 Sent: Wednesday, March 21, 2007 12:03 PM
 To: Chuck Cleland
 Cc: r-help
 Subject: Re: [R] how to get lsmeans?
 
 On Wed, 21 Mar 2007, Chuck Cleland wrote:
 
  Liaw, Andy wrote:
   I verified the result from the following with output from JMP 6 on
   the same data (don't have SAS: don't need it):
  
   set.seed(631)
   n <- 100
   dat <- data.frame(y = rnorm(n),
                     A = factor(sample(1:2, n, replace = TRUE)),
                     B = factor(sample(1:2, n, replace = TRUE)),
                     C = factor(sample(1:2, n, replace = TRUE)),
                     d = rnorm(n))
   fm <- lm(y ~ A + B + C + d, dat)
   ## Form a data frame of points to predict: all combinations of the
   ## three factors and the mean of the covariate.
   p <- data.frame(expand.grid(A = 1:2, B = 1:2, C = 1:2))
   p[] <- lapply(p, factor)
   p <- cbind(p, d = mean(dat$d))
   p <- cbind(yhat = predict(fm, p), p)
   ## lsmeans for the three factors:
   with(p, tapply(yhat, A, mean))
   with(p, tapply(yhat, B, mean))
   with(p, tapply(yhat, C, mean))
  
    Using Andy's example data, these are the LSMEANS and intervals I get
   from SAS:
  
   A      y LSMEAN      95% Confidence Limits
   1     -0.071847     -0.387507      0.243813
   2     -0.029621

Re: [R] how to get lsmeans?

2007-03-22 Thread Muenchen, Robert A (Bob)
Hi All,

Perhaps I'm stating the obvious, but to increase the use of R in places
where SAS & SPSS dominate, it's important to make getting the same
answers as easy as possible. That includes things like lsmeans and type
III sums of squares. I've read lots of discussions here on sums of
squares & I'm not advocating type III use, just looking at it from a
marketing perspective. Too many people look for excuses to not change.
The fewer excuses, the better.

Of course this is easy for me to say, as I'm not the one who does the
work! Much thanks to those who do.

Cheers,
Bob

=
  Bob Muenchen (pronounced Min'-chen), Manager  
  Statistical Consulting Center
  U of TN Office of Information Technology
  200 Stokely Management Center, Knoxville, TN 37996-0520
  Voice: (865) 974-5230  
  FAX:   (865) 974-4810
  Email: [EMAIL PROTECTED]
  Web:   http://oit.utk.edu/scc, 
  News:  http://listserv.utk.edu/archives/statnews.html
=

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:r-help-
 [EMAIL PROTECTED] On Behalf Of John Fox
 Sent: Wednesday, March 21, 2007 8:59 PM
 To: 'Prof Brian Ripley'
 Cc: 'r-help'; 'Chuck Cleland'
 Subject: Re: [R] how to get lsmeans?
 
 [John Fox's message, quoted in full, elided; see above.]

Re: [R] how to get lsmeans?

2007-03-22 Thread Ioannis Dimakos
I believe there is an easier way and you don't have to do a thing.

A student of mine just purchased a brand new laptop loaded with Vista.
Guess what?  SPSS v14 or v15 does not work with Vista.  In a couple of
days we will install Ubuntu on it, and R.

Enjoy,

Ioannis


On Thu, March 22, 2007 15:35, Muenchen, Robert A (Bob) wrote:
 [Bob Muenchen's message, quoted in full, elided; see above.]

Re: [R] how to get lsmeans?

2007-03-22 Thread John Fox
Dear Bob,

I think that you make a valid point -- and I've included Type-III tests in
the Anova() function in the car package, for example, though not entirely
for consistency with SAS -- but my object in writing the effects package was
different. I wanted to provide a general approach to visualizing terms in
complex models with linear predictors. If I can with reasonable effort
provide a solution that's the same as least-squares means for models for
which least-squares means are defined then I'll do so at some point, but
duplicating what SAS does was not my goal.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Muenchen, Robert A (Bob)
 Sent: Thursday, March 22, 2007 9:36 AM
 To: R-help@stat.math.ethz.ch
 Subject: Re: [R] how to get lsmeans?
 
 [Bob Muenchen's message and the earlier quoted thread elided; see above.]

Re: [R] how to get lsmeans?

2007-03-22 Thread Xingwang Ye
Dear John, 
 
As a green hand at R, I feel that if the results are identical to those of
the user's familiar software, such as SAS, SPSS, or Stata, the user will
feel confident about the results of R. Why? Because lots of R beginners
have no strong statistical background; just like me, they chose to trust
SAS or SPSS first.

The best solution is for R not only to have the capability to produce
results identical to those of SAS or SPSS, but also to have many more
powerful, more professional functions (packages). Then more and more
people who are not specialists in statistics will enjoy R, which makes R
popular; and more and more statisticians will like it too, which makes R
professional.

Obviously, R is on the road to success.


Best wishes. 
yours, sincerely,  
Xingwang Ye
PhD candidate 
Research Group of Nutrition Related Cancers and Other Chronic Diseases  
Institute for Nutritional Sciences,  
Shanghai Institutes of Biological Sciences, 
Chinese Academy of Sciences 

[John Fox's reply and the earlier quoted thread elided; see above.]

Re: [R] how to get lsmeans?

2007-03-22 Thread John Fox
Dear Frank,

 -Original Message-
 From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, March 22, 2007 10:51 AM
 To: John Fox
 Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
 Subject: Re: [R] how to get lsmeans?
 
 John Fox wrote:
  Dear Frank,
  
  The object is to focus on a high-order term of the model, holding 
  other terms constant at typical values; in the case of a 
 factor, a 
  typical value is ambiguous, so an average is taken over 
 the levels of the factor.
  If the factor is, e.g., gender, one could produce an 
 estimate of the 
  average fitted value for a population composed equally of men and 
  women, or of a population composed of men and women in 
 proportion to 
  their distribution in the data. Otherwise, one would have 
 to produce 
  separate sets of fitted values for men and women, with the 
 number of 
  such sets increasing as the number of levels of the factors held 
  constant increase. On the scale of the linear predictor, 
 these sets would differ only by constants.
  
  Regards,
   John
 
 Makes sense.  I just set other factors to the mode, and if it 
 is important to see estimates for other categories, I give 
 estimates for each factor level.  If I want to uncondition on 
 a variable (not often) I take the proper weighted average of 
 predicted values.
 

The mode seems arbitrary to me, but I don't think that there's a unique
right answer here. The weighted average (using sample proportions) is what
the effect() function does.
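[Editor's sketch, not part of the original message: the three choices discussed here -- equal weights, sample-proportion weights, and the mode -- can be compared in a few lines of base R. All object names below are invented for illustration.]

```r
## Sketch: three ways to hold a factor "constant" when summarizing
## predictions from an additive linear model (toy data, made-up names).
set.seed(1)
n <- 200
d <- data.frame(sex = factor(sample(c("f", "m"), n, replace = TRUE)),
                grp = factor(sample(c("a", "b"), n, replace = TRUE,
                                    prob = c(0.7, 0.3))),
                y   = rnorm(n))
fit <- lm(y ~ sex + grp, data = d)

## All combinations of sex and grp, with their fitted values:
nd <- expand.grid(sex = levels(d$sex), grp = levels(d$grp))
nd$yhat <- predict(fit, nd)

## 1. Equal weight to each level of grp (lsmeans-style):
eq <- tapply(nd$yhat, nd$sex, mean)

## 2. grp levels weighted by their sample proportions (effect()-style):
w  <- prop.table(table(d$grp))
pr <- tapply(nd$yhat * w[nd$grp], nd$sex, sum)

## 3. grp set to its modal level (Frank's choice):
md <- predict(fit, data.frame(sex = levels(d$sex),
                              grp = names(which.max(table(d$grp)))))

## In an additive model the three summaries differ only by constants:
eq - pr
```

Because the model is additive, the contrast between the two sex levels is identical under all three conventions; only the overall level shifts.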

Regards,
 John

 Cheers
 Frank
 
  
  
  John Fox
  Department of Sociology
  McMaster University
  Hamilton, Ontario
  Canada L8S 4M4
  905-525-9140x23604
  http://socserv.mcmaster.ca/jfox
  
  
  -Original Message-
  From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED]
  Sent: Thursday, March 22, 2007 8:41 AM
  To: John Fox
  Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
  Subject: Re: [R] how to get lsmeans?
 
  John Fox wrote:
  Dear Brian et al.,
 
  My apologies for chiming in late: It's been a busy day.
 
  First some general comments on least-squares means and
  effect displays.
  ... much good stuff deleted ...
 
  John - the one thing I didn't get from your post is a 
 motivation to 
  do all that as opposed to easy-to-explain predicted 
 values.  I would 
  appreciate your thoughts.
 
  Thanks
  Frank
 
  -- 
  Frank E Harrell Jr   Professor and Chair   School 
 of Medicine
Department of Biostatistics   
  Vanderbilt University
  
  
 
 
 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   
 Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-22 Thread John Fox
Dear Frank,

The object is to focus on a high-order term of the model, holding other
terms constant at typical values; in the case of a factor, a typical
value is ambiguous, so an average is taken over the levels of the factor.
If the factor is, e.g., gender, one could produce an estimate of the average
fitted value for a population composed equally of men and women, or of a
population composed of men and women in proportion to their distribution in
the data. Otherwise, one would have to produce separate sets of fitted
values for men and women, with the number of such sets increasing as the
number of levels of the factors held constant increase. On the scale of the
linear predictor, these sets would differ only by constants.

Regards,
 John


John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
 

 -Original Message-
 From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, March 22, 2007 8:41 AM
 To: John Fox
 Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
 Subject: Re: [R] how to get lsmeans?
 
 John Fox wrote:
  Dear Brian et al.,
  
  My apologies for chiming in late: It's been a busy day.
  
  First some general comments on least-squares means and 
 effect displays.
 ... much good stuff deleted ...
 
 John - the one thing I didn't get from your post is a 
 motivation to do all that as opposed to easy-to-explain 
 predicted values.  I would appreciate your thoughts.
 
 Thanks
 Frank
 
 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   
 Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-22 Thread Frank E Harrell Jr
John Fox wrote:
 Dear Brian et al.,
 
 My apologies for chiming in late: It's been a busy day.
 
 First some general comments on least-squares means and effect displays.
... much good stuff deleted ...

John - the one thing I didn't get from your post is a motivation to do 
all that as opposed to easy-to-explain predicted values.  I would 
appreciate your thoughts.

Thanks
Frank

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-22 Thread Frank E Harrell Jr
John Fox wrote:
 Dear Frank,
 
 The object is to focus on a high-order term of the model, holding other
 terms constant at typical values; in the case of a factor, a typical
 value is ambiguous, so an average is taken over the levels of the factor.
 If the factor is, e.g., gender, one could produce an estimate of the average
 fitted value for a population composed equally of men and women, or of a
 population composed of men and women in proportion to their distribution in
 the data. Otherwise, one would have to produce separate sets of fitted
 values for men and women, with the number of such sets increasing as the
 number of levels of the factors held constant increase. On the scale of the
 linear predictor, these sets would differ only by constants.
 
 Regards,
  John

Makes sense.  I just set other factors to the mode, and if it is 
important to see estimates for other categories, I give estimates for 
each factor level.  If I want to uncondition on a variable (not often) I 
take the proper weighted average of predicted values.

Cheers
Frank

 
 
 John Fox
 Department of Sociology
 McMaster University
 Hamilton, Ontario
 Canada L8S 4M4
 905-525-9140x23604
 http://socserv.mcmaster.ca/jfox 
  
 
 -Original Message-
 From: Frank E Harrell Jr [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, March 22, 2007 8:41 AM
 To: John Fox
 Cc: 'Prof Brian Ripley'; 'r-help'; 'Chuck Cleland'; 'Liaw, Andy'
 Subject: Re: [R] how to get lsmeans?

 John Fox wrote:
 Dear Brian et al.,

 My apologies for chiming in late: It's been a busy day.

 First some general comments on least-squares means and 
 effect displays.
 ... much good stuff deleted ...

 John - the one thing I didn't get from your post is a 
 motivation to do all that as opposed to easy-to-explain 
 predicted values.  I would appreciate your thoughts.

 Thanks
 Frank

 -- 
 Frank E Harrell Jr   Professor and Chair   School of Medicine
   Department of Biostatistics   
 Vanderbilt University
 
 


-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-22 Thread Douglas Bates
On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:

 Perhaps I'm stating the obvious, but to increase the use of R in places
 where SAS & SPSS dominate, it's important to make getting the same
 answers as easy as possible. That includes things like lsmeans and type
 III sums of squares. I've read lots of discussions here on sums of
 squares & I'm not advocating type III use, just looking at it from a
 marketing perspective. Too many people look for excuses to not change.
 The fewer excuses, the better.

You may get strong reactions to such a suggestion.  I recommend
reading Bill Venables' famous unpublished paper "Exegeses on linear
models" (google for the title - very few people use "Exegeses" and
"linear models" in the same sentence - in fact I would not be
surprised if Bill was the only one who has ever done so).

You must realize that R is written by experts in statistics and
statistical computing who, despite popular opinion, do not believe
that everything in SAS and SPSS is worth copying.  Some things done in
such packages, which trace their roots back to the days of punched
cards and magnetic tape when fitting a single linear model may take
several days because your first 5 attempts failed due to syntax errors
in the JCL or the SAS code, still reflect the approach of "give me
every possible statistic that could be calculated from this model,
whether or not it makes sense."  The approach taken in R is different.
The underlying assumption is that the useR is thinking about the
analysis while doing it.

The fact that it is so difficult to explain what lsmeans are and why
they would be of interest is an indication of why they aren't
implemented in any of the required packages.

  -Original Message-
  From: [EMAIL PROTECTED] [mailto:r-help-
  [EMAIL PROTECTED] On Behalf Of John Fox
  Sent: Wednesday, March 21, 2007 8:59 PM
  To: 'Prof Brian Ripley'
  Cc: 'r-help'; 'Chuck Cleland'
  Subject: Re: [R] how to get lsmeans?
 
  Dear Brian et al.,
 
  My apologies for chiming in late: It's been a busy day.
 
  First some general comments on least-squares means and effect
  displays.
  The general idea behind the two is similar -- to examine fitted values
  corresponding to a term in a model while holding other terms to
 typical
  values -- but the implementation is not identical. There are also
 other
  similar ideas floating around as well. My formulation is more general
  in the
  sense that it applies to a wider variety of models, both linear and
  otherwise.
 
  Least-squares means (a horrible term, by the way: in a 1980 paper in
  the
  American Statistician, Searle, Speed, and Milliken suggested the more
  descriptive term population marginal means) apply to factors and
  combinations of factors; covariates are set to mean values and the
  levels of
  other factors are averaged over, in effect applying equal weight to
  each
  level. (This is from memory, so it's possible that I'm not getting it
  quite
  right, but I believe that I am.) In my effect displays, each level of
 a
  factor is weighted by its proportion in the data. In models in which
  least-squares means can be computed, they should differ from the
  corresponding effect display by a constant (if there are different
  numbers
  of observations in the different levels of the factors that are held
  constant).
 
  The obstacle to computing either least-squares means or effect
 displays
  in R
  via predict() is that predict() wants factors in the new data to be
  set to
  particular levels. The effect() function in the effects package
  bypasses
  predict() and works directly with the model matrix, averaging over the
  columns that pertain to a factor (and reconstructing interactions as
  necessary). As mentioned, this has the effect of setting the factor to
  its
  proportional distribution in the data. This approach also has the
  advantage
  of being invariant with respect to the choice of contrasts for a
  factor.
 
  The only convenient way that I can think of to implement least-squares
  means
  in R would be to use deviation-coded regressors for a factor (that is,
  contr.sum) and then to set the columns of the model matrix for the
  factor(s)
  to be averaged over to 0. It may just be that I'm having a failure of
  imagination and that there's a better way to proceed. I've not
  implemented
  this solution because it is dependent upon the choice of contrasts and
  because I don't see a general advantage to it, but since the issue has
  come
  up several times now, maybe I should take a crack at it. Remember that
  I
  want this to work more generally, not just for levels of factors, and
  not
  just for linear models.
 
  Brian is quite right in mentioning that he suggested some time ago
 that
  I
  use critical values of t rather than of the standard normal
  distribution for
  producing confidence intervals, and I agree that it makes sense to do
  so in
  models in which the dispersion is estimated. My only excuse

Re: [R] how to get lsmeans?

2007-03-22 Thread Liaw, Andy
From: Douglas Bates
 
 On 3/22/07, Muenchen, Robert A (Bob) [EMAIL PROTECTED] wrote:
 
  Perhaps I'm stating the obvious, but to increase the use of R in
  places where SAS & SPSS dominate, it's important to make getting the
  same answers as easy as possible. That includes things like lsmeans
  and type III sums of squares. I've read lots of discussions here on
  sums of squares & I'm not advocating type III use, just looking at it
  from a marketing perspective. Too many people look for excuses to not
  change.  The fewer excuses, the better.
 
 You may get strong reactions to such a suggestion.  I 
 recommend reading Bill Venables' famous unpublished paper 
 Exegeses on linear models (google for the title - very few 
 people use Exegeses and linear models in the same 
 sentence - in fact I would not be surprised if Bill was the 
 only one who has ever done so).

It's on the MASS page:
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
I believe it's based on a talk Bill gave at an S-PLUS User's Conference.
I think it deserves to be required reading for all graduate-level linear
models courses.
 
 You must realize that R is written by experts in statistics 
 and statistical computing who, despite popular opinion, do 
 not believe that everything in SAS and SPSS is worth copying. 
  Some things done in such packages, which trace their roots 
 back to the days of punched cards and magnetic tape when 
 fitting a single linear model may take several days because 
 your first 5 attempts failed due to syntax errors in the JCL 
 or the SAS code, still reflect the approach of give me every 
 possible statistic that could be calculated from this model, 
 whether or not it makes sense.  The approach taken in R is different.
  The underlying assumption is that the useR is thinking about 
 the analysis while doing it.
 
 The fact that it is so difficult to explain what lsmeans are 
 and why they would be of interest is an indication of why 
 they aren't implemented in any of the required packages.

Perhaps I should have made it clear in my original post:  I gave the
example and code more to show what the mysterious "least squares means"
are (which John explained lucidly), than how to replicate what SAS (or
JMP) outputs.  I do not understand how people can feel comfortable
reporting things like lsmeans and p-values from type "insert your
favorite Roman numeral here" tests when they do not know how such things
arise or, at the very least, what they _really_ mean.  (Given how simply
lsmeans are computed, not knowing how to compute them is pretty much the
same as not knowing what they are.)  One of the dangers of the wholesale
output SAS or SPSS gives is for the user to simply pick an answer and
run with it, without understanding what that answer is, or whether it
corresponds to the question of interest.

As to whether to weight the levels of the factors being held constant,
my suggestion to John would be to offer both choices (unweighted and
weighted by observed frequencies).  I can see why one would want to
weight by observed frequencies (if the data are sampled from a
population), but there are certainly situations (perhaps more often than
not in the cases I've encountered) that the observed frequencies do not
come close to approximating what they are in the population.  In such
cases the unweighted average would make more sense to me.
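[Editor's sketch, not part of the original message: the two weightings Andy proposes can both be computed directly from the model matrix, along the lines John described for effect(). All names below are invented for illustration.]

```r
## Sketch: adjusted means for factor A, averaging over factor B with
## either equal weights or observed-frequency weights (toy data).
set.seed(2)
n <- 100
d <- data.frame(A = factor(sample(1:2, n, replace = TRUE)),
                B = factor(sample(1:2, n, replace = TRUE, prob = c(0.8, 0.2))),
                y = rnorm(n))
fit <- lm(y ~ A + B, data = d)

adj_mean_A <- function(wB) {
  sapply(levels(d$A), function(a) {
    ## one row of the model matrix per level of B, at A = a
    nd <- expand.grid(A = factor(a, levels = levels(d$A)),
                      B = factor(levels(d$B), levels = levels(d$B)))
    X  <- model.matrix(~ A + B, nd)
    ## weighted-average row of X, times the coefficient vector
    sum(colSums(X * wB) * coef(fit))
  })
}

adj_mean_A(c(0.5, 0.5))                        # unweighted (lsmeans-style)
adj_mean_A(as.vector(prop.table(table(d$B))))  # observed frequencies
```

Either weighting yields the same A contrast in this additive model; only the common level of the two adjusted means changes.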

Cheers,
Andy

 
   -Original Message-
   From: [EMAIL PROTECTED] [mailto:r-help- 
   [EMAIL PROTECTED] On Behalf Of John Fox
   Sent: Wednesday, March 21, 2007 8:59 PM
   To: 'Prof Brian Ripley'
   Cc: 'r-help'; 'Chuck Cleland'
   Subject: Re: [R] how to get lsmeans?
  
   Dear Brian et al.,
  
   My apologies for chiming in late: It's been a busy day.
  
   First some general comments on least-squares means and effect 
   displays.
   The general idea behind the two is similar -- to examine fitted 
   values corresponding to a term in a model while holding 
 other terms 
   to
  typical
   values -- but the implementation is not identical. There are also
  other
   similar ideas floating around as well. My formulation is more 
   general in the sense that it applies to a wider variety 
 of models, 
   both linear and otherwise.
  
   Least-squares means (a horrible term, by the way: in a 
 1980 paper 
   in the American Statistician, Searle, Speed, and Milliken 
 suggested 
   the more descriptive term population marginal means) apply to 
   factors and combinations of factors; covariates are set to mean 
   values and the levels of other factors are averaged over, 
 in effect 
   applying equal weight to each level. (This is from 
 memory, so it's 
   possible that I'm not getting it quite right, but I 
 believe that I 
   am.) In my effect displays, each level of
  a
   factor is weighted by its proportion in the data. In 
 models in which 
   least-squares means can be computed, they should differ from the 
   corresponding effect display by a constant (if there are 
 different

Re: [R] how to get lsmeans?

2007-03-21 Thread Liaw, Andy
I verified the result from the following with output from JMP 6 on the
same data (don't have SAS: don't need it):

set.seed(631)
n <- 100
dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
                  B=factor(sample(1:2, n, replace=TRUE)),
                  C=factor(sample(1:2, n, replace=TRUE)),
                  d=rnorm(n))
fm <- lm(y ~ A + B + C + d, dat)
## Form a data frame of points to predict: all combinations of the
## three factors and the mean of the covariate.
p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
p[] <- lapply(p, factor)
p <- cbind(p, d=mean(dat$d))
p <- cbind(yhat=predict(fm, p), p)
## lsmeans for the three factors:
with(p, tapply(yhat, A, mean))
with(p, tapply(yhat, B, mean))
with(p, tapply(yhat, C, mean))
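[Editor's sketch, not part of the original message: the question that started this thread also asked for a standard error. One way to get it is to write the lsmean as a linear combination L %*% beta and use vcov(); the setup below is a reduced, invented variant of the example above.]

```r
## Sketch: lsmean for A = 1 with its standard error and t-based
## confidence limits, via L %*% beta and the coefficient covariance.
set.seed(631)
n <- 100
dat <- data.frame(y = rnorm(n),
                  A = factor(sample(1:2, n, replace = TRUE)),
                  d = rnorm(n))
fm <- lm(y ~ A + d, dat)

## Row of the model matrix at A = "1", d = mean(d):
L   <- model.matrix(~ A + d,
                    data.frame(A = factor(1, levels = 1:2), d = mean(dat$d)))
est <- drop(L %*% coef(fm))
se  <- sqrt(drop(L %*% vcov(fm) %*% t(L)))
tcrit <- qt(0.975, df.residual(fm))
c(lsmean = est, se = se, lower = est - tcrit * se, upper = est + tcrit * se)
```

The same L-vector approach extends to differences of lsmeans (take the difference of two rows), which is what the lme question at the head of this thread needed.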

Andy 

From: Xingwang Ye
 
 Dear all, 
   
 I searched the mailing list about this topic and learned that no 
 simple way is available to get lsmeans in R as in SAS.
 Dr. John Fox and Dr. Frank E Harrell have given very useful 
 information about the lsmeans topic.
 Dr. Frank E Harrell suggests not thinking about lsmeans, 
 just thinking about what predicted values are wanted, 
 and using the predict function. However, after reading 
 the R help file for a whole day, I am still unclear how to do it.
 Could someone give me a hand? 
  
 for example:
   
 A, B and C are binary variables (factors); d is a continuous 
 variable; the response variable Y is a continuous variable too.  
 
 To get lsmeans of Y according to A,B and C, respectively, in 
 SAS, I tried proc glm data=a;  class A B C;  model Y=A B C d; 
  lsmeans A B C/cl; run;  
 
 In R, I tried this:  
  library(Design)
  ddist <- datadist(a)
  options(datadist = "ddist")
  f <- ols(Y ~ A + B + C + d, data = a, x = TRUE, y = TRUE, se.fit = TRUE)  
 
 then how to get the lsmeans for A, B, and C, respectively 
 with predict function?
 
  
 
 Best wishes
 yours, sincerely 
 Xingwang Ye
 PhD candidate 
 Research Group of Nutrition Related Cancers and Other Chronic 
 Diseases  
 Institute for Nutritional Sciences,  
 Shanghai Institutes of Biological Sciences, 
 Chinese Academy of Sciences 
 P.O.Box 32 
 294 Taiyuan Road 
 Shanghai 200031 
 P.R.CHINA
 
 


--
Notice:  This e-mail message, together with any attachments,...{{dropped}}

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-21 Thread Chuck Cleland
Liaw, Andy wrote:
 I verified the result from the following with output from JMP 6 on the
 same data (don't have SAS: don't need it):
 
 set.seed(631)
 n <- 100
 dat <- data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
   B=factor(sample(1:2, n, replace=TRUE)),
   C=factor(sample(1:2, n, replace=TRUE)),
   d=rnorm(n))
 fm <- lm(y ~ A + B + C + d, dat)
 ## Form a data frame of points to predict: all combinations of the
 ## three factors and the mean of the covariate.
 p <- data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
 p[] <- lapply(p, factor)
 p <- cbind(p, d=mean(dat$d))
 p <- cbind(yhat=predict(fm, p), p)
 ## lsmeans for the three factors:
 with(p, tapply(yhat, A, mean))
 with(p, tapply(yhat, B, mean))
 with(p, tapply(yhat, C, mean))

  Using Andy's example data, these are the LSMEANS and intervals I get
from SAS:

A     y LSMEAN    95% Confidence Limits
1    -0.071847    -0.387507     0.243813
2    -0.029621    -0.342358     0.283117

B     y LSMEAN    95% Confidence Limits
1    -0.104859    -0.397935     0.188216
2     0.003391    -0.333476     0.340258

C     y LSMEAN    95% Confidence Limits
1    -0.084679    -0.392343     0.222986
2    -0.016789    -0.336374     0.302795

  One way of reproducing the LSMEANS and intervals from SAS using
predict() seems to be the following:

 dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat <- expand.grid(A=factor(c(1,2)), B=1.5, C=1.5, d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
  A   B   C          d         fit        lwr       upr
1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165

  However, another possibility seems to be:

 dat.lm <- lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat <-
expand.grid(A=factor(c(1,2)), B=mean(as.numeric(dat$B)), C=mean(as.numeric(dat$C)), d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval="confidence"))
  A    B    C          d         fit        lwr       upr
1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465

  The predictions directly above match what effect() in the effects
package by John Fox returns:

library(effects)

 effect("A", fm, xlevels=list(d = mean(dat$d)))

 A effect
A
  1   2
-0.08078243 -0.03855619

  But for some reason the predict() and effect() intervals are a little
different:

 effect("A", fm, xlevels=list(d = mean(dat$d)))$lower
  [,1]
101 -0.3924451
102 -0.3440179

 effect("A", fm, xlevels=list(d = mean(dat$d)))$upper
 [,1]
101 0.2308802
102 0.2669055

  I would be interested in any comments on these different approaches
and on the difference in intervals returned by predict() and effect().

hope this helps,

Chuck

 Andy 
 
 From: Xingwang Ye
 Dear all, 
   
 I search the mail list about this topic and learn that no 
 simple way is available to get lsmeans in R as in SAS.
 Dr.John Fox and Dr.Frank E Harrell have given very useful 
 information about lsmeans topic.
 Dr. Frank E Harrell suggests not to think about lsmeans, 
 just to think about what predicted values wanted
 and to use the predict function. However, after reading 
 the R help file for a whole day, I am still unclear how to do it.
 Could some one give me a hand? 
  
 for example:
   
 A,B and C are binomial variables(factors); d is a continuous 
 variable ; The response variable Y is  a continuous variable too.  

 To get lsmeans of Y according to A,B and C, respectively, in 
 SAS, I tried proc glm data=a;  class A B C;  model Y=A B C d; 
  lsmeans A B C/cl; run;  

 In R, I tried this:  
  library(Design)
  ddist-datadist(a)
  options(datadist=ddist)
  f-ols(Y~A+B+C+D,data=a,x=TRUE,y=TRUE,se.fit=TRUE)  

 then how to get the lsmeans for A, B, and C, respectively 
 with predict function?

  

 Best wishes
 yours, sincerely 
 Xingwang Ye
 PhD candidate 
 Research Group of Nutrition Related Cancers and Other Chronic 
 Diseases  
 Institute for Nutritional Sciences,  
 Shanghai Institutes of Biological Sciences, 
 Chinese Academy of Sciences 
 P.O.Box 32 
 294 Taiyuan Road 
 Shanghai 200031 
 P.R.CHINA


 
 
 --
 Notice:  This e-mail message, together with any attachments,...{{dropped}}
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

__

Re: [R] how to get lsmeans?

2007-03-21 Thread Prof Brian Ripley
On Wed, 21 Mar 2007, Chuck Cleland wrote:

 Liaw, Andy wrote:
 I verified the result from the following with output from JMP 6 on the
 same data (don't have SAS: don't need it):

 set.seed(631)
 n - 100
 dat - data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
   B=factor(sample(1:2, n, replace=TRUE)),
   C=factor(sample(1:2, n, replace=TRUE)),
   d=rnorm(n))
 fm - lm(y ~ A + B + C + d, dat)
 ## Form a data frame of points to predict: all combinations of the
 ## three factors and the mean of the covariate.
 p - data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
 p[] - lapply(p, factor)
 p - cbind(p, d=mean(dat$d))
 p - cbind(yhat=predict(fm, p), p)
 ## lsmeans for the three factors:
 with(p, tapply(yhat, A, mean))
 with(p, tapply(yhat, B, mean))
 with(p, tapply(yhat, C, mean))

  Using Andy's example data, these are the LSMEANS and intervals I get
 from SAS:

 Ay LSMEAN  95% Confidence Limits
 1   -0.071847   -0.387507 0.243813
 2   -0.029621   -0.342358 0.283117

 By LSMEAN  95% Confidence Limits
 1   -0.104859   -0.397935 0.188216
 20.003391   -0.333476 0.340258

 Cy LSMEAN  95% Confidence Limits
 1   -0.084679   -0.392343 0.222986
 2   -0.016789   -0.336374 0.302795

  One way of reproducing the LSMEANS and intervals from SAS using
 predict() seems to be the following:

 dat.lm - lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat - expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval=confidence))
  A   B   C  d fitlwr   upr
 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165

  However, another possibility seems to be:

 dat.lm - lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat -
 expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numeric(dat$C)),d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval=confidence))
  ABC  d fitlwr   upr
 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465

  The predictions directly above match what effect() in the effects
 package by John Fox returns:

 library(effects)

 effect(A, fm, xlevels=list(d = mean(dat$D)))

 A effect
 A
  1   2
 -0.08078243 -0.03855619

  But for some reason the predict() and effect() intervals are a little
 different:

 effect(A, fm, xlevels=list(d = mean(dat$D)))$lower
  [,1]
 101 -0.3924451
 102 -0.3440179

 effect(A, fm, xlevels=list(d = mean(dat$D)))$upper
 [,1]
 101 0.2308802
 102 0.2669055

  I would be interested in any comments on these different approaches
 and on the difference in intervals returned by predict() and effect().

AFAIR, the effects package uses normal-based confidence intervals 
and predict.lm uses t-based ones, and I have suggested to John Fox that 
t-based intervals would be preferable, at least as an option.
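[Editor's sketch, not part of the original message: a minimal illustration of the normal-vs-t distinction on invented toy data.]

```r
## Sketch: t-based vs normal-based confidence limits for one fitted value.
set.seed(1)
d <- data.frame(x = rnorm(20))
d$y <- 1 + d$x + rnorm(20)
fit <- lm(y ~ x, d)

p <- predict(fit, data.frame(x = 0), se.fit = TRUE)
p$fit + c(-1, 1) * qt(0.975, p$df) * p$se.fit  # t-based (predict.lm-style)
p$fit + c(-1, 1) * qnorm(0.975)   * p$se.fit   # normal-based (narrower)
```

With few residual degrees of freedom the t interval is noticeably wider; the two converge as the df grow.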


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to get lsmeans?

2007-03-21 Thread Frank E Harrell Jr
Chuck Cleland wrote:
 Liaw, Andy wrote:
 I verified the result from the following with output from JMP 6 on the
 same data (don't have SAS: don't need it):

 set.seed(631)
 n - 100
 dat - data.frame(y=rnorm(n), A=factor(sample(1:2, n, replace=TRUE)),
   B=factor(sample(1:2, n, replace=TRUE)),
   C=factor(sample(1:2, n, replace=TRUE)),
   d=rnorm(n))
 fm - lm(y ~ A + B + C + d, dat)
 ## Form a data frame of points to predict: all combinations of the
 ## three factors and the mean of the covariate.
 p - data.frame(expand.grid(A=1:2, B=1:2, C=1:2))
 p[] - lapply(p, factor)
 p - cbind(p, d=mean(dat$d))
 p - cbind(yhat=predict(fm, p), p)
 ## lsmeans for the three factors:
 with(p, tapply(yhat, A, mean))
 with(p, tapply(yhat, B, mean))
 with(p, tapply(yhat, C, mean))

And with the Design package you can do:

f <- ols(y ~ ...)
ds <- gendata(f, A=c('1','2'), B=c('1','2'), C=c('1','2'))
predict(f, ds)

But this sets the covariate to the median (nothing unreasonable about 
that, it just will not agree with SAS).  To set it to the mean, add 
d=mean(dat$d) in gendata.

Frank

 
   Using Andy's example data, these are the LSMEANS and intervals I get
 from SAS:
 
 Ay LSMEAN  95% Confidence Limits
 1   -0.071847   -0.387507 0.243813
 2   -0.029621   -0.342358 0.283117
 
 By LSMEAN  95% Confidence Limits
 1   -0.104859   -0.397935 0.188216
 20.003391   -0.333476 0.340258
 
 Cy LSMEAN  95% Confidence Limits
 1   -0.084679   -0.392343 0.222986
 2   -0.016789   -0.336374 0.302795
 
   One way of reproducing the LSMEANS and intervals from SAS using
 predict() seems to be the following:
 
 dat.lm - lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat - expand.grid(A=factor(c(1,2)),B=1.5,C=1.5,d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval=confidence))
   A   B   C  d fitlwr   upr
 1 1 1.5 1.5 0.09838595 -0.07184709 -0.3875070 0.2438128
 2 2 1.5 1.5 0.09838595 -0.02962086 -0.3423582 0.2831165
 
   However, another possibility seems to be:
 
 dat.lm - lm(y ~ A + as.numeric(B) + as.numeric(C) + d, data = dat)
 newdat -
 expand.grid(A=factor(c(1,2)),B=mean(as.numeric(dat$B)),C=mean(as.numeric(dat$C)),d=mean(dat$d))
 cbind(newdat, predict(dat.lm, newdat, interval=confidence))
   ABC  d fitlwr   upr
 1 1 1.43 1.48 0.09838595 -0.08078243 -0.3964661 0.2349012
 2 2 1.43 1.48 0.09838595 -0.03855619 -0.3479589 0.2708465
 
   The predictions directly above match what effect() in the effects
 package by John Fox returns:
 
 library(effects)
 
 effect(A, fm, xlevels=list(d = mean(dat$D)))
 
  A effect
 A
   1   2
 -0.08078243 -0.03855619
 
   But for some reason the predict() and effect() intervals are a little
 different:
 
 effect(A, fm, xlevels=list(d = mean(dat$D)))$lower
   [,1]
 101 -0.3924451
 102 -0.3440179
 
 effect(A, fm, xlevels=list(d = mean(dat$D)))$upper
  [,1]
 101 0.2308802
 102 0.2669055
 
   I would be interested in any comments on these different approaches
 and on the difference in intervals returned by predict() and effect().
 
 hope this helps,
 
 Chuck
 
 Andy 

 From: Xingwang Ye
 Dear all, 
   
 I search the mail list about this topic and learn that no 
 simple way is available to get lsmeans in R as in SAS.
 Dr.John Fox and Dr.Frank E Harrell have given very useful 
 information about lsmeans topic.
 Dr. Frank E Harrell suggests not to think about lsmeans, 
 just to think about what predicted values wanted
 and to use the predict function. However, after reading 
 the R help file for a whole day, I am still unclear how to do it.
 Could some one give me a hand? 
  
 for example:
   
 A,B and C are binomial variables(factors); d is a continuous 
 variable ; The response variable Y is  a continuous variable too.  

 To get lsmeans of Y according to A,B and C, respectively, in 
 SAS, I tried proc glm data=a;  class A B C;  model Y=A B C d; 
  lsmeans A B C/cl; run;  

 In R, I tried this:  
  library(Design)
  ddist-datadist(a)
  options(datadist=ddist)
  f-ols(Y~A+B+C+D,data=a,x=TRUE,y=TRUE,se.fit=TRUE)  

 then how to get the lsmeans for A, B, and C, respectively 
 with predict function?

  

 Best wishes
 yours, sincerely 
 Xingwang Ye
 PhD candidate 
 Research Group of Nutrition Related Cancers and Other Chronic 
 Diseases  
 Institute for Nutritional Sciences,  
 Shanghai Institutes of Biological Sciences, 
 Chinese Academy of Sciences 
 P.O.Box 32 
 294 Taiyuan Road 
 Shanghai 200031 
 P.R.CHINA



 --
 Notice:  This e-mail message, together with any attachments,...{{dropped}}

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE