On Sun, 1 Feb 2004 19:13:49 -0800 (PST) Jinsong Zhao <[EMAIL PROTECTED]> wrote:
> --- Frank E Harrell Jr <[EMAIL PROTECTED]> wrote:
> > On Sun, 1 Feb 2004 11:09:28 -0800 (PST)
> > Jinsong Zhao <[EMAIL PROTECTED]> wrote:
> >
> > > Dear all,
> > >
> > > I am a newcomer to R. I intend to use R to do stepwise
> > > regression and PLS with a data set (a 55x20 matrix, with one
> > > dependent and 19 independent variables). Based on the same data
> > > set, I have done the same work using SPSS and SAS. However, there
> > > is a large difference between the results obtained by R and those
> > > from SPSS or SAS.
> > >
> > > In the case of stepwise regression, SPSS gave a model with 4
> > > independent variables, but with step(), R gave a model with 10
> > > and a much higher R2. Furthermore, regsubsets() also indicates
> > > that the 10-variable model is one of the best regression subsets.
> > > How can this difference be explained? And in the case of my data
> > > set, how many variables entering the model would be reasonable?
> > >
> > > In the case of PLS, the results of the mvr function of the
> > > pls.pcr package are also different from those of SAS. Although
> > > the optimum number of latent variables is the same, the
> > > difference in R2 is large. Why?
> > >
> > > Any comments and suggestions are much appreciated. Thanks in
> > > advance!
> > >
> > > Best wishes,
> > >
> > > Jinsong Zhao
> > >
> >
> > In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and
> > every other package will agree in one sense, because results from
> > all of them will be virtually meaningless. Simulate some data from
> > a known model and you'll quickly find out why stepwise variable
> > selection is often a train wreck.
> >
> > ---
> > Frank E Harrell Jr   Professor and Chair           School of Medicine
> >                      Department of Biostatistics   Vanderbilt University
>
> For the case of stepwise regression, I have found that the subsets I
> got using regsubsets() are collinear. However, the variables in
> SPSS's result are not collinear. I wonder what I should do to get the
> same or a better linear model.

I think you missed the point. None of the variable selection procedures
will provide results that have a fair probability of replicating in
another sample.

FH
---
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
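
The suggestion above to simulate data from a known model can be sketched
in R roughly as follows. This is only an illustration under stated
assumptions: the sample size and number of candidate predictors match
the 55x20 data set described in the thread, but the true coefficients,
the noise level, the number of replicates, and the helper name
simulate_once are all made up for the example. It uses lm() and step()
from base R and reports how often backward stepwise selection recovers
exactly the true set of predictors and how many pure-noise variables it
keeps.

```r
# Sketch: stepwise selection on data simulated from a known model.
# The coefficients, noise level and replicate count are illustrative
# assumptions, not values taken from the original data set.
set.seed(1)

n <- 55                                   # observations
p <- 19                                   # candidate predictors
beta <- c(1, 1, 0.5, 0.5, rep(0, p - 4))  # assumed: only x1..x4 matter
true_vars <- paste0("x", 1:4)

# Hypothetical helper: draw one data set, run backward stepwise
# selection, and return the selected variables and the apparent R2.
simulate_once <- function() {
  x <- matrix(rnorm(n * p), n, p,
              dimnames = list(NULL, paste0("x", 1:p)))
  y <- drop(x %*% beta) + rnorm(n, sd = 2)
  dat <- data.frame(y = y, x)
  full <- lm(y ~ ., data = dat)
  sel <- step(full, direction = "backward", trace = 0)  # AIC-based
  list(chosen = names(coef(sel))[-1],    # drop the intercept
       r2 = summary(sel)$r.squared)
}

reps <- 100
runs <- replicate(reps, simulate_once(), simplify = FALSE)

# Summaries across replicates
size    <- sapply(runs, function(r) length(r$chosen))
n_noise <- sapply(runs, function(r) length(setdiff(r$chosen, true_vars)))
exact   <- mean(sapply(runs, function(r) setequal(r$chosen, true_vars)))
r2      <- sapply(runs, function(r) r$r2)

cat("Proportion of runs selecting exactly x1..x4:", exact, "\n")
cat("Mean number of variables selected:", mean(size), "\n")
cat("Mean number of noise variables selected:", mean(n_noise), "\n")
cat("Mean apparent R2 of selected models:", mean(r2), "\n")
cat("Population R2 of the true model:",
    sum(beta^2) / (sum(beta^2) + 2^2), "\n")
```

Comparing the set of selected variables from one replicate to the next,
and the apparent R2 with the population R2 of the true model, is one way
to examine the replication concern raised in the reply. Changing beta,
the noise sd, or n shows how sensitive the selection is to those
assumptions.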