On Sun, 1 Feb 2004 11:09:28 -0800 (PST) Jinsong Zhao <[EMAIL PROTECTED]> wrote:
> Dear all,
>
> I am a newcomer to R. I intend to use R to do
> stepwise regression and PLS with a data set (a 55x20
> matrix, with one dependent and 19 independent
> variables). Based on the same data set, I have done the
> same work using SPSS and SAS. However, there is a large
> difference between the results obtained by R and those
> from SPSS or SAS.
>
> In the case of stepwise, SPSS gave a model with 4
> independent variables, but with step(), R gave a
> model with 10 and a much higher R2. Furthermore,
> regsubsets() also indicates that the 10-variable model
> is one of the best regression subsets. How can this
> difference be explained? And in the case of my data set,
> how many variables would it be reasonable to enter
> into the model?
>
> In the case of PLS, the results of the mvr function of
> the pls.pcr package are also different from those of SAS.
> Although the optimum number of latent variables is
> the same, the difference in R2 is large. Why?
>
> Any comments and suggestions are much appreciated. Thanks
> in advance!
>
> Best wishes,
>
> Jinsong Zhao

In your case SPSS, SAS, R, S-Plus, Stata, Systat, Statistica, and
every other package will agree in one sense, because results from all
of them will be virtually meaningless. Simulate some data from a
known model and you'll quickly find out why stepwise variable
selection is often a train wreck.

---
Frank E Harrell Jr
Professor and Chair
School of Medicine, Department of Biostatistics
Vanderbilt University
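
The suggested simulation could be sketched roughly as follows. This is only an illustration under stated assumptions, not a reimplementation of step(), regsubsets(), or the SPSS/SAS procedures: it uses plain Python (standard library only), a simplified forward selection driven by AIC, and a "known model" in which the response is pure noise, so every selected variable is a false positive. The dimensions (55 observations, 19 candidate predictors) mirror the data set described above; the function names, the number of replications, and the seed are hypothetical choices.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def forward_select(columns, y):
    """Simplified forward stepwise selection by AIC (illustrative helper).

    Returns the indices of the chosen columns in the order they entered.
    """
    n = len(y)
    resid_y = center(y)  # centering absorbs the intercept
    resid_x = {j: center(c) for j, c in enumerate(columns)}
    rss = dot(resid_y, resid_y)
    chosen = []
    while resid_x:
        # RSS reduction obtained by adding each remaining column
        gains = {}
        for j, x in resid_x.items():
            sxx = dot(x, x)
            gains[j] = dot(x, resid_y) ** 2 / sxx if sxx > 1e-12 else 0.0
        best = max(gains, key=gains.get)
        new_rss = rss - gains[best]
        # AIC = n*log(RSS/n) + 2*(parameters); one extra parameter costs 2
        if n * math.log(new_rss / rss) + 2 >= 0:
            break  # no candidate lowers AIC, so stop
        q = resid_x.pop(best)
        qq = dot(q, q)
        # Sweep the chosen column out of y and out of every remaining candidate
        b = dot(q, resid_y) / qq
        resid_y = [a - b * c for a, c in zip(resid_y, q)]
        for j in list(resid_x):
            b = dot(q, resid_x[j]) / qq
            resid_x[j] = [a - b * c for a, c in zip(resid_x[j], q)]
        chosen.append(best)
        rss = new_rss
    return chosen


def simulate(n_obs=55, n_vars=19, n_reps=200, seed=1):
    """Count how many pure-noise predictors get selected in each replication."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_reps):
        columns = [[rng.gauss(0, 1) for _ in range(n_obs)]
                   for _ in range(n_vars)]
        # Known model: y is unrelated to every candidate predictor
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        counts.append(len(forward_select(columns, y)))
    return counts


counts = simulate()
print("mean number of noise variables selected:", sum(counts) / len(counts))
print("share of runs selecting at least one:",
      sum(c > 0 for c in counts) / len(counts))
```

Since none of the candidate predictors is related to the response here, a selection procedure that behaved well would usually return an empty model. Comparing the printed summaries against that benchmark gives a rough sense of how often selection picks up noise with 55 observations and 19 candidates.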