On 26 Mar 2004 16:20:20 -0800, [EMAIL PROTECTED] (Bin Zhou) wrote:

> Hello, all,
>
> I am using Stepwise multiple linear regression. The model has 5
> independent variables, and number of sample is 210.
> I found the following suggestion today.
>
> ************************************************************************
> Stepwise regression is used in the exploratory phase of research or
> for purposes of pure prediction, not theory testing. In the theory
> testing stage the researcher should base selection of the variables
> and their order on theory, not on a computer algorithm. Menard (1995:
> 54) writes, "there appears to be general agreement that the use of
> computer-controlled stepwise procedures to select variables is
> inappropriate for theory testing because it capitalizes on random
> variations in the data and produces results that tend to be
> idosyncratic and difficult to replicate in any sample other than the
> sample in which they were originally obtained." Likewise, the nominal
> .05 significance level used at each step in stepwise regression is
> subject to inflation, such that the real significance level by the
> last step may be much worse, even below .50, dramatically increasing
> the chances of Type I errors. See Draper, N.R., Guttman, I. & Lapczak,
> L. (1979). For this reason, Fox (1991: 18) strongly recommends any
> stepwise model be subjected to cross-validation.
> *************************************************************************
>
> I know cross-validation is useful for neural network. I am not sure I
> have to use it in MLR, because my linear model need "overfitting" for
> all samples. I donot like cross validation, maybe the main reason is
> that I have not software do it. So I wonder if you can give me a
> excuse so that I do not need cross validation. In practice, is it
> popular method that stepwise model is subjected to cross-validation?
>

That was good advice, above, except for the short word about
cross-validation, which was (in my opinion) optimistic. Where
stepwise is not a good idea, cross-validation cannot do much
to salvage it.

Why do you have the set of 5 variables? Why not report all of
them, including a (possibly) small coefficient or two?
 - The worst results arise, perhaps, when the '5 variables' were
winnowed out of a much larger set, since that creates biases
that cross-validation cannot salvage.

The reputation of 'stepwise' is not very high these days.
I suggest you try to avoid that style of analysis.

I do agree with that bottom line of Fox: If you do insist on
stepwise, and you want to make some *point* about "which
variables are chosen," then you make a better case for stability
if you show that the same answer comes up consistently (for
instance, across cross-validation subsamples).

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
 - I need a new job, after March 31. Openings? -
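
For readers without software that does this, the "show that one answer is consistent" check can be sketched in plain Python. This is only an illustration, not anything from the original posts: the function names (solve, rss, forward_select, selection_counts), the F-to-enter threshold of 4.0 (roughly the nominal .05 level for 1 numerator df at this sample size), the 5 folds, and the simulated data are all assumptions. It re-runs a simple forward selection on each training subsample and counts how often each variable is picked. It checks stability of the selection only; it does not remove the selection bias discussed above.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def rss(X, y, cols):
    """Residual sum of squares for OLS on the given columns plus an intercept."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def forward_select(X, y, f_enter=4.0):
    """Forward selection: add the variable with the largest partial F
    until no remaining variable reaches f_enter (assumed threshold)."""
    n, k = len(y), len(X[0])
    chosen, current = [], rss(X, y, [])
    while len(chosen) < k:
        best = None
        for c in range(k):
            if c in chosen:
                continue
            new = rss(X, y, chosen + [c])
            df = n - (len(chosen) + 2)  # intercept + chosen + candidate
            f = (current - new) / (new / df)
            if best is None or f > best[0]:
                best = (f, c, new)
        if best[0] < f_enter:
            break
        chosen.append(best[1])
        current = best[2]
    return sorted(chosen)

def selection_counts(X, y, n_folds=5, seed=0):
    """Count how often each variable is selected when one fold is left out."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    counts = [0] * len(X[0])
    for f in range(n_folds):
        held_out = set(idx[f::n_folds])
        train = [i for i in idx if i not in held_out]
        picked = forward_select([X[i] for i in train], [y[i] for i in train])
        for c in picked:
            counts[c] += 1
    return counts

# Simulated data (hypothetical): 210 cases, 5 predictors, one strong effect,
# one weak effect, three pure-noise variables.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(210)]
y = [2.0 * row[0] + 0.2 * row[1] + rng.gauss(0, 1) for row in X]
counts = selection_counts(X, y)
print(counts)  # times each of the 5 variables was selected, out of 5 runs
```

A variable chosen in every run is at least a stable pick; one that enters in only some of the runs is the kind of idiosyncratic result the quoted passage warns about.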
