On 26 Mar 2004 16:20:20 -0800, [EMAIL PROTECTED] (Bin Zhou)
wrote:

> Hello, all,
> 
> I am using Stepwise multiple linear regression. The model has 5
> independent variables, and the number of samples is 210.
> I found the following suggestion today.
> 
> ************************************************************************
> Stepwise regression is used in the exploratory phase of research or
> for purposes of pure prediction, not theory testing. In the theory
> testing stage the researcher should base selection of the variables
> and their order on theory, not on a computer algorithm. Menard (1995:
> 54) writes, "there appears to be general agreement that the use of
> computer-controlled stepwise procedures to select variables is
> inappropriate for theory testing because it capitalizes on random
> variations in the data and produces results that tend to be
> idiosyncratic and difficult to replicate in any sample other than the
> sample in which they were originally obtained." Likewise, the nominal
> .05 significance level used at each step in stepwise regression is
> subject to inflation, such that the real significance level by the
> last step may be much worse, even below .50, dramatically increasing
> the chances of Type I errors. See Draper, N.R., Guttman, I. & Lapczak,
> L. (1979). For this reason, Fox (1991: 18) strongly recommends any
> stepwise model be subjected to cross-validation.
> *************************************************************************
> 
> I know cross-validation is useful for neural networks. I am not sure I
> have to use it in MLR, because my linear model needs "overfitting" for
> all samples.  I do not like cross-validation; maybe the main reason is
> that I do not have software to do it. So I wonder if you can give me an
> excuse so that I do not need cross-validation. In practice, is it a
> popular method for a stepwise model to be subjected to cross-validation?
> 

That was good advice, above, except for the short word
about cross-validation, which was (in my opinion) optimistic. 
Where stepwise is not a good idea, cross-validation cannot 
do much to salvage it.  

Why do you have the set of 5 variables?  Why not report
them, including a (possibly) small coefficient or two?
 - The worst results arise, perhaps, when the '5 variables' 
were winnowed out of a much larger set, since that creates 
biases that cross-validation cannot salvage.
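
A small simulation can show why winnowing from a larger set is so damaging, and why the nominal .05 level in the quoted passage inflates. This is a minimal sketch under stated assumptions: every candidate predictor is pure noise, only the first step of forward selection is imitated (keep the single best candidate by |t|), and 1.96 stands in for the t critical value as a normal approximation. The helper names `corr` and `best_of_k_rate` are hypothetical, not from any package.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def best_of_k_rate(n=210, k=20, trials=200, seed=0):
    """Share of trials in which the best of k pure-noise predictors
    looks 'significant' at a nominal two-sided .05 level."""
    rng = random.Random(seed)
    crit = 1.96  # assumption: normal approximation to the t critical value, df = n - 2
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n)]  # outcome unrelated to every x
        best_t = 0.0
        for _ in range(k):
            x = [rng.gauss(0, 1) for _ in range(n)]
            r = corr(x, y)
            t = r * math.sqrt((n - 2) / (1 - r * r))  # t statistic for a simple slope
            best_t = max(best_t, abs(t))  # keep only the winner, as step 1 would
        if best_t > crit:
            hits += 1
    return hits / trials

rate_1 = best_of_k_rate(k=1)    # one predictor chosen in advance
rate_20 = best_of_k_rate(k=20)  # best of 20 candidates, chosen after looking
print(rate_1, rate_20)
```

With independent candidates, `rate_1` should land roughly near .05 and `rate_20` roughly near 1 - .95**20, about .64; the exact figures depend on the seed. Splitting the data afterwards does not undo a choice that was already made by looking at all of it.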

The reputation of 'stepwise' is not very high these days.
I suggest you should try to avoid that style of analysis.

I do agree with that bottom line of Fox:  If you do 
insist on stepwise, and you want to make some *point*
about  "which variables are chosen,"  then you support 
a claim of stability if you show that the same answer 
comes up consistently across samples.
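
Fox's check can be run without special software. A minimal sketch, assuming plain forward selection (no removal step) with a hypothetical F-to-enter threshold of 4.0: rerun the same selection on random half-samples and count how often each variable enters. The names `rss`, `forward_select` and `selection_frequency` are illustrative, not any package's API, and the data are simulated to resemble the question (n = 210, five predictors, only two of which matter).

```python
import random

def rss(cols, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus cols."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    # Normal equations A b = v, where A = X'X and v = X'y
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    v = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Gaussian elimination with partial pivoting
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        v[k], v[piv] = v[piv], v[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p):
                A[r][c] -= f * A[k][c]
            v[r] -= f * v[k]
    # Back substitution for the coefficients
    b = [0.0] * p
    for k in range(p - 1, -1, -1):
        b[k] = (v[k] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][c] * b[c] for c in range(p))) ** 2 for i in range(n))

def forward_select(xs, y, f_enter=4.0):
    """Forward selection: add the candidate with the largest partial F
    while that F exceeds f_enter (hypothetical threshold; no removal step)."""
    n = len(y)
    chosen = []
    current = rss([], y)
    while len(chosen) < len(xs):
        best = None
        for j in range(len(xs)):
            if j in chosen:
                continue
            new = rss([xs[m] for m in chosen + [j]], y)
            df = n - len(chosen) - 2  # residual df: intercept + chosen + candidate j
            f = (current - new) / (new / df)
            if best is None or f > best[0]:
                best = (f, j, new)
        if best[0] <= f_enter:
            break
        chosen.append(best[1])
        current = best[2]
    return sorted(chosen)

def selection_frequency(xs, y, reps=30, seed=1):
    """Share of random half-samples in which each variable is selected."""
    rng = random.Random(seed)
    n = len(y)
    counts = [0] * len(xs)
    for _ in range(reps):
        idx = rng.sample(range(n), n // 2)
        sub_xs = [[x[i] for i in idx] for x in xs]
        sub_y = [y[i] for i in idx]
        for j in forward_select(sub_xs, sub_y):
            counts[j] += 1
    return [c / reps for c in counts]

# Simulated data: only the first two predictors truly matter.
rng = random.Random(0)
n = 210
xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(5)]
y = [1.0 * xs[0][i] + 0.5 * xs[1][i] + rng.gauss(0, 1) for i in range(n)]
freq = selection_frequency(xs, y)
print([round(f, 2) for f in freq])
```

A variable picked in nearly every half-sample gives some support to the claim that it "is chosen"; one that comes and goes is the idiosyncratic result the quoted passage warns about. This checks the stability of the selection only; it does not repair the biases described above, such as winnowing from a larger pool.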

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
 - I need a new job, after March 31.  Openings? -
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
