Rich Ulrich wrote (in part)

<<<
That was good advice, above, except for the short word
about cross-validation, which was (in my opinion) optimistic. 
Where stepwise is not a good idea, cross-validation cannot 
do much to salvage it.  

Why do you have the set of 5 variables?  Why not report
them, including a (possibly) small coefficient or two?
 - The worst results arise, perhaps, when the '5 variables' 
were winnowed out of a much larger set, since that creates 
biases that cross-validation cannot salvage.

The reputation of 'stepwise' is not very high these days.
I suggest you should try to avoid that style of analysis.
>>>
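To see the winnowing bias Rich describes, here is a small Python simulation (numpy assumed; the sizes n=100, p=50, k=5 and the helpers `r_squared` and `simulate` are illustrative choices, not anything from this thread). The outcome is pure noise, so every predictor's true effect is zero. Keeping the 5 predictors most correlated with y typically still yields a noticeably higher in-sample R^2 than 5 predictors picked without looking at y.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def simulate(seed, n=100, p=50, k=5):
    """Return (R^2 of k winnowed predictors, R^2 of k arbitrary predictors)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    y = rng.normal(size=n)  # outcome unrelated to every predictor

    # Winnow: keep the k predictors most correlated with y in this sample.
    corrs = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    winnowed = np.argsort(corrs)[-k:]

    # Comparison: the first k predictors, chosen without looking at y.
    return r_squared(X[:, winnowed], y), r_squared(X[:, :k], y)


r2_winnowed, r2_arbitrary = simulate(seed=42)
print(f"R^2 with 5 winnowed noise predictors:  {r2_winnowed:.3f}")
print(f"R^2 with 5 arbitrary noise predictors: {r2_arbitrary:.3f}")
```

This only shows the inflated in-sample fit that winnowing produces; it does not by itself demonstrate the further point that cross-validation cannot undo it.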

I certainly agree that avoiding stepwise is a good idea; in fact, I
agree with all of what Rich wrote above.

One sort of cross-validation I have done (to demonstrate to others that
stepwise is, in fact, a bad idea) is to divide the data set into two equal
halves on something like ID number, then run the same stepwise procedure
on each half. This is, of course, not full cross-validation, but it's easy
to do in any software, and easy to explain.

I've done this with a number of data sets, and I have yet to see the
same model emerge from the two halves; often, the two models are very different.
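A minimal sketch of that split-half check in Python, assuming numpy. The data are simulated and hypothetical, the split is even versus odd ID number, and the stepwise rule is forward selection by AIC, chosen here only for illustration (many packages use p-value entry/removal thresholds instead). `forward_stepwise_aic` is a hypothetical helper, not a library function.

```python
import numpy as np


def forward_stepwise_aic(X, y, names):
    """Forward stepwise selection using AIC (an assumed criterion).

    At each step, add the candidate that lowers AIC the most; stop when
    no remaining candidate lowers it. Returns the chosen names, sorted.
    """
    n = len(y)
    selected = []
    remaining = list(range(X.shape[1]))

    def aic(cols):
        # Intercept plus the chosen columns; Gaussian AIC up to a constant.
        design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ beta) ** 2))
        return n * np.log(rss / n) + 2 * design.shape[1]

    current = aic(selected)
    while remaining:
        best_score, best_col = min((aic(selected + [c]), c) for c in remaining)
        if best_score >= current:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        current = best_score
    return sorted(names[c] for c in selected)


# Hypothetical data: 200 subjects, 20 candidate predictors, weak true effects.
rng = np.random.default_rng(0)
n, p = 200, 20
ids = np.arange(1, n + 1)
X = rng.normal(size=(n, p))
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)
names = [f"x{j}" for j in range(p)]

# Split into two equal halves on ID number (even vs. odd), then run the
# same stepwise procedure on each half.
even = ids % 2 == 0
model_even = forward_stepwise_aic(X[even], y[even], names)
model_odd = forward_stepwise_aic(X[~even], y[~even], names)

print("even-ID half:", model_even)
print("odd-ID half: ", model_odd)
print("same model?  ", model_even == model_odd)
```

Comparing the two printed lists is the whole test: any difference in the selected variables is the instability described above.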

OTOH, if the two halves DID give the same model, I'd have more
confidence in it. But in that case, the model is probably obvious anyway.


HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================