Rich Ulrich wrote (in part) <<< That was good advice, above, except for the short word about cross-validation, which was (in my opinion) optimistic. Where stepwise is not a good idea, cross-validation cannot do much to salvage it.
Why do you have the set of 5 variables? Why not report them, including a (possibly) small coefficient or two? - The worse results arise, perhaps, when the '5 variables' were winnowed out of a much larger set, since that creates biases that cross-validation cannot salvage.

The reputation of 'stepwise' is not very high these days. I suggest you should try to avoid that style of analysis. >>>

I certainly agree that avoiding stepwise is a good idea; in fact, I agree with all of what Rich wrote above.

One sort of cross-validation I have done (to demonstrate to others that stepwise is, in fact, a bad idea) is to divide the data set into 2 equal parts on something like id number, then run the same stepwise procedure on each half. This is, of course, not full cross-validation, but it's easy to do in any software, and easy to explain. I've done this with a number of data sets, and have yet to see the same model emerge from the two halves; often, they are very different.

OTOH, if the two halves DID give the same model, I'd have more confidence in it. But in that case, the model is probably obvious anyway.

HTH

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
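A minimal sketch of the split-half check Peter describes, for readers who want to try it. The post does not say which stepwise criterion was used, so this assumes forward selection by AIC (one common variant); the function names (`forward_stepwise_aic`, `split_by_id`, `split_half_check`) and the even/odd id split are hypothetical choices for illustration. The OLS fit is written in plain Python so the example runs without extra packages.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(rows, y, cols):
    """Residual sum of squares of an OLS fit of y on an intercept plus the given columns."""
    design = [[1.0] + [row[j] for j in cols] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))


def forward_stepwise_aic(rows, y):
    """Assumed criterion: greedily add the variable that lowers AIC most; stop when none does."""
    n, p = len(y), len(rows[0])
    selected = []

    def aic(cols):
        # Gaussian-likelihood AIC up to an additive constant; k = intercept + slopes.
        return n * math.log(ols_rss(rows, y, cols) / n) + 2 * (len(cols) + 1)

    best = aic(selected)
    while True:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        score, choice = min((aic(selected + [j]), j) for j in candidates)
        if score >= best:
            break
        best = score
        selected.append(choice)
    return selected


def split_by_id(ids):
    """Hypothetical split 'on something like id number': even ids vs odd ids (row indices)."""
    first = [i for i, rid in enumerate(ids) if rid % 2 == 0]
    second = [i for i, rid in enumerate(ids) if rid % 2 == 1]
    return first, second


def split_half_check(ids, rows, y):
    """Run the same stepwise procedure on each half; return both selected variable sets."""
    results = []
    for half in split_by_id(ids):
        sub_rows = [rows[i] for i in half]
        sub_y = [y[i] for i in half]
        results.append(sorted(forward_stepwise_aic(sub_rows, sub_y)))
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 200, 10
    ids = list(range(1, n + 1))
    rows = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Simulated data: only x0 matters; the other nine candidates are pure noise.
    y = [0.5 * r[0] + rng.gauss(0, 1) for r in rows]
    a, b = split_half_check(ids, rows, y)
    print("half A selected:", a)
    print("half B selected:", b)
    print("same model:", a == b)
```

Running it prints the variable sets chosen on each half; comparing them is the whole check. Any roughly arbitrary split (parity of id, lower vs upper half of id) serves the purpose, since the two halves should tell the same story if the selected model is stable.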
