On Sat, 20 Mar 2004 19:19:22 +0000 (UTC), [EMAIL PROTECTED] wrote:

> Eugene Gallagher <[EMAIL PROTECTED]> wrote:
> > Rich,
> > I tend to agree with you about the potential abuse of stepwise
> > multiple regression. However, it is widely used and I wouldn't label a
> > study using stepwise as being of necessity flawed, even if the goal was
> > to evaluate the relative importance of different explanatory variables.
> > For example, this week's Science has an article that has been widely
> > reported in the popular press and the key analysis is a stepwise
> > regression. One way of interpreting the paper is that the authors used
> > stepwise to make their assessment of the importance of N deposition more
> > objective. They didn't pick N deposition, the computer did.
>
> Mike Babyak
>
> Not having seen the paper, I can't comment on the specifics of the
> application of stepwise there. But certainly, given the simulation
> literature on stepwise, the fact that Science published such a paper
> only shows that they aren't aware of the problems with the procedure.
> The good intent or reputation of a journal or scientist still won't
> make the procedure better. In most situations scientists encounter,
> there isn't "potential abuse", there's just pretty much by definition
> a badly overfitted model.
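The overfitting described in the quoted text can be sketched with a small simulation. This is only an illustration under stated assumptions: the predictors and the response are pure noise, and the sample size (60), the number of candidate predictors (20), and the number of steps (4) are hypothetical choices, not values taken from the Science paper or from the simulation literature mentioned above. Forward selection by R^2 is used as a simple stand-in for whatever stepwise rule a given package applies.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y on X with an intercept column."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r2_with(coef, X, y):
    """R^2 of the given coefficients evaluated on (X, y)."""
    A = np.column_stack([np.ones(len(y)), X])
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_select(X, y, n_steps):
    """Greedy forward selection: add the column that raises in-sample R^2 most."""
    chosen = []
    for _ in range(n_steps):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        def score(j):
            cols = chosen + [j]
            return r2_with(fit_ols(X[:, cols], y), X[:, cols], y)
        chosen.append(max(remaining, key=score))
    return chosen

def noise_experiment(n=60, p=20, n_steps=4, n_reps=200, seed=0):
    """Mean in-sample and fresh-sample R^2 when nothing is related to y.

    All sizes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    fit_r2, fresh_r2 = [], []
    for _ in range(n_reps):
        X = rng.standard_normal((n, p))
        y = rng.standard_normal(n)          # unrelated to every column of X
        chosen = forward_select(X, y, n_steps)
        coef = fit_ols(X[:, chosen], y)
        fit_r2.append(r2_with(coef, X[:, chosen], y))
        # Apply the same fitted equation to a new sample from the same process.
        X_new = rng.standard_normal((n, p))
        y_new = rng.standard_normal(n)
        fresh_r2.append(r2_with(coef, X_new[:, chosen], y_new))
    return float(np.mean(fit_r2)), float(np.mean(fresh_r2))

if __name__ == "__main__":
    fit_mean, fresh_mean = noise_experiment()
    print(f"mean R^2 on the fitting data: {fit_mean:.3f}")
    print(f"mean R^2 on fresh data:       {fresh_mean:.3f}")
```

The selected model reports a positive R^2 on the data it was fitted to even though no predictor is related to the response, while the same equation does no better than the mean on fresh data. That gap is the overfitting at issue.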
Mike says that well. Still, I might be a *little* less harsh.
I haven't seen the paper, either. Here is some more of what Gene
posted --

"Of 20 variables measured to account for the variability in species
richness, total deposition of inorganic N (Ndep, kg N ha^-1 y^-1) was
the most important predictor, explaining more than half of the
variation in the number of species per quadrat (Fig. 2A and Eq. 1)....

" After accounting for N deposition, mean annual precipitation (MAP, mm)
explained an additional 8% of variability in species richness. A
further 5% was explained by the A horizon soil pH (Top pH, Fig. 2B)
and 3% by altitude (Alt, m). In total, 70% of the variability in
species richness could be explained by these four variables: ... "

Stepwise, I have said before, can give you a shorter list of
variables when you have a list where everything matters.
Especially, it can give you the *first* variable, if one of
them stands out from the others.

In the above, Deposition does account for a huge share of
variance; what is unstated (here, at least) is whether any of
the other (presumably correlated) measures were anywhere close
to that fraction, taken one at a time.

Stepwise is *famous* for being really lousy at giving you
the number two and three and four when the relative shares
of variance are (for instance) 54, 8, 5, and 3 percent: with
increments that small, which variable enters next can turn on
sampling noise.

If they were searching for 'explanation' rather than a shorter
prediction equation, then the authors stumbled badly -- if the
stepwise result is all they relied on.

Again, I have not seen the paper, so I want my aspersions to be
read as being somewhat hypothetical, or as being cast against
the worst-case scenario.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
 - I need a new job, after March 31. Openings? -

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
                  http://jse.stat.ncsu.edu/
=================================================================
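
The point above about the first pick versus the second, third, and fourth can be sketched with a bootstrap check. Everything in it is an assumption made for illustration: the data are simulated (one dominant predictor, 19 mutually correlated minor ones, 70 cases), the coefficients are invented, and forward selection by R^2 stands in for the stepwise procedure. None of it comes from the paper. The sketch reruns the selection on resampled cases and reports how often each step picks the same variable as it did on the full sample.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of a least-squares fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_select(X, y, n_steps):
    """Greedy forward selection: add the column that raises R^2 most."""
    chosen = []
    for _ in range(n_steps):
        remaining = [j for j in range(X.shape[1]) if j not in chosen]
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        chosen.append(best)
    return chosen

def make_data(rng, n=70, p=20):
    """Hypothetical data: column 0 dominates, the rest share a common factor."""
    shared = rng.standard_normal(n)
    X = 0.6 * shared[:, None] + 0.8 * rng.standard_normal((n, p))
    X[:, 0] = rng.standard_normal(n)        # dominant, independent predictor
    y = (1.0 * X[:, 0] + 0.25 * X[:, 1] + 0.20 * X[:, 2] + 0.15 * X[:, 3]
         + 0.7 * rng.standard_normal(n))    # invented coefficients
    return X, y

def pick_agreement(n_boot=200, n_steps=4, seed=0):
    """Share of bootstrap resamples whose k-th pick matches the full-sample pick."""
    rng = np.random.default_rng(seed)
    X, y = make_data(rng)
    reference = forward_select(X, y, n_steps)
    hits = np.zeros(n_steps)
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))   # resample cases with replacement
        picks = forward_select(X[idx], y[idx], n_steps)
        hits += [picks[k] == reference[k] for k in range(n_steps)]
    return reference, hits / n_boot

if __name__ == "__main__":
    reference, rates = pick_agreement()
    for step, (var, rate) in enumerate(zip(reference, rates), start=1):
        print(f"step {step}: full-sample pick x{var}, bootstrap agreement {rate:.2f}")
```

Under these assumptions the first pick should be far more stable across resamples than the later ones. Running the same kind of check on real data is one hedged way to judge how much weight the number two, three, and four deserve.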
