- about that article - concerning flaws in the article.

On Mon, 22 Mar 2004 12:34:07 -0500, Eugene Gallagher wrote:

[ snip, our earlier posts about the article in Science, a stepwise
regression that first picked Nitrogen deposition, in predicting
number-of-species.]

>
> I wasn't making the claim that stepwise was superior. I was trying to
> understand why the authors specified 'stepwise' as the method if not to
> convey to the reader that their method was objective.
> In the brief supplementary pdf which is available online for this
> article, the authors state:
> "Stepwise multiple regression was used to create the models between
> plant species richness (mean number of species per quadrat) and
> potential environmental drivers (Table S1). Multiple regression analysis
> assumes that the relationship between X and Y variables is linear, the
> scale of the variability of the Y values is constant at all values of X,
> and the errors are independently and normally distributed. These
> assumptions were examined by residual plots and no violations of the
> assumptions were found (Wilks-Shapiro statistic, p<0.04, variance
> inflation factors < 2). We also tested and rejected that the regression
> was skewed by outliers. The same model resulted regardless of the order
> in which the variables were entered."
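The quoted "variance inflation factors < 2" is a claim a reader can check. The VIF of predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. Below is a minimal pure-Python sketch with made-up numbers (not the paper's data). The names n_dep, n_red and n_ox are hypothetical stand-ins for the three deposition variables in Table S1, and the sketch assumes total deposition is the exact sum of its reduced and oxidized parts.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, predictors):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns, j):
    """Variance inflation factor of columns[j] given all the other columns."""
    others = [c for i, c in enumerate(columns) if i != j]
    r2 = r_squared(columns[j], others)
    if 1.0 - r2 < 1e-9:
        # Perfect (or numerically perfect) linear dependence.
        return math.inf
    return 1.0 / (1.0 - r2)


# Made-up deposition values (kg N ha^-1 y^-1) for eight hypothetical sites.
n_red = [4.0, 6.5, 9.0, 12.0, 15.5, 18.0, 21.0, 25.0]
n_ox = [5.0, 4.0, 7.5, 6.0, 9.0, 8.5, 11.0, 10.0]
# Assumption: total deposition is exactly reduced + oxidized.
n_dep = [r + o for r, o in zip(n_red, n_ox)]

if __name__ == "__main__":
    columns = {"Ndep": n_dep, "N-red": n_red, "N-ox": n_ox}
    cols = list(columns.values())
    for j, name in enumerate(columns):
        print(name, vif(cols, j))
    print("N-red given N-ox only:", round(vif([n_red, n_ox], 0), 2))
```

Under the exact-sum assumption all three VIFs come out infinite, because each variable is a perfect linear function of the other two. A reported "VIF < 2" can therefore only describe a final model that keeps at most two of the three, as in the reported equations, which keep only Ndep.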
Unless I am misreading Conover, a W-S test with p < 0.04 indicates
'violation of the assumption' of normality.

>
> Now to me, the paragraph seems unlikely to be a complete description
> of the issues involved. The first three potential explanatory variables
> listed in their Table S1 include:
> Total Nitrogen deposition (kg N ha^-1 y^-1) Ndep
> Total deposition NH_3 + NH_4+ (kg N ha^-1 y^-1) N-red
> Total deposition NO + NO2 + NO3- (kg N ha^-1 y^-1) N-ox
>
> This is equivalent to having the variables
> 1) red+ox
> 2) red
> 3) ox

Is that really what they have? I see what you mean when you say,
'unlikely to be a complete description.' If (1) is the exact sum,
I guess your program would avoid the variance inflation by refusing
to enter the full set.

> In the paper, only total nitrogen deposition makes it into their
> reported regression equations [Note that Ndep is a prediction from an
> atmospheric model; it isn't measured at these sites throughout Britain.]

If Ndep is a prediction from a model, not measured locally, I wonder
how independent those predictions happen to be. It seems to me that
(if it wasn't there before) there is a strong potential for geographical
confounding.

 - I have read the web-located article now. The sites were spread
across England. Do side-by-side sites share the Ndep amount? Do they
also share geography or climate as a basis for number of species?
 - I could ask explicitly about altitude, rainfall, and average
temperature.

The model was built on data from 68 sites. The authors started with
20 predictors. That means that the R^2 'expected by chance' for a full
model was about 30%, so they did exceed that crude number by a good
margin. However, geography has been a suspicious factor for models in
epidemiology for a long time. It is a source of 'specious correlation'
rather similar to the effect of time series. Do the Appalachian
highlands have high rates of heart disease because of local mineral
water? bad heredity? bad health care? bad air?
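The "about 30%" is the textbook null expectation: with p predictors, an intercept, normal errors and no true effects, R^2 has mean p/(n - 1), so 20/67 is roughly 0.30 for 68 sites. A minimal pure-Python sketch of that arithmetic follows, plus a small simulation of the R^2 reached by the single best of 20 pure-noise predictors, which is what the first step of a forward stepwise search picks. The function names are hypothetical, and using n = 6 is only a crude stand-in for "a half-dozen areas rather than 68 independent sites"; real spatial clustering would call for modelling the dependence, not just shrinking n.

```python
import math
import random


def expected_null_r2(n_obs, n_predictors):
    """Mean R^2 of an OLS fit (with intercept) when y is unrelated to every predictor.

    Assumes the textbook null model (normal errors, no true effects), under which
    R^2 ~ Beta(p/2, (n - p - 1)/2) with mean p / (n - 1). With p >= n - 1 the fit
    interpolates the data and R^2 is 1.
    """
    if n_obs < 2:
        raise ValueError("need at least two observations")
    if n_predictors >= n_obs - 1:
        return 1.0
    return n_predictors / (n_obs - 1)


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_best_single_r2(n_obs, n_predictors, trials=500, seed=1):
    """Monte Carlo mean of the largest single-predictor R^2 among pure-noise predictors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = 0.0
        for _ in range(n_predictors):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
            best = max(best, pearson_r(x, y) ** 2)
        total += best
    return total / trials


if __name__ == "__main__":
    # 68 sites and 20 candidate predictors, as stated for the model.
    print("full model, n=68:", round(expected_null_r2(68, 20), 3))
    # Hypothetical: only about six effectively independent areas.
    print("full model, n=6:", expected_null_r2(6, 20))
    for n in (68, 6):
        print(f"best single noise predictor, n={n}:", round(mean_best_single_r2(n, 20), 2))
```

With 68 truly independent sites the best noise predictor explains little, but with only six effective units even a single noise predictor, chosen as the best of 20, typically reaches a large R^2.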
If the geography effectively reduces the source of variation to a
half-dozen areas, rather than "68 (independent) sites", then the
degrees of freedom are potentially reduced for other variables with
geographical distributions. To, say, as low as a half dozen? That, in
turn, means that even an R^2 of 0.70 might not be very impressive as
the "best" correlation among a bunch of potential predictors.

> I don't see how the 'same model resulted' regardless of the order in
> which the variables were entered [if N-red and N-ox were entered 1st,
> Ndep would be unlikely to be selected unless I'm missing a form of N].
> Perhaps there is a 3rd form of nitrogen in rain that I'm missing. I
> don't think urea or other forms of organic nitrogen would be a major
> factor in rain, but I'm an oceanographer, not an atmospheric chemist.

One form of N in particles, important in the U.S. pollution debate
about acid rain, is expected to come from high-temperature combustion.
I assumed that the article had some face validity by following the
line of that debate, but now I don't know whether they were supporting
that, or not.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
 - I need a new job, after March 31. Openings? -

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
http://jse.stat.ncsu.edu/
=================================================================
