- about that article -  concerning flaws in the article.

On Mon, 22 Mar 2004 12:34:07 -0500, Eugene Gallagher wrote:
[ snip, our earlier posts about the article in Science, a stepwise 
regression that first picked Nitrogen deposition, in predicting
number-of-species.]
>
> I wasn't making the claim that stepwise was superior. I was trying to 
> understand why the authors specified 'stepwise' as the method if not to 
> convey to the reader that their method was objective.
> In the brief supplementary pdf which is available online for this 
> article, the authors state:
> "Stepwise multiple regression was used to create the models between 
> plant species richness (mean number of species per quadrat) and 
> potential environmental drivers (Table S1). Multiple regression analysis 
> assumes that the relationship between X and Y variables is linear, the 
> scale of the variability of the Y values is constant at all values of X, 
> and the errors are independently and normally distributed. These 
> assumptions were examined by residual plots and no violations of the 
> assumptions were found (Wilks-Shapiro statistic, p<0.04, variance 
> inflation factors < 2). We also tested and rejected that the regression 
> was skewed by outliers. The same model resulted regardless of the order 
> in which the variables were entered."

Unless I am misreading Conover, a Shapiro-Wilk test with
p < 0.04 rejects normality at the 5% level, so it indicates
a 'violation of the assumption' rather than the absence of one.

> 
>    Now to me, the paragraph seems unlikely to be a complete description 
> of the issues involved. The first three potential explanatory variables 
> listed in their Table S1 include:
> Total Nitrogen deposition (kg N ha^-1 y^-1)           Ndep
> Total deposition NH_3 + NH_4+ (kg N ha^-1 y^-1)       N-red
> Total deposition NO + NO2 + NO3- (kg N ha^-1 y^-1)    N-ox
> 
> This is equivalent to having the variables
> 1) red+ox
> 2) red
> 3) ox

Is that really what they have?  I see what you mean 
when you say, 'unlikely to be a complete description.'
If (1) is the exact sum of (2) and (3), the three are perfectly
collinear, and I guess your program would avoid the unbounded
variance inflation by refusing to enter the full set.
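
To make the exact-sum point concrete, here is a small sketch in Python.
The deposition numbers are invented for illustration (they are not from
the paper), and matrix_rank is a helper written for this note, not a
library call.  It shows that adding Ndep = N-red + N-ox to a design that
already holds N-red and N-ox does not raise the rank, which is why a
regression program has to refuse the full set.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination on Fractions (no rounding error)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column is a linear combination of earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical deposition values (kg N per ha per year) at six sites.
n_red = [5, 12, 9, 20, 7, 15]
n_ox = [4, 6, 11, 8, 10, 5]
n_dep = [r + o for r, o in zip(n_red, n_ox)]  # assumed to be the exact sum

# Design matrices: intercept plus predictors.
reduced = [[1, r, o] for r, o in zip(n_red, n_ox)]
full = [[1, r, o, d] for r, o, d in zip(n_red, n_ox, n_dep)]

rank_reduced = matrix_rank(reduced)
rank_full = matrix_rank(full)
print("rank with N-red, N-ox:      ", rank_reduced)
print("rank with N-red, N-ox, Ndep:", rank_full)
```

Both ranks come out equal, so the fourth column carries no new
information.  If the published Ndep is only approximately the sum, the
rank would be full but the variance inflation would still be large.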


> In the paper, only total nitrogen deposition makes it into their 
> reported regression equations [Note that Ndep is a prediction from an 
> atmospheric model; it isn't measured at these sites throughout Britain.] 

If Ndep is a prediction from a model, not measured
locally, I wonder how independent those predictions 
happen to be.  It seems to me that (if it wasn't there before)
there is a strong potential for geographical confounding.
 - I have now read the article on the web.  The sites were spread
across England.  Do side-by-side sites share the same Ndep 
amount?  Do they also share geography or climate as a basis
for the number of species? - I could ask explicitly about altitude, 
rainfall, and average temperature.

The model was built on data from 68 sites.  The authors
started with 20 predictors.  That means that the R^2  
'expected by chance' for a full model was about 30%
(roughly 20/67), so they did exceed that crude number by a
good margin.  However, geography has long been a suspicious
factor for models in epidemiology.  It is a source
of 'specious correlation' rather similar to the effect of
time series.  Do the Appalachian highlands have high rates
of heart disease because of local mineral water? bad heredity?
bad health care? bad air?
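
For anyone who wants to check the 'expected by chance' figure: under
the usual null model (pure-noise predictors, normal errors, intercept
included), the expected R^2 of a full model with k predictors and n
cases is k/(n-1).  A minimal sketch, where the function name is mine:

```python
def expected_null_r2(n_cases, n_predictors):
    """Expected R^2 of a full OLS model when no predictor has any real effect.

    Assumes an intercept is fitted and errors are normal, in which case
    R^2 follows a Beta(k/2, (n-k-1)/2) distribution with mean k/(n-1).
    """
    return n_predictors / (n_cases - 1)


# The article's setup as described above: 68 sites, 20 candidate predictors.
chance_r2 = expected_null_r2(68, 20)
print(f"expected R^2 by chance: {chance_r2:.3f}")
```

This is the crude number for entering all 20 at once; a stepwise
selection that keeps only a few of them is not the same situation, so
it is only a rough yardstick.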

If the geography effectively reduces the source of variation
to a half-dozen areas, rather than "68 (independent) sites",
then the degrees of freedom are potentially reduced for 
other variables with geographical distributions, to, say,
as low as a half dozen.  That, in turn, means that even 
an R^2 of 0.70 might not be very impressive as the "best"
correlation among a bunch of potential predictors.
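
A rough simulation of that last point, under assumptions that are mine
and not the article's: 6 effective units, 20 mutually independent
pure-noise predictors, normal data.  It estimates how often the best
single predictor reaches R^2 >= 0.70 by chance alone, and repeats the
exercise with 68 truly independent units for contrast.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def chance_of_big_best_r2(n_units, n_predictors=20, threshold=0.70,
                          n_trials=5000, seed=0):
    """Fraction of trials where the best noise predictor has R^2 >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n_units)]
        best = max(
            r_squared([rng.gauss(0, 1) for _ in range(n_units)], y)
            for _ in range(n_predictors)
        )
        if best >= threshold:
            hits += 1
    return hits / n_trials


p_six = chance_of_big_best_r2(n_units=6)
p_all = chance_of_big_best_r2(n_units=68, n_trials=500)
print("P(best R^2 >= 0.70), 6 effective units:", p_six)
print("P(best R^2 >= 0.70), 68 independent sites:", p_all)
```

With six units the best of 20 noise variables clears 0.70 roughly half
the time, while with 68 independent sites it essentially never does.
Real predictors are correlated with each other, so the true figure for
six units would differ, but the contrast is the point.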

> I don't see how the 'same model resulted' regardless of the order in 
> which the variables were entered [if N-red and N-ox were entered 1st, 
> Ndep would be unlikely to be selected unless I'm missing a form of N]. 
> Perhaps there is a 3rd form of nitrogen in rain that I'm missing. I 
> don't think urea or other forms of organic nitrogen would be a major 
> factor in rain, but I'm an oceanographer, not an atmospheric chemist.

One form of N in particles, important in the U.S.  pollution
debate about acid rain, is expected to come from high-
temperature combustion.  I assumed that the article had 
some face-validity by following the line of that debate, but
now I don't know whether they were supporting that, or not.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
 - I need a new job, after March 31.  Openings? -