Re: standard deviation of a slope

2006-08-17 Thread Stephen B. Cox
Hi Geoff - I just have a quick minute, so I'll hazard a response without
thinking about it too much :)

On 8/16/06, Geoffrey Poole  [EMAIL PROTECTED] wrote:


 Doesn't sqrt(SSx) increase with n?  If so, won't the standard error of
 the slope decrease with increasing sample size??


Yes - the standard error of the slope will decrease with increasing sample
size.
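
For anyone who wants to see this numerically, here is a minimal Python sketch
(simulated data and scipy's linregress; nothing here comes from Sarah's
manuscript) showing the SE of the slope shrinking as n grows:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for n in (10, 50, 200):
    x = rng.uniform(0, 10, size=n)                      # predictor values
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)   # hypothetical linear response plus noise
    fit = stats.linregress(x, y)
    # fit.stderr is the standard error of the estimated slope
    print(f"n = {n:3d}: slope = {fit.slope:.3f}, SE(slope) = {fit.stderr:.3f}")

Because SSx grows roughly linearly with n, the SE of the slope (residual SD
divided by sqrt(SSx)) falls roughly as 1/sqrt(n).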



 I realize SE of estimate and SE of slope do not represent the same
 thing, statistically, but by comparing the SE of estimate across
 regressions of the same X and Y variables from different environments,
 couldn't one assess the expected accuracy of resulting predictions
 across environments using data sets with different sample size?  I think
 this is what Sarah is looking for...


Well - (again, not having thought about this much!) if I wanted to assess
the accuracy of predictions, I would take a look at the prediction bands
of the regression lines.  But all of these things (SE of estimate, prediction
intervals, R^2, etc.) are related measures of the accuracy of a
regression.
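
For what it's worth, here is a rough sketch of getting those bands in Python
with statsmodels (made-up data; the variable names are just placeholders):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

X = sm.add_constant(x)                 # design matrix with an intercept column
fit = sm.OLS(y, X).fit()
pred = fit.get_prediction(X).summary_frame(alpha=0.05)
# mean_ci_* is the confidence band for the regression line,
# obs_ci_* is the (wider) prediction band for new observations
print(pred[["mean", "mean_ci_lower", "mean_ci_upper",
            "obs_ci_lower", "obs_ci_upper"]].head())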


 I suspect that what Zar is referring to here
  is that the standard error of the estimate is in the same units as the
  dependent variable.  Hence, you can divide it by the mean to get a
  unitless measure.
 
 If your suspicion is true, why would Zar have continued on to say
 "...making the examination of [the SE of estimate] a poor method for
 comparing regressions" (page 335, fourth edition)?  Why would a unit-ed
 (i.e., non-unitless) measure automatically be poor for comparing
 regressions?  The continuation of the statement would make a lot more
 sense to me if Zar really were talking about instances where SE of
 estimate were proportional to the magnitude of the dependent variable.

I read Zar's comment "(a unitless measure)" (p. 335) as a reminder that
 you would want to correct for any effect of the magnitude of Y by
 dividing the SE of estimate (not residual variance) by the mean to avoid
 mixed units...

 Also, what would be the point of dividing by the mean Y if not to remove
 an effect of increasing magnitude of Y?  Is there another compelling
 reason to do this?


Well - the only reason I can think of is to avoid mixed units - as you
pointed out.  It's the same basic principle as using a coefficient of
variation.  Perhaps a better characterization of the relationship between
the SE of the estimate and the magnitude of Y is that the SE of the
estimate TENDS to be proportional to the magnitude of the dependent
variable.  That is - although it is not necessarily so (as in adding a
constant to all values), observations with a larger mean tend to have a
larger variance than observations with a smaller mean, as in your example of
weights.



I'd appreciate your thoughts...

 Thanks,

 -Geoff



Re: standard deviation of a slope

2006-08-16 Thread Anon.
Sarah Gilman wrote:
 Is it possible to calculate the standard deviation of the slope of a  
 regression line and does anyone know how?  My best guess after  
 reading several stats books is that the standard deviation and the  
 standard error of the slope are different names for the same thing.
 
Technically, the standard error is the standard deviation of the 
sampling distribution of a statistic, so it is the same as the standard 
deviation.  So, you're right.
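
As an illustration only (simulated data, not Sarah's study): refitting the
same regression on many simulated samples shows that the spread of the slope
estimates matches the analytic SE that the software reports.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 30)

slopes, ses = [], []
for _ in range(2000):
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)   # simulate a new sample
    fit = stats.linregress(x, y)
    slopes.append(fit.slope)
    ses.append(fit.stderr)

print("SD of the simulated slopes:", np.std(slopes, ddof=1))
print("mean analytic SE(slope):  ", np.mean(ses))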

 The context of this question is  a manuscript comparing the  
 usefulness of regression to estimate the slope of a relationship  
 under different environmental conditions.  A reviewer suggested  
 presenting the standard deviation of the slope rather than the  
 standard error to compare the precision of the regression under  
 different conditions.  For unrelated reasons, the sample sizes used  
 in the compared regressions vary  from 10 to 200.  The reviewer  
 argues that the sample size differences are influencing the standard  
 error values, and so the standard deviation (which according to the  
 reviewer doesn't incorporate the sample size) would be a more robust  
 comparison of the precision of the slope estimate among these  
 different regressions.
 
Well of course the sample size differences are influencing the standard 
error values!  And so they should: if you have a larger sample size, 
then the estimates are more accurate.  Why would one want anything other 
than this to be the case?

In some cases, standard errors are calculated by dividing a standard 
deviation by sqrt(n), but these are only special cases.
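
For example (again just a sketch with made-up data), the SE of a mean is
sd/sqrt(n), but the SE of an OLS slope divides the residual SD by sqrt(SSx),
not by sqrt(n):

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.3 * x + rng.normal(scale=0.5, size=x.size)
n = x.size

se_mean = y.std(ddof=1) / np.sqrt(n)                  # the familiar sd/sqrt(n) case
b = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)        # OLS slope
resid = y - (y.mean() + b * (x - x.mean()))
s = np.sqrt((resid ** 2).sum() / (n - 2))             # residual SD ("SE of estimate")
se_slope = s / np.sqrt(((x - x.mean()) ** 2).sum())   # divides by sqrt(SSx), not sqrt(n)
print(se_mean, se_slope)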

It may be that the reviewer can provide further enlightenment, but from 
what you've written, I'm not convinced that they have the right idea.

Bob

-- 
Bob O'Hara

Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax:  +358-9-191 51400
WWW:  http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org


Re: standard deviation of a slope

2006-08-16 Thread David Bryant
Bob,

I have a similar question to Sarah's, and it may even be the same:
I'm using orthogonal regression to determine the equivalence of two
variables, both with errors.  I want to use the S.E. of the slope to
compare to the optimum slope of one (equivalence among variable
responses).  I contacted JMP (SAS Institute) and they recommend the
two one-sided tests (TOST) procedure, which I understand as simply increasing
the alpha to 0.10.  But this still gives a very large confidence
interval, providing a less than robust test.  In some instances a
slope of 2 is not significantly different from a slope of 1.  (!!??) In
fact I have not found one instance in which the slopes differ.  This
seems like a universal type II error to me.

Can I use the standard test of homogeneity of slopes used in ANCOVA
and compare to 1 (s.e. = 0), or would that lead to a type I error?

Thanks for your time,

David

David M Bryant Ph D
University of New Hampshire
Environmental Education Program
Durham, NH 03824

[EMAIL PROTECTED]
978-356-1928





Re: standard deviation of a slope

2006-08-16 Thread Geoffrey Poole
Sarah:

I think the reviewer comment has merit.

I understand your problem as follows:  Your goal is to compare the
"usefulness" (not sure what you mean by usefulness, but we'll go with
it...) of regressions across environmental conditions.  However, under
one set of environmental conditions the regression might be based on 10
points, but under another set of conditions, it might be based on 100
points.

Unfortunately, even under the SAME environmental conditions, the SE of 
the slope will decrease as the sample size increases.  Thus, if the 
number of points varies across environmental conditions, you don't know 
if changes in the SE of the slope are caused by differences in sample 
size or differences in usefulness across conditions.

In section 17.3, "Testing the significance of a regression," of Zar's
"Biostatistical Analysis" (pages 334-5 of the fourth edition) there is a clue
that might help you with your dilemma...

Zar notes that the "standard error of estimate" (AKA "standard error of
the regression") is a measure of the remaining variance in Y *after*
taking into account the dependence of Y on X.  However, since the
magnitude of this value is proportional to the magnitude of the
dependent variable, Y, "examination of [this statistic is] a poor method
for comparing regressions."  Thus, Dapson (1980) recommends using [the
standard error of estimate divided by the mean of Y] "(a unitless
measure)" to judge regression fits.

As I understand things (and I caution you that this could be wrong), the 
standard error of estimate (i.e., the variance of Y after taking into 
account the dependence of Y on X) should be independent of the number of 
points in the regression.  Therefore, it seems a good candidate for your 
comparisons.  An issue arises, however, if the mean of your Y values is 
different across environmental conditions.  In this case, you may have 
to normalize your standard error of the estimate by dividing by the 
mean of Y for each regression, as suggested by Dapson (1980) (as cited 
in Zar 1999).

Zar, J. H. 1999. Biostatistical analysis, fourth edition. Prentice Hall, 
Upper Saddle River, New Jersey.

Dapson, R. W. 1980.  Guidelines for statistical usage in age-estimation 
techniques. J. Wildlife Manage. 44:541-548 (as cited in Zar 1999)
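
In case a concrete calculation helps, here's a minimal sketch (in Python, with
simulated placeholder data and a function name of my own invention) of the
Dapson-style ratio described above: the standard error of estimate divided by
the mean of Y.

import numpy as np

def relative_se_of_estimate(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)     # OLS slope
    resid = y - (y.mean() + b * (x - x.mean()))        # residuals about the fitted line
    s_yx = np.sqrt((resid ** 2).sum() / (n - 2))       # standard error of estimate
    return s_yx / y.mean()                             # unitless, CV-like measure

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 40)
y = 5.0 + 1.2 * x + rng.normal(scale=2.0, size=x.size)
print(relative_se_of_estimate(x, y))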

I'm not sure this is the solution because I'm not a statistician and I 
haven't read Dapson's paper, but I'm pretty sure the reviewer has a 
legitimate point about your comparisons (as I described above) and I 
hope these references will help you find your answer.

I'm posting this to the general listserv in hopes that others out 
there will correct, improve upon, or confirm my thoughts.

-Geoff Poole




Re: standard deviation of a slope

2006-08-16 Thread Malcolm McCallum
You can get the confidence interval and prediction interval with most
software.  I know MiniTab does it for regression; can't recall if SPSS
does it, but it probably does.

VISIT HERPETOLOGICAL CONSERVATION AND BIOLOGY: http://www.herpconbio.org
A New Journal Published in Partnership with Partners in Amphibian and
Reptile Conservation and the World Congress of Herpetology.

Malcolm L. McCallum
Assistant Professor
Department of Biological Sciences
Texas A&M University Texarkana
2600 Robison Rd.
Texarkana, TX 75501
O: 1-903-223-3134
H: 1-903-791-3843
Homepage: https://www.eagle.tamut.edu/faculty/mmccallum/index.html





Re: standard deviation of a slope

2006-08-16 Thread Anon.
Geoffrey Poole wrote:
 Zar notes that the standard error of estimate (AKA standard error of 
 the regression) is a measure of the remaining variance in Y *after* 
 taking into account the dependence of Y on X.  
Zar says that?  That's rubbish: the residual variance is the measure of 
the remaining variance in Y after taking into account the dependence of 
Y on X.

 However, since the 
 magnitude of this value is proportional to the magnitude of the 
 dependent variable, 
Again, rubbish: add 20 000 to all of your Y's, and the variances will 
all be the same.  The only difference is that the estimated intercept is 
20 000 higher.
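
A quick numerical check of this, on made-up data, for anyone who wants to see
it:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 30)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=x.size)

for shift in (0.0, 20000.0):
    fit = stats.linregress(x, y + shift)
    resid = (y + shift) - (fit.intercept + fit.slope * x)
    # the residual variance is unchanged; only the intercept moves by the shift
    print(f"shift = {shift:7.0f}: intercept = {fit.intercept:9.3f}, "
          f"residual var = {resid.var(ddof=2):.3f}")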

I might now have understood the original problem (possibly...).

I think the idea is that in any single environment, one can regress two
variables and get a fit, etc.  But the question is: how well will this
fit do in another environment?  The (actual) slope will probably be
different between environments, and the more different the slopes are, the
less useful the slope from one environment is for predicting in
another.  The problem is the variation between the slopes in the
different environments: obviously we can measure this variation by the
standard deviation (or the variance!).

In practice, I would suggest fitting a mixed model, where you allow the 
slope to vary randomly between environments.  Any decent stats package 
can do this: I think some people call them random regressions.  This 
will estimate the variation in slopes between environments, allowing for 
any differences in sample sizes in the different environments.  If the 
variance is small, then the predictions from one environment to another 
will be pretty good (obviously this depends a bit on the size of the 
regression coefficient: if it's zero, then there's no improvement anyway).
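
For concreteness, here is a rough sketch of that sort of random-slope
("random regression") fit in Python with statsmodels; the data frame with
columns y, x, and env is simulated, so treat it purely as a placeholder for
the real data:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(5)
frames = []
for env in range(8):
    slope = 0.5 + rng.normal(scale=0.2)                  # environment-specific slope
    n = int(rng.integers(10, 200))                       # unequal sample sizes, as in Sarah's case
    x = rng.uniform(0, 10, n)
    y = 2.0 + slope * x + rng.normal(scale=1.0, size=n)
    frames.append(pd.DataFrame({"y": y, "x": x, "env": env}))
df = pd.concat(frames, ignore_index=True)

# random intercept and random slope for x within each environment
model = sm.MixedLM.from_formula("y ~ x", groups="env", re_formula="~x", data=df)
result = model.fit()
print(result.cov_re)   # between-environment (co)variance of intercept and slope

The diagonal entry for x in cov_re is the estimated between-environment
variance of the slope; its square root is the between-environment standard
deviation of the slopes, which is the quantity of interest here.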

I'll have to think a bit more about the best way of evaluating the 
importance of the variation in the slopes: the intuition is to ask how 
much better you do at predicting the value of a data point if you know 
which environment it was measured in, as compared to if it's a random 
environment.  Something similar to an intraclass correlation could be used.

Incidentally, this is perhaps a good opportunity to plug this book:
http://www.stat.columbia.edu/~cook/movabletype/archives/2006/08/our_new_book_da.html
I read a draft in the spring and can heartily recommend it.  It covers 
the family of models that can be used for most statistical analyses I 
see in ecology (including the problem here!), in a practical way.

And now to bed.

Bob

-- 
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland

Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax:  +358-9-191 51400
WWW:  http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org


Re: standard deviation of a slope

2006-08-16 Thread Stephen B. Cox
On 8/16/06, David Bryant [EMAIL PROTECTED] wrote:

 Bob,

 I have a similar question to Sarah's and it may even be the same;
 I'm using orthogonal regression to determine the equivalence of two
 variables, both with errors.  I want to use the S.E. of the slope to
 compare to the optimum slope of one (equivalence among variable
 responses).  I contacted JMP (SAS institute) and they recommend the
 two-one-sided test (TOST)  which I understand as simply increasing
 the alpha to 0.10.  But this still gives a very large confidence
 interval providing a less than robust test.  In some instances a
 slope of 2 is not significantly different than slope of 1.  (!!??) In
 fact I have not found one instance in which the slopes differ.  This
 seems like a universal type II error to me.

 Can I use the standard test of homogeneity of slopes used in ANCOVA
 and compare to 1 (s.e. = 0), or would that lead to a type I error?


I would just look at the CI for your slope estimate and see if it includes 1.
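
Something like this, for instance (ordinary least squares on simulated data,
so just an illustration of the CI check rather than your orthogonal
regression fit):

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 25)
y = x + rng.normal(scale=1.0, size=x.size)      # true slope of 1 plus noise

fit = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=x.size - 2)      # two-sided 95% critical value
lo, hi = fit.slope - t_crit * fit.stderr, fit.slope + t_crit * fit.stderr
print(f"slope = {fit.slope:.3f}, 95% CI = ({lo:.3f}, {hi:.3f}), "
      f"contains 1: {lo <= 1 <= hi}")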






Re: standard deviation of a slope

2006-08-16 Thread Geoffrey Poole
Geoffrey Poole wrote:
  Zar notes that the standard error of estimate (AKA standard error
  of the regression) is a measure of the remaining variance in Y
  *after* taking into account the dependence of Y on X.

Bob O'Hara wrote:
  Zar says that?  That's rubbish: the residual variance is the measure
  of the remaining variance in Y after taking into account the
  dependence of Y on X.

The way I read Zar, he starts with the regression residual sum of
squares and divides by the degrees of freedom, which yields the variance of
the residuals (to which you refer).  If you take the square root of this
value, you get what Zar refers to as the "standard error of estimate."

I suppose I was not careful in my wording when I called this statistic a
measure of "variance."  I should have said a measure of "variation."

Geoffrey Poole wrote:
  However, since the magnitude of this value is proportional to the
  magnitude of the dependent variable...

Bob O'Hara wrote:
  Again, rubbish: add 20 000 to all of your Y's, and the variances will
  all be the same.  The only difference is that the estimated intercept
  is 20 000 higher.

Yes, adding a constant to a distribution will not change the variance.
In thinking about it, it does seem confusing for Zar to state: "The
magnitude of [the 'standard error of estimate'] is proportional to the
magnitude of the dependent variable, Y" (top of page 335, Fourth
Edition).  But before we dismiss Zar (and Dapson) as "rubbish," let's
consider real-world data that represent biological phenomena rather than
purely contrived data (e.g., adding a constant to all Y values).

Consider the weight of animals, for instance.  The variance in weight 
for a large-bodied species (say, humans) is much higher than for mice, 
and higher for mice than fleas.  Even within a single species (again, 
e.g., humans), the variance in weight among adults is far greater than 
among infants.  When considering regressions that predict the weight of 
individuals, then, it follows that the residuals of regressions are apt 
to increase in proportion to the average weight of individuals in the 
population.

Thus, couldn't biological factors (rather than any underlying 
mathematical formulation) drive a relationship between the standard 
error of estimate and the mean of the dependent variable?

-Geoff Poole