Re: standard deviation of a slope
Hi Geoff - just have a quick minute, so I'll hazard a response without thinking about it too much. :)

On 8/16/06, Geoffrey Poole [EMAIL PROTECTED] wrote:

> Doesn't sqrt(SSx) increase with n? If so, won't the standard error of the slope decrease with increasing sample size?

Yes - the standard error of the slope will decrease with increasing sample size.

> I realize the SE of estimate and the SE of the slope do not represent the same thing, statistically, but by comparing the SE of estimate across regressions of the same X and Y variables from different environments, couldn't one assess the expected accuracy of the resulting predictions across environments using data sets with different sample sizes? I think this is what Sarah is looking for...

Well (again, not having thought about this much!), if I wanted to assess the accuracy of predictions, I would take a look at the prediction bands of the regression lines. But all of these things (SE of estimate, prediction intervals, R^2, etc.) are related measures of the accuracy of a regression. I suspect that what Zar is referring to here is that the standard error of estimate is in the same units as the dependent variable; hence, you can divide it by the mean to get a unitless measure.

> If your suspicion is true, why would Zar have continued on to say "... making the examination of [the SE of estimate] a poor method for comparing regressions" (page 335, fourth edition)? Why would a unit-ed (i.e., non-unitless) measure automatically be poor for comparing regressions? The continuation of the statement would make a lot more sense to me if Zar really were talking about instances where the SE of estimate is proportional to the magnitude of the dependent variable. I read Zar's comment ("a unitless measure") (p. 335) as a reminder that you would want to correct for any effect of the magnitude of Y by dividing the SE of estimate (not the residual variance) by the mean, to avoid mixed units... Also, what would be the point of dividing by the mean of Y if not to remove an effect of increasing magnitude of Y? Is there another compelling reason to do this?

Well - the only reason I can think of is to avoid mixed units, as you pointed out. It's the same basic principle as using a coefficient of variation. Perhaps a better characterization of the relationship between the SE of the estimate and the magnitude of Y is that the SE of the estimate TENDS to be proportional to the magnitude of the dependent variable. That is, although it is not necessarily so (as in adding a constant to all values), observations with a larger mean tend to have a larger variance than observations with a smaller mean, as in your example of weights.

> I'd appreciate your thoughts... Thanks, -Geoff
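For anyone who wants to see these pieces side by side, here is a minimal sketch in Python/numpy (the toy data and variable names are mine, not from Zar): the SE of the slope is the SE of estimate divided by sqrt(SSx), and dividing the SE of estimate by the mean of Y gives the unitless measure attributed to Dapson.

import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)   # toy data, true slope 0.5

b1, b0 = np.polyfit(x, y, 1)                  # fitted slope and intercept
rss = np.sum((y - (b0 + b1 * x)) ** 2)        # residual sum of squares

s_est = np.sqrt(rss / (n - 2))                # Zar's "standard error of estimate"
ss_x = np.sum((x - x.mean()) ** 2)            # SSx grows as points are added...
se_slope = s_est / np.sqrt(ss_x)              # ...so the SE of the slope shrinks

print(s_est, se_slope, s_est / y.mean())      # last term: the unitless measure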
Re: standard deviation of a slope
Sarah Gilman wrote:

> Is it possible to calculate the standard deviation of the slope of a regression line, and does anyone know how? My best guess after reading several stats books is that the standard deviation and the standard error of the slope are different names for the same thing.

Technically, the standard error is the standard deviation of the sampling distribution of a statistic, so it is the same as the standard deviation. So, you're right.

> The context of this question is a manuscript comparing the usefulness of regression to estimate the slope of a relationship under different environmental conditions. A reviewer suggested presenting the standard deviation of the slope rather than the standard error to compare the precision of the regression under different conditions. For unrelated reasons, the sample sizes used in the compared regressions vary from 10 to 200. The reviewer argues that the sample size differences are influencing the standard error values, and so the standard deviation (which according to the reviewer doesn't incorporate the sample size) would be a more robust comparison of the precision of the slope estimate among these different regressions.

Well, of course the sample size differences are influencing the standard error values! And so they should: if you have a larger sample size, then the estimates are more accurate. Why would one want anything other than this to be the case? In some cases, standard errors are calculated by dividing a standard deviation by sqrt(n), but these are only special cases.

It may be that the reviewer can provide further enlightenment, but from what you've written, I'm not convinced that they have the right idea.

Bob

--
Bob O'Hara
Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
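To make "the standard error is the standard deviation of the sampling distribution of a statistic" concrete, here is a small simulation sketch (Python/numpy; the model and numbers are my own toy choices). The standard deviation of the slopes fitted across many replicate samples should match the analytic SE of the slope for that design.

import numpy as np

rng = np.random.default_rng(0)
n, b0, b1, sigma = 30, 2.0, 0.5, 1.0
x = rng.uniform(0.0, 10.0, n)          # one fixed design, reused every replicate

slopes = []
for _ in range(5000):                  # resample y from the same model 5000 times
    y = b0 + b1 * x + rng.normal(0.0, sigma, n)
    slopes.append(np.polyfit(x, y, 1)[0])

# analytic SE of the slope for this design: sigma / sqrt(SSx)
se_analytic = sigma / np.sqrt(np.sum((x - x.mean()) ** 2))
print(np.std(slopes), se_analytic)     # these two numbers should agree closely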
Re: standard deviation of a slope
Bob,

I have a similar question to Sarah's, and it may even be the same. I'm using orthogonal regression to determine the equivalence of two variables, both measured with error. I want to compare the S.E. of the slope to the optimum slope of one (equivalence among variable responses). I contacted JMP (SAS Institute) and they recommended the two one-sided tests (TOST) procedure, which I understand as simply increasing the alpha to 0.10. But this still gives a very large confidence interval, providing a less than robust test. In some instances a slope of 2 is not significantly different from a slope of 1 (!!??). In fact, I have not found one instance in which the slopes differ. This seems like a universal Type II error to me. Can I use the standard test of homogeneity of slopes used in ANCOVA and compare to 1 (s.e. = 0), or would that lead to a Type I error?

Thanks for your time,
David

David M Bryant Ph D
University of New Hampshire
Environmental Education Program
Durham, NH 03824
[EMAIL PROTECTED]
978-356-1928
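For what it's worth, here is a sketch of how TOST on a slope is usually set up (Python/scipy; the equivalence bound delta below is a made-up value, and in practice it has to be justified scientifically). Note that TOST is not just "increasing the alpha": it tests against explicit equivalence bounds, and the choice of those bounds, not the alpha, is what drives how demanding the test is.

from scipy import stats

def tost_slope(slope, se, df, target=1.0, delta=0.2, alpha=0.05):
    # H0a: slope <= target - delta;  H0b: slope >= target + delta.
    # Equivalence is concluded only if BOTH one-sided tests reject.
    t_low = (slope - (target - delta)) / se
    t_high = (slope - (target + delta)) / se
    p_low = stats.t.sf(t_low, df)      # P(T > t_low)
    p_high = stats.t.cdf(t_high, df)   # P(T < t_high)
    return max(p_low, p_high)          # overall TOST p-value

# e.g. a fitted slope of 1.1 with SE 0.05 on 28 df:
print(tost_slope(1.1, 0.05, 28))       # ~0.03 < 0.05: equivalent within +/-0.2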
Re: standard deviation of a slope
Sarah:

I think the reviewer's comment has merit. I understand your problem as follows: your goal is to compare the "usefulness" (not sure what you mean by usefulness, but we'll go with it...) of regressions across environmental conditions. However, under one set of environmental conditions the regression might be based on 10 points, but under another set of conditions it might be based on 100 points. Unfortunately, even under the SAME environmental conditions, the SE of the slope will decrease as the sample size increases. Thus, if the number of points varies across environmental conditions, you don't know whether changes in the SE of the slope are caused by differences in sample size or by differences in usefulness across conditions.

In section 17.3 ("Testing the significance of a regression") of Zar's Biostatistical Analysis (pages 334-5 of the fourth edition) there is a clue that might help you with your dilemma. Zar notes that the "standard error of estimate" (AKA "standard error of the regression") is a measure of the remaining variance in Y *after* taking into account the dependence of Y on X. However, since the magnitude of this value is proportional to the magnitude of the dependent variable, Y, "examination of [this statistic is] a poor method for comparing regressions." Thus, Dapson (1980) recommends using [the standard error of estimate divided by the mean of Y] ("a unitless measure") to judge regression fits.

As I understand things (and I caution you that this could be wrong), the standard error of estimate (i.e., the variation in Y after taking into account the dependence of Y on X) should be independent of the number of points in the regression. Therefore, it seems a good candidate for your comparisons. An issue arises, however, if the mean of your Y values differs across environmental conditions. In this case, you may have to normalize your standard error of estimate by dividing by the mean of Y for each regression, as suggested by Dapson (1980) (as cited in Zar 1999).

Zar, J. H. 1999. Biostatistical analysis, fourth edition. Prentice Hall, Upper Saddle River, New Jersey.

Dapson, R. W. 1980. Guidelines for statistical usage in age-estimation techniques. J. Wildlife Manage. 44:541-548 (as cited in Zar 1999).

I'm not sure this is the solution, because I'm not a statistician and I haven't read Dapson's paper, but I'm pretty sure the reviewer has a legitimate point about your comparisons (as I described above), and I hope these references will help you find your answer. I'm posting this to the general listserv in hopes that others out there will correct, improve upon, or confirm my thoughts.

-Geoff Poole
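A quick way to check my claim that the SE of estimate should be roughly independent of n (this is my own toy simulation, not anything from Zar or Dapson): fit the same model over Sarah's range of sample sizes and compare the two statistics.

import numpy as np

rng = np.random.default_rng(2)
for n in (10, 50, 200):                        # the range of sample sizes at issue
    x = rng.uniform(0.0, 10.0, n)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, n)
    b1, b0 = np.polyfit(x, y, 1)
    s_est = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))     # stabilizes
    se_slope = s_est / np.sqrt(np.sum((x - x.mean()) ** 2))         # keeps shrinking
    print(n, round(s_est, 3), round(se_slope, 4))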
Re: standard deviation of a slope
You can get the confidence interval and prediction interval with most software. I know Minitab does it for regression; I can't recall whether SPSS does, but it probably does.

VISIT HERPETOLOGICAL CONSERVATION AND BIOLOGY www.herpconbio.org
A new journal published in partnership with Partners in Amphibian and Reptile Conservation and the World Congress of Herpetology.

Malcolm L. McCallum
Assistant Professor
Department of Biological Sciences
Texas A&M University-Texarkana
2600 Robison Rd.
Texarkana, TX 75501
O: 1-903-223-3134
H: 1-903-791-3843
Homepage: https://www.eagle.tamut.edu/faculty/mmccallum/index.html
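For those without Minitab or SPSS, here is a sketch of the same output in Python via statsmodels (standard statsmodels calls; the toy data are mine):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 30)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, 30)

X = sm.add_constant(x)                 # intercept column plus x
fit = sm.OLS(y, X).fit()

pred = fit.get_prediction(X)
ci = pred.conf_int()                   # 95% confidence band for the mean line
pi = pred.conf_int(obs=True)           # 95% prediction band for new observations
print(fit.bse[1])                      # and the SE of the slope, for good measure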
Re: standard deviation of a slope
Geoffrey Poole wrote:

> [...] Zar notes that the "standard error of estimate" (AKA "standard error of the regression") is a measure of the remaining variance in Y *after* taking into account the dependence of Y on X.

Zar says that? That's rubbish: the residual variance is the measure of the remaining variance in Y after taking into account the dependence of Y on X.

> However, since the magnitude of this value is proportional to the magnitude of the dependent variable, [...]

Again, rubbish: add 20 000 to all of your Y's, and the variances will all be the same. The only difference is that the estimated intercept is 20 000 higher.

I might now have understood the original problem (possibly...). I think the idea is that in any single environment, one can regress two variables and get a fit, etc. But the question is: how well will this fit do in another environment? The (actual) slope will probably be different between environments, and the more different they are, the less use it is to take the slope in one environment and use it to predict in another. The problem is the variation between the slopes in the different environments; obviously we can measure this variation by the standard deviation (or the variance!).

In practice, I would suggest fitting a mixed model, where you allow the slope to vary randomly between environments. Any decent stats package can do this; I think some people call them "random regressions." This will estimate the variation in slopes between environments, allowing for any differences in sample sizes in the different environments. If the variance is small, then the predictions from one environment to another will be pretty good (obviously this depends a bit on the size of the regression coefficient: if it's zero, then there's no improvement anyway).

I'll have to think a bit more about the best way of evaluating the importance of the variation in the slopes. The intuition is to ask how much better you do at predicting the value of a data point if you know which environment it was measured in, as compared to a random environment. Something similar to an intraclass correlation could be used.

Incidentally, this is perhaps a good opportunity to plug this book: http://www.stat.columbia.edu/~cook/movabletype/archives/2006/08/our_new_book_da.html. I read a draft in the spring and can heartily recommend it. It covers, in a practical way, the family of models that can be used for most statistical analyses I see in ecology (including the problem here!).

And now to bed.

Bob
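A rough sketch of what the random-slopes ("random regression") fit might look like in Python with statsmodels MixedLM (the file and column names below are hypothetical):

import pandas as pd
import statsmodels.formula.api as smf

# hypothetical data file with columns y, x, and environment
df = pd.read_csv("measurements.csv")

# random intercept and random slope for x within each environment
m = smf.mixedlm("y ~ x", df, groups="environment", re_formula="~x").fit()
print(m.summary())     # the estimated slope variance across environments
                       # appears in the random-effects covariance block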
Re: standard deviation of a slope
On 8/16/06, David Bryant [EMAIL PROTECTED] wrote:

> I'm using orthogonal regression to determine the equivalence of two variables, both measured with error. [...] I contacted JMP (SAS Institute) and they recommended the two one-sided tests (TOST) procedure [...] In some instances a slope of 2 is not significantly different from a slope of 1 (!!??). [...] Can I use the standard test of homogeneity of slopes used in ANCOVA and compare to 1 (s.e. = 0), or would that lead to a Type I error?

I would just look at the CI for your slope estimate and see if it included 1.
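To illustrate (the helper function and toy data here are my own, not JMP output): for an orthogonal (Deming, with error-variance ratio 1) regression, a bootstrap percentile CI for the slope gives the interval to inspect.

import numpy as np

def deming_slope(x, y, lam=1.0):
    # lam = ratio of error variances; lam = 1 gives orthogonal regression
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    d = syy - lam * sxx
    return (d + np.sqrt(d ** 2 + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)

rng = np.random.default_rng(4)
x = rng.normal(5.0, 2.0, 40)
y = x + rng.normal(0.0, 0.5, 40)       # toy data with true slope 1

boots = []
for _ in range(2000):                  # nonparametric bootstrap over (x, y) pairs
    i = rng.integers(0, len(x), len(x))
    boots.append(deming_slope(x[i], y[i]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(deming_slope(x, y), (lo, hi))    # does the 95% CI include 1?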
Re: standard deviation of a slope
Geoffrey Poole wrote:

> Zar notes that the "standard error of estimate" (AKA "standard error of the regression") is a measure of the remaining variance in Y *after* taking into account the dependence of Y on X.

Bob O'Hara wrote:

> Zar says that? That's rubbish: the residual variance is the measure of the remaining variance in Y after taking into account the dependence of Y on X.

The way I read Zar, he starts with the regression residual sum of squares and divides by the degrees of freedom, which yields the variance of the residuals (to which you refer). If you take the square root of this value, you get what Zar refers to as the "standard error of estimate." I suppose I was not careful in my wording when I called this statistic a measure of "variance." I should have said a measure of "variation."

Geoffrey Poole wrote:

> However, since the magnitude of this value is proportional to the magnitude of the dependent variable...

Bob O'Hara wrote:

> Again, rubbish: add 20 000 to all of your Y's, and the variances will all be the same. The only difference is that the estimated intercept is 20 000 higher.

Yes, adding a constant to a distribution will not change the variance. In thinking about it, it does seem confusing for Zar to state: "The magnitude of [the 'standard error of estimate'] is proportional to the magnitude of the dependent variable, Y." (top of page 335, fourth edition).

But before we dismiss Zar (and Dapson) as "rubbish," let's consider real-world data that represent biological phenomena rather than purely contrived data (e.g., adding a constant to all Y values). Consider the weight of animals, for instance. The variance in weight for a large-bodied species (say, humans) is much higher than for mice, and higher for mice than for fleas. Even within a single species (again, e.g., humans), the variance in weight among adults is far greater than among infants. When considering regressions that predict the weight of individuals, then, it follows that the residuals of the regressions are apt to increase in proportion to the average weight of individuals in the population.

Thus, couldn't biological factors (rather than any underlying mathematical formulation) drive a relationship between the standard error of estimate and the mean of the dependent variable?

-Geoff Poole
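A toy simulation of this point (mine, not from Zar or Dapson): with multiplicative error, which is common for body-size data, the SE of estimate scales with the mean of Y, while the Dapson-style ratio (SE of estimate divided by mean Y) stays roughly constant.

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1.0, 10.0, 100)
for scale in (1.0, 100.0):             # loosely, "mice" vs "humans"
    # multiplicative (lognormal) error, so scatter grows with the mean of Y
    y = scale * (2.0 + 0.5 * x) * rng.lognormal(0.0, 0.1, 100)
    b1, b0 = np.polyfit(x, y, 1)
    s_est = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (len(x) - 2))
    print(scale, round(s_est, 2), round(s_est / y.mean(), 4))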