李俊杰 <klijunjie <at> gmail.com> writes:

>
> Hi, Lynch,
>
> Thank you for your attention first.
>
> I am also not a statistician and have just taken several statistics classes.
> So it is natural for us to ask some questions that seem naive to statisticians.
>
> I am sorry that I cannot agree with your point that we must always include
> an intercept in our model. Because if the true intercept is zero, the strategy
> of you or your textbook will have two losses. First, there will be a problem
> of explanation. If the true intercept is zero and your estimate of it is
> not zero, the result of the regression is misleading. It might not be
> as serious as judging coefficients which are actually zero to be
> non-zero, but the misjudgement here is still a loss to some
> extent. Secondly, if the true intercept is zero, your strategy's predictive
> ability is often lower than that of other strategies which do not always
> include an intercept.
>

I'm not a statistician, but I've seen much damage done with regression
forced through zero in my field (ecology). This technique is taught in many
statistical textbooks popular among ecologists. The key problem here is: how
do you *know* that the intercept is zero? Even in logically compelling cases
it is very easy to reach false certainty of a zero intercept.

A typical case in ecology is where people study the number of species against
biomass, and argue that there *must* be zero species when biomass = 0 (if
there is nothing, then there is nothing). The conclusion is that you must fit
a model with no intercept. Let's see a typical example (and I'm so confident
that I won't put any random number seed for this):

  mass <- runif(100, 10, 500)    # typical range for plant biomass/m^2
  spno <- rpois(100, 12)         # Moderate number of species independent of mass
  summary(lm(spno ~ mass - 1))   # WRONG!
  summary(lm(spno ~ mass))       # More or less correct

It is not sufficient to know that the value must be zero at a certain point;
you should also know how that point is scaled. It may make sense to say that
spno = 0 at log(mass) = -Inf, but then it does not make sense to force the
regression through that point. In particular, when the zero point is
extrapolated from the data, it is dangerous to force the regression through
the origin. Further, if your x does not have a really natural scale, so that
you can replace x with x - constant (like x - mean(x)), then it hardly makes
sense to play with zero intercepts.

There may be cases where forcing the regression through zero makes sense, but
I've seen them very rarely.

There is an exegetic text at
http://www.stats.ox.ac.uk/pub/MASS3/Exegeses.pdf
that touches on this issue (page 3) and makes nice reading anyhow.

Cheers, Jari Oksanen

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
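
A reproducible variant of the simulation in the post can be sketched as follows. It compares the slope p-values of the two fits and illustrates the remark about replacing x with x - mean(x). The seed value and the names fit0, fit1, massc, p0, p1 and the slope variables are assumptions added for illustration and are not part of the original message.

```r
set.seed(1)  # arbitrary seed (an assumption), only to make the sketch reproducible

mass <- runif(100, 10, 500)   # plant biomass per m^2, as in the post
spno <- rpois(100, 12)        # species counts, generated independently of mass

fit0 <- lm(spno ~ mass - 1)   # regression forced through the origin
fit1 <- lm(spno ~ mass)       # regression with an intercept

# p-values of the slope in each model
p0 <- summary(fit0)$coefficients["mass", "Pr(>|t|)"]
p1 <- summary(fit1)$coefficients["mass", "Pr(>|t|)"]
print(c(no_intercept = p0, with_intercept = p1))

# Shift x by its mean: the no-intercept slope changes,
# while the slope of the model with an intercept does not.
massc   <- mass - mean(mass)
slope0  <- unname(coef(fit0)["mass"])
slope0c <- unname(coef(lm(spno ~ massc - 1))["massc"])
slope1  <- unname(coef(fit1)["mass"])
slope1c <- unname(coef(lm(spno ~ massc))["massc"])
print(c(origin = slope0, origin_centred = slope0c,
        intercept = slope1, intercept_centred = slope1c))
```

In the no-intercept fit the slope has to account for the mean level of spno, so it comes out strongly "significant" even though spno was generated independently of mass. After centring, the no-intercept slope coincides with the ordinary slope, which shows that the through-origin result depends on where the zero of x happens to sit.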