*** For details on how to be removed from this list visit the CCP4 home page http://www.ccp4.ac.uk ***
On 8/30/05 5:57 PM, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:

>>> I think many people would disagree, arguing that LS does represent a
>>> choice of error distribution when it is not otherwise known. In fact, LS
>>> makes several assumptions about the errors (errors are independent, have
>>> the same variance, expectation is zero, ...; see the Wikipedia page from
>>> the original message). Just because we do not actively choose an error
>>> distribution does not mean that one is not chosen. When we use LS, and
>>> claim a "best fit" of the data, we are making the assumption that the
>>> errors are normal.
>>>
>> This is incorrect. The statistical justification for LS (the Gauss-Markov
>> theorem) assumes nothing about the form of the error distribution aside
>> from (1) zero expectation, (2) noncorrelation (*not* independence), and
>> (3) equal variance. In fact, weighted least squares can correct exactly
>> for violations of (2) and (3), so WLS only assumes (1). The error
>> distribution can certainly be non-normal and the optimal properties
>> guaranteed by the G-M theorem will still hold.
>>
> Yes, but given an expectation value and a variance, Maximum Entropy methods
> say that the only unbiased error distribution is Gaussian.

That is really a different (and, for the purposes here, unrelated) point. You
are also using a non-traditional definition of 'statistical bias'. The
Gauss-Markov theorem guarantees that if condition (1) above holds [even if
(2) and (3) are violated], then the least-squares solution is unbiased. Here
I am using the standard definition of 'bias' in statistics:

http://mathworld.wolfram.com/EstimatorBias.html
http://en.wikipedia.org/wiki/Bias_%28statistics%29#The_sometimes-good_kind

What this means is that, if the first G-M condition holds, then least squares
gives a parameter estimate that is, in the long-run average (even with small
samples), equal to the true parameter.
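That unbiasedness claim is easy to check numerically. The sketch below (model, parameter values, and sample sizes are all illustrative, not from the thread) fits a line by ordinary least squares to data whose errors have zero mean but a strongly non-normal (centred exponential) distribution, and shows that the long-run average of the estimates recovers the true parameters:

```python
# Illustrative simulation: OLS is unbiased when the errors merely have zero
# mean, even though their distribution is far from Gaussian.
import numpy as np

rng = np.random.default_rng(0)
true_beta = np.array([2.0, 3.0])           # intercept and slope (assumed values)
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column

estimates = []
for _ in range(5000):
    # Zero-mean but skewed, non-normal errors: exponential shifted to mean 0.
    e = rng.exponential(scale=1.0, size=x.size) - 1.0
    y = X @ true_beta + e
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(beta_hat)

mean_est = np.mean(estimates, axis=0)
print(mean_est)  # close to [2, 3]: the long-run average recovers the truth
```

Any other zero-mean error distribution could be substituted for the exponential here; the averaged estimates still converge to the true parameters, which is exactly the G-M unbiasedness guarantee under condition (1).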
If (2) and (3) also hold, then the LS solution additionally has the smallest
variance of any other linear unbiased estimator of the parameter (the "best
linear unbiased estimator", or BLUE, property).

> So, although it's never explicitly stated in Least-Squares, the fact that
> you consider only an expectation and a variance (and the lack of
> correlation), in a sense, actually already nails down the error model (if
> you want to remain unbiased, that is).

No. Aside from (1), (2), and (3), the G-M guarantees are independent of the
form of the error distribution. For instance, the exact same properties hold
when LS is applied to data with a Laplacian distribution, a logistic
distribution, a uniform distribution, an extreme value distribution, or a
Gaussian distribution. It doesn't matter: the LS solution is statistically
unbiased (in the sense given above) as long as (1) holds, and it is the
minimum-variance linear unbiased estimator if (2) and (3) also hold.

To get back to the principle of Maximum Entropy: it simply says that, if all
you know about the distribution of some value is its first two moments (the
mean and the variance), then you are best off, in terms of information
theory, assuming that the distribution is Gaussian. I don't see how this
applies directly to LS, other than giving you confidence that the Gaussian is
frequently a good assumption, and that in that case LS will be equivalent to
maximum likelihood with homoscedastic data.

Douglas

> If the errors were non-Normal then you can still apply the method and get
> good results, but I'm not sure that they are optimal; they don't have the
> highest likelihood and are certainly not the most probable.
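The non-Normal case raised above can also be illustrated numerically. In this sketch (the location model and all parameter values are illustrative) the errors are Laplacian, so the LS estimate of a location parameter is the sample mean while the maximum likelihood estimate is the sample median. Both come out unbiased, but the median (a nonlinear estimator, outside the scope of Gauss-Markov) has lower variance, which is why LS is not "optimal" in the likelihood sense here:

```python
# Illustrative simulation: with Laplacian (double-exponential) errors, the
# sample mean (the LS estimate of location) is unbiased, but the sample
# median (the MLE for Laplacian errors) achieves a lower variance.
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 5.0, 101, 20000   # true location, sample size, repetitions
means, medians = [], []
for _ in range(reps):
    sample = mu + rng.laplace(loc=0.0, scale=1.0, size=n)
    means.append(sample.mean())        # LS estimate of mu
    medians.append(np.median(sample))  # ML estimate under Laplacian errors

print(np.mean(means), np.var(means))      # ~unbiased; variance near 2/n
print(np.mean(medians), np.var(medians))  # ~unbiased; variance near 1/n
```

This is consistent with both halves of the argument: LS stays unbiased under condition (1) regardless of the error distribution, but the G-M optimality only ranks LS against other *linear* unbiased estimators, so a distribution-specific ML estimator can do better.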
