On Wed, Feb 27, 2013 at 6:46 AM, David Pine <djp...@gmail.com> wrote:
> As of NumPy v1.7, numpy.polyfit includes an option for providing weighting
> to data to be fit. It's a welcome addition, but the implementation seems a
> bit non-standard, perhaps even wrong, and I wonder if someone can
> enlighten me.
>
> 1. The documentation does not specify what the weighting array "w" is
> supposed to be. After some fooling around I figured out that it is
> 1/sigma, where sigma is the standard deviation uncertainty "sigma" in the
> y data. A better implementation, which would be consistent with how
> weighting is done in scipy.optimize.curve_fit, would be to specify "sigma"
> directly, rather than "w". This is typically what the user has at hand,
> this syntax is more transparent, and it would make polyfit consistent with
> the nonlinear fitting routine scipy.optimize.curve_fit.

Weights can be used for more things than sigma. Another common use is to
set the weight to zero for bad data points. Zero is a nicer number than
inf, although inf would probably work for IEEE numbers. Inf and 1/inf == 0
are modern innovations, so multiplicative weights are somewhat traditional.
Weights of zero do need to be taken into account when counting the degrees
of freedom, however.

> 2. The definition of the covariance matrix in polyfit is peculiar (or
> maybe wrong, I'm not sure). Towards the end of the code for polyfit, the
> standard covariance matrix is calculated and given the variable name
> "Vbase". However, instead of returning Vbase as the covariance matrix,
> polyfit returns the quantity Vbase * fac, where fac = resids / (len(x) -
> order - 2.0). fac scales as N, the number of data points, so the
> covariance matrix is about N times larger than it should be for large
> values of N; the increase is smaller for small N (of order 10). Perhaps
> the authors of polyfit want to use a more conservative error estimate,
> but the references given in the polyfit code make no mention of it.
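The zero-weight idiom can be sketched as follows (an illustrative example of mine, not code from the thread; the data, noise level, and bad-point index are invented):

```python
import numpy as np

# Illustrative sketch (not from the thread): weight each point by
# w = 1/sigma, and exclude one bad point entirely with w = 0.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)       # per-point uncertainty in y
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

y[5] = 100.0                       # corrupt one data point
w = 1.0 / sigma                    # polyfit's w is 1/sigma, not sigma
w[5] = 0.0                         # zero weight removes the bad point

slope, intercept = np.polyfit(x, y, 1, w=w)
print(slope, intercept)            # close to the true values 2 and 1
```

With `w[5] = 0.0` the corrupted point contributes nothing to the normal equations, whereas leaving its weight at `1/sigma` would drag the fit badly off.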
> I think it would be much better to return the standard definition of the
> covariance matrix (see, for example, the book Numerical Recipes).

The resids scale as N, so fac is approximately constant. The effect of more
data points shows up in (A^T * A)^-1, which is probably what is called the
covariance, but really isn't until it is scaled by fac.

Chuck
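The scaling being discussed can be sketched directly with lstsq (my own illustration under stated assumptions, not the actual polyfit source; `base` mirrors the thread's description of "Vbase"):

```python
import numpy as np

# Sketch (assumptions mine): for a polynomial least-squares fit y ~ V @ c,
# the textbook covariance of the coefficients is (V^T V)^-1 scaled by
# resids / dof, with dof = N - (deg + 1).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 50)
y = 3.0 * x - 2.0 + rng.normal(0.0, 0.1, x.size)
deg = 1

V = np.vander(x, deg + 1)              # design matrix: columns [x, 1]
c, resids, _, _ = np.linalg.lstsq(V, y, rcond=None)
base = np.linalg.inv(V.T @ V)          # the "Vbase" factor discussed above

dof = x.size - (deg + 1)               # standard degrees of freedom
cov_standard = base * resids[0] / dof  # textbook covariance estimate

# The scaling described in the thread, resids / (len(x) - order - 2.0),
# divides by N - deg - 2 instead, i.e. one fewer degree of freedom.
cov_numpy17 = base * resids[0] / (x.size - deg - 2.0)

print(np.sqrt(np.diag(cov_standard)))  # standard errors of [slope, intercept]
```

This shows Chuck's point: `base` alone shrinks as more points are added, and only after multiplying by `fac` (which stays roughly constant, since `resids` grows with N) does it become a covariance estimate. The two scalings then differ only by the `-2` versus `-1` in the denominator, not by a factor of N.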
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion