On Tue, May 29, 2018 at 9:14 AM, Jonathan Tammo Siebert <jotasi_numpy_sc...@posteo.de> wrote:
> Hi,
>
> I hope this is the appropriate place to ask something like this; otherwise please let me know (or feel free to ignore this). I also hope that I have not misunderstood something or made a silly mistake. If so, please let me know as well!
>
> TLDR:
> When scaling the covariance matrix based on the residuals, scipy.optimize.curve_fit uses a factor of chisq(popt)/(M-N) (with M = number of points, N = number of parameters) and numpy.polyfit uses chisq(popt)/(M-N-2). I am wondering which is correct.
>
> I am somewhat confused about the different results I am getting for the covariance matrix of a simple linear fit when comparing `scipy.optimize.curve_fit` and `numpy.polyfit`. I am aware that `curve_fit` solves the more general non-linear problem numerically, while `polyfit` finds an analytical solution to the linear problem. However, both converge to the same solution, so I suspect that this difference is not important here. The difference I am curious about is not in the returned parameters, but in the estimate of the corresponding covariance matrix. As I understand it, there are two different ways to estimate it: based either on the absolute values of the provided uncertainties, or by interpreting those only as weights and then scaling the matrix to produce an appropriate reduced chisq. To that end, curve_fit has the parameter (https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html):
>
> "absolute_sigma : bool, optional
> If True, sigma is used in an absolute sense and the estimated parameter covariance pcov reflects these absolute values.
> If False, only the relative magnitudes of the sigma values matter. The returned parameter covariance matrix pcov is based on scaling sigma by a constant factor. This constant is set by demanding that the reduced chisq for the optimal parameters popt when using the scaled sigma equals unity. In other words, sigma is scaled to match the sample variance of the residuals after the fit. Mathematically,
> pcov(absolute_sigma=False) = pcov(absolute_sigma=True) * chisq(popt)/(M-N)"
>
> https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html, on the other hand, does not say anything about how the covariance matrix is estimated. To my understanding, its default should correspond to `absolute_sigma=False` for `curve_fit`. As `polyfit` has a weight parameter instead of an uncertainty parameter, I guess the difference in default behavior is not that surprising.
>
> However, even when specifying `absolute_sigma=False`, `curve_fit` and `polyfit` produce different covariance matrices, as the applied scaling factors are chisq(popt)/(M-N-2) for `polyfit`
> (https://github.com/numpy/numpy/blob/6a58e25703cbecb6786faa09a04ae2ec8221348b/numpy/lib/polynomial.py#L598-L605)
> and chisq(popt)/(M-N) for `curve_fit`
> (https://github.com/scipy/scipy/blob/607a21e07dad234f8e63fcf03b7994137a3ccd5b/scipy/optimize/minpack.py#L781-L782).
>
> The argument given in a comment on the scaling in `polyfit` is:
> "Some literature ignores the extra -2.0 factor in the denominator, but it is included here because the covariance of Multivariate Student-T (which is implied by a Bayesian uncertainty analysis) includes it. Plus, it gives a slightly more conservative estimate of uncertainty."
> Honestly, though, in a quick search I was not able to find any literature that does *not* ignore the extra "factor". But obviously, I could very well be misunderstanding something.
> Nonetheless, as `curve_fit` ignores it as well, I was wondering whether those two shouldn't give consistent results and, if so, which would be the correct solution.
>
> Best,
> Jonathan

I've never seen the -2 in any literature, and there is no reference in the code comment. (I would remove it as a bug fix. Even if there is some Bayesian interpretation, it is not what users would expect.)

There was a similar thread in 2013:
https://mail.scipy.org/pipermail/numpy-discussion/2013-February/065664.html
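For anyone who wants to see the discrepancy directly, here is a minimal sketch on a straight-line fit (the data and seed are made up, and the polyfit scaling shown is the one in the NumPy sources linked above):

import numpy as np
from scipy.optimize import curve_fit

# Made-up straight-line data; M points, N = 2 fitted parameters.
rng = np.random.RandomState(0)
M, N = 20, 2
x = np.linspace(0.0, 1.0, M)
y = 1.0 + 2.0 * x + 0.1 * rng.randn(M)

# curve_fit (absolute_sigma=False, the default) scales pcov by chisq/(M - N).
popt, pcov_cf = curve_fit(lambda x, a, b: a + b * x, x, y)

# polyfit (as linked above) scales its covariance by chisq/(M - N - 2).
coef, pcov_pf = np.polyfit(x, y, 1, cov=True)

# Both fits share the same unscaled covariance, so the two matrices should
# differ elementwise by the constant (M - N)/(M - N - 2). polyfit orders
# coefficients from the highest power down, hence the axis reversal.
print(pcov_pf / pcov_cf[::-1, ::-1])   # every entry ~ 18/16 = 1.125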
Josef