I have been using SAGE for teaching chemistry for some years and, when dealing with experimental data fitting, I found SAGE's "find_fit" function very valuable. Unfortunately, this function seems to provide no information about the quality of the fit, which can be as important as the fitting function itself. I found also that function "stats.linregress" from "stats" module in "scipy" provides some information, but it is restricted to models of type y = a*x+b. I have been looking around SAGE and python libraries and didn't found a satisfactory solution to this little problem. Therefore, I have tried my own solution which attempts to somehow mimic Mathematica's "LinearModelFit" function (not in full, because Mathematica's function is somehow redundant and verbose). I would like to know if you consider this solution sufficiently valuable to be included in SAGE's stuff. In my opinion, the strong points are that code is straightforward to follow, it is completely written using only python commands and SAGE functionality (it does not require to load python modules or anything else than SAGE core) and results are returned in a user friendly way, as a dictionary. The weak (but improvable without much effort) points are that this version works with functions depending on a single variable, and only for linear models.
Before submitting it, and following the advice in the FAQ, I would like to know the community's opinion about: (1) whether something like this already exists in SAGE or not, (if it already exists, I would be most grateful to know where can I find it) (2) if the answer to (1) is "not", can be the proposal interesting to SAGE community?, (3) if the answer to (2) is "yes", is the code that I propose suitable for this purpose? I put the code below, so that you may test it yourselves, and suggest improvements if you consider that it deserves your attention. Thanks in advance for your comments and suggestions (no matter whether positive or negative). Code: ------- def LinearModelFit(x, y, fns, t, confidencelevel=0.95, colorpoints='blue', colorfit='red', colorbands='green'): """\n Function for least-squares fit of a set of points (x_i,y_i) (i=1,...N) to a linear combination of basis functions: f_j(x), j = 1,... m (with m < N). \n Input: x, y, fns, t, confidencelevel (optional), colorpoints (optional), colorfit (optional), colorbands (optional) x: list with fitting points abscissas y: list with fitting points ordinates fns: list with basis functions for fitting (functions must depend on a single variable) t: name of variable in basis functions as supplied in fns confidencelevel: confidence level for error estimations; (default: 0.95 (95%)) colorpoints : color for fitting points (default: blue) colorfit : color for fitting function (default: red) colorbands : color for mean prediction bands (default: green) Output: a dictionary containing: 'best fit': best fit in terms of basis functions 'parameters': parameter estimates 'R2': coefficient of determination R^2 'Delta2': $\sum_i (y_i - f(x_i))^2$ 's2': Delta2 / (len(x) - len(fns)) 'fit response': predicted response $f(x_i)$ 'residuals': difference between actual and predicted responses 'cov': estimated covariance matrix 'standard errors' : standard errors for parameters 'parameter errors' : estimated errors of parameters for given confidence level 't-Statistic' : t-statistic for parameters estimation 'Pvalue' : P-values for t-statistic test 'upper band': function with the upper bound of mean prediction band 'lower band': function with the lower bound of mean prediction band 'bestfit plot' : plot of fitting function 'points plot' : plot of fitting points 'upper band plot' : plot of upper bound of mean prediction band 'lower band plot' : plot of lower bound of mean prediction band 'confidence level' : confidence level WARNING: This function is intended for linear parameters only, therefore, basis functions must not include further parameters or non-numeric symbols other than independent variable Author: Rafael Lopez. Universidad Autonoma de Madrid. e-mail: rafael.lo...@uam.es Date: 28 January 2015 """ # Validates data if len(x) != len(y): print "The number of abscissas (", len(x), ") must be equal to the number of ordinates (", len(y),")" return if len(fns) >= len(x): print "The number of fitting functions (", len(fns), ") must be lower than the number of points (", len(x),")" return if (confidencelevel < 0 or confidencelevel > 1): print "Confidence level (", confidencelevel, ") must be in the range [0,1]" return if not (sum(fns).substitute({t:((max(x)+min(x))*.5)}).is_real()): print "Fitting functions: ", fns print "Fitting functions must not contain other symbols than ", t return X = matrix([[f.substitute({t:u}) for f in fns] for u in x]) XX = transpose(X)*X if det(XX) == 0: print "Fitting functions: ", fns, " are linearly dependent. Remove dependencies." return # End of data validation XXi = XX^-1 alpha = 1-confidencelevel T = RealDistribution('t', len(x)-len(fns)) bvec = XXi*transpose(X)*transpose(matrix(y)) blist = list(transpose(bvec)[0]) response = list(transpose(X*bvec)[0]) residuals = list(vector(y) - vector(response)) R2 = R2=((matrix(y)*X*bvec-sum(y)^2/len(y))/(vector(y)*vector(y)-sum(y)^2/len(y)))[0][0] Delta2 = (vector(y)*vector(y) - matrix(y)*X*bvec)[0][0] s2 = Delta2 / (len(x)-len(fns)) cov = s2*XXi stderr = [sqrt(cov[i][i]) for i in range(len(list(cov)))] berr = list(T.cum_distribution_function_inv(1-alpha/2) * vector(stderr)) tstatistic = [blist[i]/stderr[i] for i in range(len(blist))] pvalue = [2*T.cum_distribution_function(-abs(i)) for i in tstatistic] fitfun(t) = (vector(fns)*bvec)[0] bandsup(t) = fitfun(t) + T.cum_distribution_function_inv(1-alpha/2) * sqrt(expand(vector(fns)*cov*vector(fns))) bandinf(t) = fitfun(t) - T.cum_distribution_function_inv(1-alpha/2) * sqrt(expand(vector(fns)*cov*vector(fns))) pntsfig = list_plot(zip(x,y),color=colorpoints) fajfig = plot(fitfun(t),(t,min(x),max(x)),color=colorfit) bndsupfig = plot(bandsup(t),(t,min(x),max(x)),color=colorbands) bndinffig = plot(bandinf(t),(t,min(x),max(x)),color=colorbands) results = {'best fit' : fitfun, 'parameters' : blist, 'R2' : R2, 's2' : s2, 'Delta2' : Delta2, \ 'fit response' : response, 'residuals' : residuals, 'cov' : cov, 'standard errors' : stderr, \ 'parameter errors' : berr, 't-Statistic' : tstatistic, 'Pvalue' : pvalue, 'upper band': bandsup, \ 'lower band' : bandinf, 'bestfit plot' : fajfig, 'points plot' : pntsfig, 'upper band plot' : bndsupfig, \ 'lower band plot' : bndinffig, 'confidence level' : confidencelevel} return results -- You received this message because you are subscribed to the Google Groups "sage-devel" group. To unsubscribe from this group and stop receiving emails from it, send an email to sage-devel+unsubscr...@googlegroups.com. To post to this group, send email to sage-devel@googlegroups.com. Visit this group at http://groups.google.com/group/sage-devel. For more options, visit https://groups.google.com/d/optout.