[sage-devel] LinearModelFit contribution

rafaellopezfernandez Wed, 28 Jan 2015 04:15:07 -0800

I have been using SAGE for teaching chemistry for some years and, when 
dealing with experimental data fitting, I found SAGE's "find_fit" function 
very valuable. Unfortunately, this function seems to provide no information 
about the quality of the fit, which can be as important as the fitting 
function itself. 
I found also that function "stats.linregress" from "stats" module in 
"scipy"  provides some information, but it is restricted to models of type 
y = a*x+b.
I have been looking around SAGE and python libraries and didn't  found a 
satisfactory solution to this little problem. Therefore, I have tried my 
own solution which attempts to somehow mimic Mathematica's "LinearModelFit" 
function (not in full, because Mathematica's function is somehow redundant 
and verbose). 
I would like to know if you consider this solution sufficiently valuable to 
be included in SAGE's stuff. 
In my opinion, the strong points are that code is straightforward to 
follow, it is completely written using only python commands and SAGE 
functionality (it does not require to load python modules or anything else 
than SAGE core) and results are returned in a user friendly way, as a 
dictionary.
The weak (but improvable without much effort) points are that this version 
works with functions depending on a single variable, and only for linear 
models.


Before submitting it, and following the advice in the FAQ, I would like to 
know the community's opinion about: 
(1) whether something like this already exists in SAGE or not, (if it 
already exists, I would be most grateful to know where can I find it)
(2) if the answer to (1) is "not", can be the proposal interesting to SAGE 
community?,
(3) if the answer to (2) is "yes", is the code that I propose suitable for 
this purpose?

I put the code below, so that you may test it yourselves, and suggest 
improvements if you consider that it deserves your attention.

Thanks in advance for your comments and suggestions (no matter whether 
positive or negative).

Code:
-------

def LinearModelFit(x, y, fns, t, confidencelevel=0.95, colorpoints='blue', 
colorfit='red', colorbands='green'):
    """\n
    Function for least-squares fit of a set of points (x_i,y_i) (i=1,...N) 
to a linear combination of basis functions: f_j(x), j = 1,... m (with m < 
N). \n
    Input: x, y, fns, t, confidencelevel (optional), colorpoints 
(optional), colorfit (optional), colorbands (optional)
        x: list with fitting points abscissas
        y: list with fitting points ordinates
        fns: list with basis functions for fitting (functions must depend 
on a single variable)
        t: name of variable in basis functions as supplied in fns
        confidencelevel: confidence level for error estimations; (default: 
0.95 (95%))
        colorpoints : color for fitting points (default: blue)
        colorfit : color for fitting function (default: red)
        colorbands : color for mean prediction bands (default: green)
    Output: a dictionary containing:
        'best fit': best fit in terms of basis functions
        'parameters': parameter estimates
        'R2': coefficient of determination R^2
        'Delta2': $\sum_i (y_i - f(x_i))^2$
        's2': Delta2 / (len(x) - len(fns))
        'fit response': predicted response $f(x_i)$  
        'residuals': difference between actual and predicted responses
        'cov': estimated covariance matrix
        'standard errors' : standard errors for parameters
        'parameter errors' : estimated errors of parameters for given 
confidence level
        't-Statistic' : t-statistic for parameters estimation
        'Pvalue' : P-values for t-statistic test
        'upper band': function with the upper bound of mean prediction band
        'lower band': function with the lower bound of mean prediction band
        'bestfit plot' : plot of fitting function
        'points plot' : plot of fitting points
        'upper band plot' : plot of upper bound of mean prediction band
        'lower band plot' : plot of lower bound of mean prediction band
        'confidence level' : confidence level 
    WARNING: This function is intended for linear parameters only, 
therefore, basis functions must not
        include further parameters or non-numeric symbols other than 
independent variable
    Author: Rafael Lopez. Universidad Autonoma de Madrid. e-mail: 
rafael.lo...@uam.es
    Date: 28 January 2015
    """
    # Validates data
    if len(x) != len(y):
        print "The number of abscissas (", len(x), ") must be equal to the 
number of ordinates (", len(y),")"
        return
    if len(fns) >= len(x):
        print "The number of fitting functions (", len(fns), ") must be 
lower than the number of points (", len(x),")"
        return
    if (confidencelevel < 0 or confidencelevel > 1):
        print "Confidence level (", confidencelevel, ") must be in the 
range [0,1]" 
        return
    if not (sum(fns).substitute({t:((max(x)+min(x))*.5)}).is_real()):
        print "Fitting functions: ", fns
        print "Fitting functions must not contain other symbols than ", t
        return
    X = matrix([[f.substitute({t:u}) for f in fns] for u in x])
    XX = transpose(X)*X
    if det(XX) == 0:
        print "Fitting functions: ", fns, " are linearly dependent. Remove 
dependencies."
        return
    # End of data validation
    XXi = XX^-1
    alpha = 1-confidencelevel
    T = RealDistribution('t', len(x)-len(fns))
    bvec = XXi*transpose(X)*transpose(matrix(y))
    blist = list(transpose(bvec)[0])
    response = list(transpose(X*bvec)[0])
    residuals = list(vector(y) - vector(response))
    R2 = 
R2=((matrix(y)*X*bvec-sum(y)^2/len(y))/(vector(y)*vector(y)-sum(y)^2/len(y)))[0][0]
    Delta2 = (vector(y)*vector(y) - matrix(y)*X*bvec)[0][0]
    s2 = Delta2 / (len(x)-len(fns))
    cov = s2*XXi
    stderr = [sqrt(cov[i][i]) for i in range(len(list(cov)))]
    berr = list(T.cum_distribution_function_inv(1-alpha/2) * vector(stderr))
    tstatistic = [blist[i]/stderr[i] for i in range(len(blist))]
    pvalue = [2*T.cum_distribution_function(-abs(i)) for i in tstatistic]
    fitfun(t) = (vector(fns)*bvec)[0]
    bandsup(t) = fitfun(t) + T.cum_distribution_function_inv(1-alpha/2) * 
sqrt(expand(vector(fns)*cov*vector(fns)))
    bandinf(t) = fitfun(t) - T.cum_distribution_function_inv(1-alpha/2) * 
sqrt(expand(vector(fns)*cov*vector(fns)))
    pntsfig = list_plot(zip(x,y),color=colorpoints)
    fajfig = plot(fitfun(t),(t,min(x),max(x)),color=colorfit)
    bndsupfig = plot(bandsup(t),(t,min(x),max(x)),color=colorbands)
    bndinffig = plot(bandinf(t),(t,min(x),max(x)),color=colorbands)
    results = {'best fit' : fitfun, 'parameters' : blist, 'R2' : R2, 's2' : 
s2, 'Delta2' : Delta2, \
    'fit response' : response, 'residuals' : residuals, 'cov' : cov, 
'standard errors' : stderr, \
    'parameter errors' : berr, 't-Statistic' : tstatistic, 'Pvalue' : 
pvalue, 'upper band': bandsup, \
    'lower band' : bandinf, 'bestfit plot' : fajfig, 'points plot' : 
pntsfig, 'upper band plot' : bndsupfig, \
    'lower band plot' : bndinffig, 'confidence level' : confidencelevel} 
    return results

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To post to this group, send email to sage-devel@googlegroups.com.
Visit this group at http://groups.google.com/group/sage-devel.
For more options, visit https://groups.google.com/d/optout.

[sage-devel] LinearModelFit contribution

Reply via email to