On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev <alexey.tiga...@gmail.com> wrote:
> Hi All!
>
> I have implemented multiple regression in a following way:
>
> def multipleRegression(x, y):
>    """ Perform linear regression using least squares method.
>
>    X - matrix containing inputs for observations,
>    y - vector containing one of outputs for every observation """
>    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>    xt = transpose(x)
>    a = dot(xt, x)      # A = xt * x
>    b = dot(xt, y)      # B = xt * y
>    try:
>        return linalg.solve(a, b)

Never, ever use the normal equations. :-) Use linalg.lstsq(x, y) instead.

>    except linalg.LinAlgError, lae:
>        mulregLogger.warn("Singular matrix:\n%s" % (a))
>        mulregLogger.warn(lae)
>        mulregLogger.warn("Determinant: %f" % (linalg.det(a)))
>        raise lae
>
> Can you suggest me something to optimize it?
>
> I am using it on large number of observations so it is common to have
> "x" matrix of about 5000x20 and "y" vector of length 5000, and more.
> I also have to run that multiple times for different "y" vectors and
> same "x" matrix.

Just make a matrix "y" such that each column vector is a different
output vector (e.g. y.shape == (5000, number_of_different_y_vectors)).

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
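
For reference, the two suggestions in this reply (use linalg.lstsq, and stack the output vectors as columns of "y") might look roughly like the sketch below. Forming dot(xt, x) squares the condition number of x, which is the usual reason to avoid the normal equations; lstsq works on x directly. The data, the shapes, and the names n_obs, n_inputs, n_outputs and true_coef are made up for illustration, and the rcond=None argument assumes a reasonably recent NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with roughly the shapes mentioned in the thread.
n_obs, n_inputs, n_outputs = 5000, 20, 3
x = rng.normal(size=(n_obs, n_inputs))
true_coef = rng.normal(size=(n_inputs, n_outputs))

# Each column of y is a different output vector for the same x.
y = x @ true_coef + 0.01 * rng.normal(size=(n_obs, n_outputs))

# One call solves the least-squares problem for every column of y.
coef, residuals, rank, singular_values = np.linalg.lstsq(x, y, rcond=None)

# coef[:, k] holds the fitted coefficients for output vector y[:, k].
print(coef.shape)
```

A rank smaller than the number of columns of x signals a rank-deficient problem, which takes the place of catching LinAlgError from linalg.solve in the original function.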