Re: [Numpy-discussion] Multiple Regression
On Thu, Nov 12, 2009 at 9:14 PM, Sturla Molden wrote:
> Alexey Tigarev wrote:
>> I have implemented multiple regression in the following way:
>>
> You should be using QR or SVD for this.
>
> Sturla

Seeing the terms QR and SVD, I recalled the answer to the "I am the very model for a student mathematical" poem. I am quoting just the answer part; for the full poem, see Google :) My apologies for the noise on this subject...

"When you have learnt just what is meant by 'Jacobian' and 'Abelian';
When you at sight can estimate, for the modal, mean and median;
When describing normal subgroups is much more than recitation;
When you understand precisely what is 'quantum excitation';

When you know enough statistics that you can recognise RV;
When you have learnt all advances that have been made in SVD;
And when you can spot the transform that solves some tricky PDE,
You will feel no better student has ever sat for a degree.

Your accumulated knowledge, whilst extensive and exemplary,
Will have only been brought down to the beginning of last century,
But still in matters rational, and logical and practical,
You'll be the very model of a student mathematical."

K. F. Riley, with apologies to W. S. Gilbert

--
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Multiple Regression
Alexey Tigarev wrote:
> I have implemented multiple regression in the following way:
>

You should be using QR or SVD for this.

Sturla
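For readers following along, here is a minimal sketch of the QR route suggested above. It assumes x has full column rank; the helper name lstsq_qr and the synthetic data are illustrative only and do not come from the thread.

```python
import numpy as np


def lstsq_qr(x, y):
    """Least-squares fit via a reduced QR factorisation of x.

    Assumes x has full column rank, so that r is invertible.
    """
    q, r = np.linalg.qr(x)  # x = q @ r; q is (n, k), r is (k, k) upper triangular
    return np.linalg.solve(r, q.T @ y)  # solve r @ beta = q.T @ y


# Synthetic data with the sizes mentioned in the thread (hypothetical values).
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 20))
true_beta = rng.standard_normal(20)
y = x @ true_beta + 0.01 * rng.standard_normal(5000)

beta_qr = lstsq_qr(x, y)
beta_ref = np.linalg.lstsq(x, y, rcond=None)[0]  # SVD-based reference solution
```

Because the factorisation works on x directly, x.T @ x is never formed. For brevity the triangular system is solved with the general-purpose np.linalg.solve; a dedicated triangular solver would be the more usual choice.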
Re: [Numpy-discussion] Multiple Regression
On Thu, Nov 12, 2009 at 6:44 PM, Robert Kern wrote:
> On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev wrote:
>> Hi All!
>>
>> I have implemented multiple regression in the following way:
>>
>> def multipleRegression(x, y):
>>     """ Perform linear regression using least squares method.
>>
>>     x - matrix containing inputs for observations,
>>     y - vector containing one of outputs for every observation """
>>     mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>>     xt = transpose(x)
>>     a = dot(xt, x) # A = xt * x
>>     b = dot(xt, y) # B = xt * y
>>     try:
>>         return linalg.solve(a, b)
>
> Never, ever use the normal equations. :-)
>
> Use linalg.lstsq(x, y) instead.
>
>>     except linalg.LinAlgError as lae:
>>         mulregLogger.warning("Singular matrix:\n%s" % (a))
>>         mulregLogger.warning(lae)
>>         mulregLogger.warning("Determinant: %f" % (linalg.det(a)))
>>         raise lae
>>
>> Can you suggest something to optimize it?
>>
>> I am using it on a large number of observations, so it is common to have
>> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
>> I also have to run that multiple times for different "y" vectors and
>> the same "x" matrix.
>
> Just make a matrix "y" such that each column vector is a different
> output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

Or, if you want to do it sequentially, this should work:

    xpinv = np.linalg.pinv(x)
    for y in all_ys:
        beta = np.dot(xpinv, y)

Note, however, that this also returns a result for singular problems, without any warning.

Josef
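The sequential pinv approach above could be sketched as follows, with an explicit rank check added because pinv does not complain about singular problems. The rank check, the synthetic data, and the name all_ys are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: one fixed x and several y vectors fitted one at a time.
rng = np.random.default_rng(1)
x = rng.standard_normal((5000, 20))
all_ys = [rng.standard_normal(5000) for _ in range(3)]

# pinv returns a result even when x is rank deficient, so test the rank first
# if a singular design matrix should be treated as an error.
rank = np.linalg.matrix_rank(x)
if rank < x.shape[1]:
    raise ValueError("x is rank deficient: rank %d < %d columns" % (rank, x.shape[1]))

xpinv = np.linalg.pinv(x)  # shape (20, 5000); computed once and reused
betas = [xpinv @ y for y in all_ys]  # one coefficient vector per y
```

The expensive step, the SVD inside pinv, runs once; each further y costs a single matrix-vector product.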
Re: [Numpy-discussion] Multiple Regression
On 12-Nov-09, at 6:44 PM, Robert Kern wrote:
>> I am using it on a large number of observations, so it is common to have
>> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
>> I also have to run that multiple times for different "y" vectors and
>> the same "x" matrix.
>
> Just make a matrix "y" such that each column vector is a different
> output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

Hmm, I just noticed that numpy.linalg.solve's docstring is wrong.

    Parameters
    ----------
    a : array_like, shape (M, M)
        Coefficient matrix.
    b : array_like, shape (M,)
        Ordinate or "dependent variable" values.

In fact it *does* take multiple right-hand-side vectors, just like its scipy.linalg counterpart (I found it somewhat odd that it wouldn't, considering the LAPACK functions do). So I guess that should read

    Parameters
    ----------
    a : array_like, shape (M, M)
        Coefficient matrix.
    b : array_like, shape (M,) or (M, N)
        Ordinate or "dependent variable" values.

I'll update it in the doc editor when I get home, if no one beats me to it.

David
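A small check of the behaviour described above: numpy.linalg.solve accepts a two-dimensional b and solves for every column at once. The matrices are made-up values chosen only so that a is clearly nonsingular.

```python
import numpy as np

# Strictly diagonally dominant, hence nonsingular (illustrative values).
a = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Two right-hand sides stored as the columns of b, shape (M, N) = (3, 2).
b = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 0.0]])

x_all = np.linalg.solve(a, b)  # all columns in one call, shape (3, 2)

# The same thing done column by column, for comparison.
x_cols = np.column_stack([np.linalg.solve(a, b[:, j]) for j in range(b.shape[1])])
```

Here x_all and x_cols should agree up to rounding, and a @ x_all should reproduce b.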
Re: [Numpy-discussion] Multiple Regression
On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev wrote:
> Hi All!
>
> I have implemented multiple regression in the following way:
>
> def multipleRegression(x, y):
>     """ Perform linear regression using least squares method.
>
>     x - matrix containing inputs for observations,
>     y - vector containing one of outputs for every observation """
>     mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>     xt = transpose(x)
>     a = dot(xt, x) # A = xt * x
>     b = dot(xt, y) # B = xt * y
>     try:
>         return linalg.solve(a, b)

Never, ever use the normal equations. :-)

Use linalg.lstsq(x, y) instead.

>     except linalg.LinAlgError as lae:
>         mulregLogger.warning("Singular matrix:\n%s" % (a))
>         mulregLogger.warning(lae)
>         mulregLogger.warning("Determinant: %f" % (linalg.det(a)))
>         raise lae
>
> Can you suggest something to optimize it?
>
> I am using it on a large number of observations, so it is common to have
> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
> I also have to run that multiple times for different "y" vectors and
> the same "x" matrix.

Just make a matrix "y" such that each column vector is a different output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
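A minimal sketch of the two points above, using synthetic data with the sizes from the question. The variable names and the number of output vectors are assumptions. It fits all output vectors with a single lstsq call, then compares the condition number of x with that of x.T @ x, which is one common reason given for avoiding the normal equations.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_inputs, n_outputs = 5000, 20, 4  # n_outputs is a made-up count

x = rng.standard_normal((n_obs, n_inputs))
ys = rng.standard_normal((n_obs, n_outputs))  # one output vector per column

# One call fits every column of ys; betas has shape (n_inputs, n_outputs).
betas, residuals, rank, sing_vals = np.linalg.lstsq(x, ys, rcond=None)

# Forming x.T @ x squares the 2-norm condition number of the problem.
cond_x = np.linalg.cond(x)
cond_normal = np.linalg.cond(x.T @ x)
print("cond(x) = %.3g, cond(x.T @ x) = %.3g" % (cond_x, cond_normal))
```

Column j of betas holds the coefficients for column j of ys, and the returned rank gives a direct way to detect a deficient x without computing a determinant.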
[Numpy-discussion] Multiple Regression
Hi All!

I have implemented multiple regression in the following way:

    def multipleRegression(x, y):
        """ Perform linear regression using least squares method.

        x - matrix containing inputs for observations,
        y - vector containing one of outputs for every observation """
        mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
        xt = transpose(x)
        a = dot(xt, x) # A = xt * x
        b = dot(xt, y) # B = xt * y
        try:
            return linalg.solve(a, b)
        except linalg.LinAlgError as lae:
            mulregLogger.warning("Singular matrix:\n%s" % (a))
            mulregLogger.warning(lae)
            mulregLogger.warning("Determinant: %f" % (linalg.det(a)))
            raise lae

Can you suggest something to optimize it?

I am using it on a large number of observations, so it is common to have an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more. I also have to run that multiple times for different "y" vectors and the same "x" matrix.

Thanks,
Alexey