Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread Gökhan Sever
On Thu, Nov 12, 2009 at 9:14 PM, Sturla Molden  wrote:
> Alexey Tigarev wrote:
>> I have implemented multiple regression in the following way:
>>
>>
> You should be using QR or SVD for this.
>
> Sturla
>

Seeing these QR and SVD terms, I recalled the answer to the "I am the very
model for a student mathematical" poem. I am quoting only the answer part;
for the full poem see Google :) My apologies for the noise in the subject...

"When you have learnt just what is meant by 'Jacobian' and 'Abelian';
When you at sight can estimate, for the modal, mean and median;
When describing normal subgroups is much more than recitation;
When you understand precisely what is 'quantum excitation';

When you know enough statistics that you can recognise RV;
When you have learnt all advances that have been made in SVD;
And when you can spot the transform that solves some tricky PDE,
You will feel no better student has ever sat for a degree.

Your accumulated knowledge, whilst extensive and exemplary,
Will have only been brought down to the beginning of last century,
But still in matters rational, and logical and practical,
You'll be the very model of a student mathematical."

K. F. Riley, with apologies to W.S. Gilbert




-- 
Gökhan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread Sturla Molden
Alexey Tigarev wrote:
> I have implemented multiple regression in the following way:
>
>   
You should be using QR or SVD for this.

Sturla
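
A minimal sketch of the QR route suggested above, not anyone's posted code. The helper name lstsq_qr and the synthetic data are assumptions made for illustration. QR works on x directly, so it never forms x^T x, whose condition number is the square of that of x.

```python
import numpy as np

def lstsq_qr(x, y):
    """Least-squares coefficients via the reduced QR factorisation x = Q R.

    Hypothetical helper for illustration; assumes x has full column rank.
    """
    q, r = np.linalg.qr(x)              # q: (n, p) orthonormal columns, r: (p, p) upper triangular
    # Solve R beta = Q^T y. A dedicated triangular solver would exploit the
    # structure of r; the general solver is used here to stay within NumPy.
    return np.linalg.solve(r, q.T @ y)

# Synthetic, noise-free data with the shapes mentioned in the thread.
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, 20))
beta_true = np.arange(1.0, 21.0)
y = x @ beta_true

beta = lstsq_qr(x, y)

# Forming the normal equations squares the condition number.
cond_x = np.linalg.cond(x)
cond_xtx = np.linalg.cond(x.T @ x)
```

For a well-conditioned x like this one the difference is harmless, but with nearly collinear columns the squared condition number can cost roughly half of the available floating-point digits.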




Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread josef . pktd
On Thu, Nov 12, 2009 at 6:44 PM, Robert Kern  wrote:
> On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev wrote:
>> Hi All!
>>
>> I have implemented multiple regression in the following way:
>>
>> def multipleRegression(x, y):
>>    """ Perform linear regression using least squares method.
>>
>>    X - matrix containing inputs for observations,
>>    y - vector containing one of outputs for every observation """
>>    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>>    xt = transpose(x)
>>    a = dot(xt, x)     # A = xt * x
>>    b = dot(xt, y)     # B = xt * y
>>    try:
>>        return linalg.solve(a, b)
>
> Never, ever use the normal equations. :-)
>
> Use linalg.lstsq(x, y) instead.
>
>>    except linalg.LinAlgError, lae:
>>        mulregLogger.warn("Singular matrix:\n%s" % (a))
>>        mulregLogger.warn(lae)
>>        mulregLogger.warn("Determinant: %f" % (linalg.det(a)))
>>        raise lae
>>
>> Can you suggest something to optimize it?
>>
>> I am using it on a large number of observations, so it is common to have
>> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
>> I also have to run it multiple times for different "y" vectors and
>> the same "x" matrix.
>
> Just make a matrix "y" such that each column vector is a different
> output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

Or, if you want to do it sequentially, this should work:

xpinv = np.linalg.pinv(x)

for y in all_ys:
    beta = np.dot(xpinv, y)

Note, however, that this also "works" for singular problems, without
raising any warning.
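
Since the pseudo-inverse never complains about a singular design matrix, one option is to check the rank explicitly before reusing it. The following is only a sketch under stated assumptions: the synthetic data, the name x_singular and the cutoff rcond=1e-10 are all illustrative choices, not part of the suggestion above.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 5))
# Append a column that is an exact linear combination of two others,
# so the design matrix is rank deficient by construction.
x_singular = np.column_stack([x, x[:, 0] + x[:, 1]])

all_ys = [rng.standard_normal(200) for _ in range(3)]

# Explicit rank check: pinv itself raises nothing for singular input.
rank = np.linalg.matrix_rank(x_singular)
if rank < x_singular.shape[1]:
    print("warning: design matrix is rank deficient (rank %d of %d)"
          % (rank, x_singular.shape[1]))

# Compute the pseudo-inverse once and reuse it for every y.
# rcond=1e-10 is an assumed cutoff that discards the near-zero singular value.
xpinv = np.linalg.pinv(x_singular, rcond=1e-10)
betas = [xpinv @ y for y in all_ys]
```

The coefficients are not unique in the singular case, but the fitted values x_singular @ beta still match those of the full-rank problem, because both matrices span the same column space.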

Josef

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco


Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread David Warde-Farley
On 12-Nov-09, at 6:44 PM, Robert Kern wrote:

>> I am using it on a large number of observations, so it is common to have
>> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
>> I also have to run it multiple times for different "y" vectors and
>> the same "x" matrix.
>
> Just make a matrix "y" such that each column vector is a different
> output vector (e.g. y.shape == (5000, number_of_different_y_vectors))

Hmm, I just noticed that numpy.linalg.solve's docstring is wrong:

    Parameters
    ----------
    a : array_like, shape (M, M)
        Coefficient matrix.
    b : array_like, shape (M,)
        Ordinate or "dependent variable" values.

In fact it *does* take multiple right-hand-side vectors, just like
its scipy.linalg counterpart (I would have found it somewhat odd if it
didn't, considering that the LAPACK functions do).

So I guess that should read:

    Parameters
    ----------
    a : array_like, shape (M, M)
        Coefficient matrix.
    b : array_like, shape (M,) or (M, N)
        Ordinate or "dependent variable" values.

I'll update it in the doc editor when I get home, if no one beats me  
to it.
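
A quick way to check the behaviour described above, as a sketch with made-up data: pass a two-dimensional b to numpy.linalg.solve and compare with solving one column at a time. The sizes M = 4 and N = 3 are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
# M = 4 coefficient matrix; the 4*I shift makes a singular draw unlikely.
a = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
# N = 3 right-hand sides stored as the columns of b.
b = rng.standard_normal((4, 3))

# All right-hand sides in one call: the result has shape (M, N).
x_all = np.linalg.solve(a, b)

# The same systems solved one column at a time, for comparison.
x_cols = np.column_stack([np.linalg.solve(a, b[:, j]) for j in range(3)])
```

Both approaches give the same solutions; the single call factorises a only once.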

David


Re: [Numpy-discussion] Multiple Regression

2009-11-12 Thread Robert Kern
On Thu, Nov 12, 2009 at 17:38, Alexey Tigarev  wrote:
> Hi All!
>
> I have implemented multiple regression in the following way:
>
> def multipleRegression(x, y):
>    """ Perform linear regression using least squares method.
>
>    X - matrix containing inputs for observations,
>    y - vector containing one of outputs for every observation """
>    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
>    xt = transpose(x)
>    a = dot(xt, x)     # A = xt * x
>    b = dot(xt, y)     # B = xt * y
>    try:
>        return linalg.solve(a, b)

Never, ever use the normal equations. :-)

Use linalg.lstsq(x, y) instead.

>    except linalg.LinAlgError, lae:
>        mulregLogger.warn("Singular matrix:\n%s" % (a))
>        mulregLogger.warn(lae)
>        mulregLogger.warn("Determinant: %f" % (linalg.det(a)))
>        raise lae
>
> Can you suggest something to optimize it?
>
> I am using it on a large number of observations, so it is common to have
> an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
> I also have to run it multiple times for different "y" vectors and
> the same "x" matrix.

Just make a matrix "y" such that each column vector is a different
output vector (e.g. y.shape == (5000, number_of_different_y_vectors))
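
To illustrate both suggestions together, here is a rough sketch with synthetic data. The number of output vectors (7), the noise level and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((5000, 20))          # same "x" for every output
true_betas = rng.standard_normal((20, 7))    # one coefficient column per output
# Each column of y is a different output vector, so y.shape == (5000, 7).
y = x @ true_betas + 0.01 * rng.standard_normal((5000, 7))

# One call fits all seven regressions; rcond=None selects NumPy's default cutoff.
betas, residuals, rank, sv = np.linalg.lstsq(x, y, rcond=None)

# betas has one column of coefficients per output vector: shape (20, 7).
# rank reports the effective rank of x, which makes collinearity visible.
```

Unlike the normal-equations version, lstsq does not raise on a rank-deficient x; the returned rank is the place to look for that.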

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


[Numpy-discussion] Multiple Regression

2009-11-12 Thread Alexey Tigarev
Hi All!

I have implemented multiple regression in the following way:

def multipleRegression(x, y):
    """ Perform linear regression using least squares method.

    X - matrix containing inputs for observations,
    y - vector containing one of outputs for every observation """
    mulregLogger.debug("multipleRegression(x=%s, y=%s)" % (x, y))
    xt = transpose(x)
    a = dot(xt, x)     # A = xt * x
    b = dot(xt, y)     # B = xt * y
    try:
        return linalg.solve(a, b)
    except linalg.LinAlgError, lae:
        mulregLogger.warn("Singular matrix:\n%s" % (a))
        mulregLogger.warn(lae)
        mulregLogger.warn("Determinant: %f" % (linalg.det(a)))
        raise lae

Can you suggest something to optimize it?

I am using it on a large number of observations, so it is common to have
an "x" matrix of about 5000x20 and a "y" vector of length 5000, and more.
I also have to run it multiple times for different "y" vectors and
the same "x" matrix.

Thanks,
Alexey