On Tue, Jul 20, 2010 at 10:12 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Tue, Jul 20, 2010 at 6:35 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>> On Mon, Jul 19, 2010 at 10:08 PM, Charles R Harris
>> <charlesr.har...@gmail.com> wrote:
>>>
>>> On Mon, Jul 19, 2010 at 9:40 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>>
>>>> On Mon, Jul 19, 2010 at 8:27 PM, Charles R Harris
>>>> <charlesr.har...@gmail.com> wrote:
>>>> >
>>>> > On Mon, Jul 19, 2010 at 9:02 PM, Keith Goodman <kwgood...@gmail.com>
>>>> > wrote:
>>>> >>
>>>> >> On Mon, Jul 19, 2010 at 6:53 PM, Joshua Holbrook
>>>> >> <josh.holbr...@gmail.com> wrote:
>>>> >> > On Mon, Jul 19, 2010 at 5:50 PM, Charles R Harris
>>>> >> > <charlesr.har...@gmail.com> wrote:
>>>> >> >> Hi All,
>>>> >> >>
>>>> >> >> I'm thinking about adding some functionality to lstsq because I
>>>> >> >> find myself doing the same fixes over and over. List follows.
>>>> >> >>
>>>> >> >> 1. Add weights so data points can be weighted.
>>>> >> >> 2. Use column scaling so condition numbers make more sense.
>>>> >> >> 3. Compute a covariance approximation?
>>>> >> >>
>>>> >> >> Unfortunately, the last will require using svd since there are no
>>>> >> >> linear least squares routines in LAPACK that also compute the
>>>> >> >> covariance, at least none that google knows about.
>>>> >> >>
>>>> >> >> Thoughts?
>>>> >> >
>>>> >> > Maybe make 2 functions--one which implements 1 and 2, and another
>>>> >> > which implements 3? I think weights is an excellent idea!
>>>> >>
>>>> >> I'd like an lstsq that does less, like not calculating the sum of
>>>> >> squared residuals. That's useful in tight loops. So I also think
>>>> >> having two lstsqs makes sense: one as basic as possible, one with
>>>> >> bells. How does scipy's lstsq fit into all this?
>>>> >
>>>> > I think the computation of the residues is cheap in lstsq. The
>>>> > algorithm used starts by reducing the design matrix to bidiagonal
>>>> > form and reduces the rhs at the same time. In other words, an mxn
>>>> > problem becomes an (n+1)xn problem. That's why the summed square of
>>>> > residuals is available but not the individual residuals: after the
>>>> > reduction there is only one residual, and its square is the residue.
>>>>
>>>> That does sound good. But it must take some time. There's indexing,
>>>> array creation, an if statement, summing:
>>>>
>>>>     if results['rank'] == n and m > n:
>>>>         resids = sum((transpose(bstar)[n:,:])**2,
>>>>                      axis=0).astype(result_t)
>>>>
>>>> Here are the timings after removing the sum of squared residuals:
>>>>
>>>> >> x = np.random.rand(1000,10)
>>>> >> y = np.random.rand(1000)
>>>> >> timeit np.linalg.lstsq(x, y)
>>>> 1000 loops, best of 3: 369 us per loop
>>>> >> timeit np.linalg.lstsq2(x, y)
>>>> 1000 loops, best of 3: 344 us per loop
>>>> >>
>>>> >> x = np.random.rand(10,2)
>>>> >> y = np.random.rand(10)
>>>> >> timeit np.linalg.lstsq(x, y)
>>>> 10000 loops, best of 3: 102 us per loop
>>>> >> timeit np.linalg.lstsq2(x, y)
>>>> 10000 loops, best of 3: 77 us per loop
>>>
>>> Now that I've looked through the driver program I see that you are
>>> right. Also, all the info needed for the covariance is almost
>>> available in the LAPACK driver program. Hmm, it seems that maybe the
>>> best thing to do here is to pull out the lapack_lite driver program,
>>> modify it, and make it a standalone python module that links to
>>> either the lapack_lite programs or the ATLAS versions. But that is
>>> more work than just doing things in python :-\  It does have the
>>> added advantage that all the work arrays can be handled internally.
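For concreteness, here is a rough sketch of what the weights and
column-scaling ideas from the top of the thread could look like as thin
wrappers around the existing np.linalg.lstsq (assuming a 1-D right-hand
side for brevity). The function names are made up for illustration, not
proposed API: weighting reduces to ordinary least squares by scaling
each row by sqrt(w), and column scaling normalizes the columns so the
condition number refers to the scaled problem.

    import numpy as np

    def wlstsq(a, b, w):
        """Hypothetical weighted lstsq: minimize sum(w * (dot(a, x) - b)**2).

        Scaling row i of a and b by sqrt(w[i]) turns the weighted
        problem into an ordinary least squares problem.
        """
        sw = np.sqrt(np.asarray(w, dtype=float))
        return np.linalg.lstsq(a * sw[:, np.newaxis], b * sw)

    def cslstsq(a, b):
        """Hypothetical column-scaled lstsq.

        Dividing each column of a by its 2-norm solves for y = dot(D, x)
        with D = diag(norms), so the solution is rescaled on the way out.
        """
        norms = np.sqrt((a * a).sum(axis=0))
        norms[norms == 0] = 1.0  # leave all-zero columns untouched
        x, resids, rank, s = np.linalg.lstsq(a / norms, b)
        return x / norms, resids, rank, s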
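And as a quick sanity check on what the block Keith removed actually
computes: for an overdetermined, full-rank problem, the second value
returned by lstsq is the summed square of residuals, which can be
reproduced explicitly from the fitted coefficients:

    import numpy as np

    x = np.random.rand(1000, 10)
    y = np.random.rand(1000)
    coef, resids, rank, s = np.linalg.lstsq(x, y)

    # resids holds sum((y - dot(x, coef))**2) when m > n and rank == n.
    explicit = ((y - np.dot(x, coef)) ** 2).sum()
    print(np.allclose(resids[0], explicit))  # True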
>>
>> While taking a quick look at lstsq to see what I could cut, this line
>> caught my eye:
>>
>>     bstar[:b.shape[0],:n_rhs] = b.copy()
>
> Even if bstar contained a view of b, the next line in lstsq would take
> care of it:
>
>     a, bstar = _fastCopyAndTranspose(t, a, bstar)
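It is easy to convince yourself that the explicit copy is redundant
here: slice assignment always copies the right-hand side's data into
the target's own memory and never stores a view. A minimal check:

    import numpy as np

    b = np.arange(6.0).reshape(3, 2)
    bstar = np.zeros((4, 2))

    # Slice assignment copies b's data into bstar; no view of b is kept.
    bstar[:b.shape[0], :] = b   # same effect with or without b.copy()
    b[0, 0] = 99.0              # mutate b afterwards ...
    print(bstar[0, 0])          # ... bstar is unaffected: prints 0.0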
If a view is never taken, then what does the copy method actually do? A
redundant copy? The b matrix passed to *gelsd is in/out, so if the
original b is not going to be written to, then taking out the redundant
copy seems like a good idea.

Out of curiosity, is there an explicit way to check if these share
memory?

    import numpy as np
    b = np.empty((1000, 2))
    c = np.arange(1000)[:, None]
    b[:, 1:] = c

Then what?

Skipper

>> I thought that array.__setitem__ never gives a view of the right-hand
>> side. If that's so, then the copy is not needed. That would save some
>> time since b can be large. All unit tests pass when I remove the copy,
>> but you know how that goes...
>>
>> I also noticed that "import math" is done inside the lstsq function.
>> Why is that?
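On the explicit check Skipper asks for: numpy provides
np.may_share_memory, which conservatively compares memory bounds (it
can report True for arrays that merely could overlap, but never False
for arrays that actually share data). Continuing his example:

    import numpy as np

    b = np.empty((1000, 2))
    c = np.arange(1000)[:, None]
    b[:, 1:] = c  # copies c's data in; no view of c is taken

    print(np.may_share_memory(b, c))  # False: definitely separate memory

    # A real view, by contrast, shares memory and records its base array:
    d = b[:, 1:]
    print(np.may_share_memory(b, d))  # True
    print(d.base is b)                # True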