Re: [Numpy-discussion] matrix default to column vector?
On Fri, Jun 5, 2009 at 11:30 AM, Alan G Isaac wrote:
> On 6/5/2009 11:38 AM Olivier Verdier apparently wrote:
>> I think matrices can be pretty tricky when used for teaching. For
>> instance, you have to explain that all the operators work
>> component-wise, except the multiplication! Another caveat is that
>> since matrices are always 2-D, the "scalar product" of two column
>> vectors computed as "x.T * y" will not be a scalar, but a 1x1 matrix.
>> There is also the fact that you must cast all your vectors to
>> column/row matrices (as in matlab). For all these reasons, I prefer
>> to use arrays and dot for teaching, and I have never had any
>> complaints.
>
> I do not understand this "argument". You should take it very seriously
> when someone reports to you that the matrix object is crucial to them,
> e.g., as a teaching tool. Even if you do not find personally persuasive
> an example like
> http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html
> I have told you: this is important for my students. Reporting that your
> students do not complain about using arrays instead of matrices does
> not change this one bit.
>
> Student backgrounds differ by domain of application. In economics,
> matrices are in *very* wide use, and multidimensional arrays get almost
> no use. Textbooks in econometrics (a huge and important field, even
> outside of economics) are full of proofs using matrix algebra. A close
> match to what the students see is crucial. When working with
> multiplication or exponentiation, matrices do what they expect, and 2d
> arrays do not.
>
> One more point. As Python users we get used to installing a package
> here and a package there to add functionality. But this is not how most
> people looking for a matrix language see the world. Removing the matrix
> object from NumPy will raise the barrier to adoption by social
> scientists, and there should be a strongly persuasive reason before
> taking such a step.
>
> Separately from all that, does anyone doubt that there is code that
> depends on the matrix object? The core objection to a past proposal for
> useful change was that it could break extant code. I would hope that
> nobody who took that position would subsequently propose removing the
> matrix object altogether.
>
> Cheers,
> Alan Isaac
>
> PS If x and y are "column vectors" (i.e., matrices), then x.T * y
> *should* be a 1×1 matrix. Since the * operator is doing matrix
> multiplication, this is the correct result, not an anomaly.

Well, one could argue that. The x.T is a member of the dual, hence maps vectors to the reals. Usually the reals aren't represented by 1x1 matrices. Just my .02 cents.

Chuck
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
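A minimal illustration of the behaviour being debated above (made-up values, plain 2009-era NumPy):

    import numpy as np

    x = np.matrix([[1.], [2.], [3.]])   # a 3x1 "column vector" matrix
    y = np.matrix([[4.], [5.], [6.]])
    print x.T * y                        # [[ 32.]]  : a 1x1 matrix, not a scalar
    print (x.T * y)[0, 0]                # 32.0      : extracting the scalar needs an extra index

    a = np.array([1., 2., 3.])           # plain 1-D arrays
    b = np.array([4., 5., 6.])
    print np.dot(a, b)                   # 32.0      : dot of two 1-D arrays is already a scalar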
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 5:59 PM, Brent Pedersen wrote: > On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: >>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 12:53 PM, wrote: >> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >>> Hello, >>> I have a vectorizing problem that I don't see an obvious way to solve. >>> What >>> I have is a vector like: >>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> and a matrix >>> T=zeros((6,6)) >>> and what I want in T is a count of all of the transitions in obs, e.g. >>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1,o2]+=1 >>> >>> which gives the correct answer from above, which is: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> >>> but I thought there would be a better way. I tried: >>> o1=obs[:-1] >>> o2=obs[1:] >>> T[o1,o2]+=1 >>> but this doesn't give a count, it just yields 1's at the transition >>> points, >>> like: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 1., 0., 0., 1.], >>> [ 0., 1., 0., 1., 0., 0.], >>> [ 0., 0., 1., 0., 1., 0.], >>> [ 0., 0., 0., 1., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> Is there a clever way to do this? I could write a quick Cython >>> solution, >>> but I wanted to keep this as an all-numpy implementation if I can. >>> >> >> histogram2d or its imitation, there was a discussion on histogram2d a >> short time ago >> > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > obs2 = obs - 1 > trans = > np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) > re = np.array([[ 0., 0., 0., 0., 0., 0.], >> ... [ 0., 0., 3., 0., 0., 1.], >> ... [ 0., 3., 0., 1., 0., 0.], >> ... [ 0., 0., 2., 0., 1., 0.], >> ... [ 0., 0., 0., 2., 0., 0.], >> ... [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == trans) >> True >> > trans >> array([[0, 0, 0, 0, 0, 0], >> [0, 0, 3, 0, 0, 1], >> [0, 3, 0, 1, 0, 0], >> [0, 0, 2, 0, 1, 0], >> [0, 0, 0, 2, 0, 0], >> [0, 0, 0, 0, 1, 0]]) >> >> >> or >> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, > range=[[0,5],[0,5]]) > re >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) > h >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> > np.all(re == h) >> True > > There's no way my list method can beat that. But by adding > > import psyco > psyco.full() > > I get a total speed up of a factor of 15 when obs is length 1. 
Actually, it is faster: histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) >> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) 100 loops, best of 3: 4.14 ms per loop lists: >> timeit test(obs3, T3) 1000 loops, best of 3: 1.32 ms per loop ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> here's a go: >>> >>> import numpy as np >>> import random >>> from itertools import groupby >>> >>> def test1(obs, T): >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1][o2] += 1 >>> return T >>> >>> >>> def test2(obs, T): >>> s = zip(obs[:-1], obs[1:]) >>> for idx, g in groupby(sorted(s)): >>> T[idx] = len(list(g)) >>> return T >>> >>> obs = [random.randint(0, 5) for z in range(1)] >>> >>> print test2(obs, np.zeros((6, 6))) >>> print test1(obs, np.zeros((6, 6))) >>> >>> >>> ## >>> >>> In [10]: timeit test1(obs, np.zeros((6, 6))) >>> 100 loops, best of 3: 18.8 ms per lo
Re: [Numpy-discussion] matrix multiplication
On Fri, Jun 5, 2009 at 5:19 PM, Christopher Barker wrote:
> Robert Kern wrote:
> x = np.array([1,2,3])
> timeit x.sum()
>>> 10 loops, best of 3: 3.01 µs per loop
> from numpy import sum
> timeit sum(x)
>>> 10 loops, best of 3: 4.84 µs per loop
>
> that is a VERY short array, so one extra function call overhead could
> make the difference. Is it really your use case to have such tiny sums
> inside a big loop, and is there no way to vectorize that?

I was trying to make the timeit difference large. It is the overhead
that I was interested in. But it is still noticeable when x is a
"typical" size:

>> x = np.arange(1000)
>> timeit x.sum()
10 loops, best of 3: 5.46 µs per loop
>> from numpy import sum
>> timeit sum(x)
10 loops, best of 3: 7.31 µs per loop
>> x = np.random.rand(1000)
>> timeit x.sum()
10 loops, best of 3: 6.81 µs per loop
>> timeit sum(x)
10 loops, best of 3: 8.36 µs per loop

That reminds me of a big difference between arrays and matrices.
Matrices have the overhead of going through python code (the matrix
class) to get to the core C array code. I converted an iterative
optimization function from matrices to arrays and got a speed up of a
factor of 3.5. Array size was around (500, 10) and (500,)
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
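A sketch of how one might check the matrix-overhead claim on one's own machine, with sizes roughly matching the (500, 10) case mentioned above; no timings are quoted here since they vary by machine and NumPy version:

    import numpy as np
    import timeit

    A = np.random.rand(500, 10)
    v = np.random.rand(10, 1)
    M, w = np.asmatrix(A), np.asmatrix(v)

    # plain-array path: np.dot dispatches straight to the C routine
    t_array = timeit.timeit(lambda: np.dot(A, v), number=10000)

    # matrix path: '*' goes through np.matrix.__mul__, a Python-level
    # wrapper that does extra bookkeeping before calling the same dot
    t_matrix = timeit.timeit(lambda: M * w, number=10000)

    print t_array, t_matrix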
Re: [Numpy-discussion] matrix multiplication
Robert Kern wrote: x = np.array([1,2,3]) timeit x.sum() >> 10 loops, best of 3: 3.01 µs per loop from numpy import sum timeit sum(x) >> 10 loops, best of 3: 4.84 µs per loop that is a VERY short array, so one extra function call overhead could make the difference. Is it really your use case to have such tiny sums inside a big loop, and is there no way to vectorize that? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R(206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix multiplication
On Fri, Jun 5, 2009 at 17:54, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 3:02 PM, Alan G Isaac wrote: >> I think something close to this would be possible: >> add dot as an array method. >> A .dot(B) .dot(C) >> is not as pretty as >> A * B * C >> but it is much better than >> np.dot(np.dot(A,B),C) > > I've noticed that x.sum() is faster than sum(x) > >>> x = np.array([1,2,3]) >>> timeit x.sum() > 10 loops, best of 3: 3.01 µs per loop >>> from numpy import sum >>> timeit sum(x) > 10 loops, best of 3: 4.84 µs per loop > > Would the same be true of dot? That is, would x.dot(y) be faster than > dot(x,y)? Or is it just that np.sum() has to go through some extra > python code before it hits the C code? No and yes, respectively. > In general, if I'm trying to speed up an inner loop, I try to replace > func(x) with x.func(). But I don't really understand the general > principle at work here. Most of the functions that mirror methods, including numpy.sum(), are Python functions that have a little of bit of code to convert the input to an array if necessary and to dispatch to the method. numpy.dot() is already implemented in C. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
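A simplified sketch of the dispatch pattern Robert describes; this is only the rough shape of the Python-level wrapper, not the actual NumPy source:

    import numpy as np

    def sum_wrapper(a, axis=None):
        # convert to an array if necessary, then defer to the .sum() method
        try:
            method = a.sum
        except AttributeError:
            return np.asarray(a).sum(axis)
        return method(axis)

    x = np.array([1, 2, 3])
    print sum_wrapper(x)   # 6, after one extra Python-level call
    print x.sum()          # 6, calling the C implementation directly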
Re: [Numpy-discussion] matrix multiplication
On Fri, Jun 5, 2009 at 3:02 PM, Alan G Isaac wrote: > I think something close to this would be possible: > add dot as an array method. > A .dot(B) .dot(C) > is not as pretty as > A * B * C > but it is much better than > np.dot(np.dot(A,B),C) I've noticed that x.sum() is faster than sum(x) >> x = np.array([1,2,3]) >> timeit x.sum() 10 loops, best of 3: 3.01 µs per loop >> from numpy import sum >> timeit sum(x) 10 loops, best of 3: 4.84 µs per loop Would the same be true of dot? That is, would x.dot(y) be faster than dot(x,y)? Or is it just that np.sum() has to go through some extra python code before it hits the C code? In general, if I'm trying to speed up an inner loop, I try to replace func(x) with x.func(). But I don't really understand the general principle at work here. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix multiplication
On Fri, Jun 5, 2009 at 4:02 PM, Alan G Isaac wrote: > On 6/5/2009 5:41 PM Chris Colbert apparently wrote: > > well, it sounded like a good idea. > > > I think something close to this would be possible: > add dot as an array method. >A .dot(B) .dot(C) > is not as pretty as >A * B * C > but it is much better than >np.dot(np.dot(A,B),C) > I prefer using the evaluation operator, so that becomes A(B)(C) or, changing the multiplication order a bit, A(B(C)) Chuck ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
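A toy sketch of what Chuck's call-operator idea could look like; nothing like this exists in NumPy, it is only meant to make the A(B)(C) notation concrete:

    import numpy as np

    class Callable(np.ndarray):
        # minimal subclass where A(B) means matrix multiplication
        def __call__(self, other):
            return np.dot(self, other).view(Callable)

    A = np.random.rand(3, 3).view(Callable)
    B = np.random.rand(3, 3)
    C = np.random.rand(3, 3)

    D = A(B)(C)                       # reads like the A(B)(C) above
    assert np.allclose(D, np.dot(np.dot(A, B), C))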
Re: [Numpy-discussion] IndexExpression bug?
On Fri, Jun 5, 2009 at 16:14, Michael McNeil Forbes wrote:
> >>> np.array([0,1,2,3])[1:-1]
> array([1, 2])
>
> but
>
> >>> np.array([0,1,2,3])[np.s_[1:-1]]
> array([1, 2, 3])
> >>> np.array([0,1,2,3])[np.index_exp[1:-1]]
> array([1, 2, 3])
>
> Possible fix:
> class IndexExpression(object):
> ...
>     def __len__(self):
>         return 0
>
> (Presently this returns sys.maxint)
>
> Does this break anything (I can't find any coverage tests)? If not, I
> will submit a ticket.

I think that getting rid of __getslice__ and __len__ should work
better. I don't really understand what the logic was behind including
them in the first place, though. I might be missing something.

In [21]: %cpaste
Pasting code; enter '--' alone on the line to stop.
:class IndexExpression(object):
:    """
:    A nicer way to build up index tuples for arrays.
:
:    For any index combination, including slicing and axis insertion,
:    'a[indices]' is the same as 'a[index_exp[indices]]' for any
:    array 'a'. However, 'index_exp[indices]' can be used anywhere
:    in Python code and returns a tuple of slice objects that can be
:    used in the construction of complex index expressions.
:    """
:
:    def __init__(self, maketuple):
:        self.maketuple = maketuple
:
:    def __getitem__(self, item):
:        if self.maketuple and type(item) is not tuple:
:            return (item,)
:        else:
:            return item
:--

In [22]: s2 = IndexExpression(False)

In [23]: s2[1:-1]
Out[23]: slice(1, -1, None)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
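The mechanism behind the reported behaviour is Python 2's old slicing protocol: when an object defines __getslice__ and __len__, a negative bound in obj[i:j] has len(obj) added to it before __getslice__ is called, so with __len__ returning sys.maxint the -1 silently becomes a huge positive stop. A toy class (not the real IndexExpression) that reproduces the effect under Python 2:

    import sys

    class OldStyleSlicer(object):
        # mimic the old IndexExpression: a huge __len__ plus __getslice__
        def __len__(self):
            return sys.maxint
        def __getslice__(self, start, stop):
            # Python 2 has already added len(self) to the negative bound here
            return slice(start, stop)
        def __getitem__(self, item):
            return item

    print OldStyleSlicer()[1:-1]   # slice(1, 9223372036854775806, None) on a 64-bit build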
Re: [Numpy-discussion] matrix multiplication
> I think something close to this would be possible: > add dot as an array method. > A .dot(B) .dot(C) > is not as pretty as > A * B * C > but it is much better than > np.dot(np.dot(A,B),C) That is much better. Matthew ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix multiplication
On 6/5/2009 5:41 PM Chris Colbert apparently wrote: > well, it sounded like a good idea. I think something close to this would be possible: add dot as an array method. A .dot(B) .dot(C) is not as pretty as A * B * C but it is much better than np.dot(np.dot(A,B),C) In fact it is so much better, that it might (?) be worth considering separately from the entire matrix discussion. And it does not provide a matrix exponential, of course. Alan Isaac ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
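For the record, chaining in exactly this style works once arrays have a dot method (later NumPy releases did add ndarray.dot); a minimal sketch of the equivalence:

    import numpy as np

    A = np.random.rand(4, 4)
    B = np.random.rand(4, 4)
    C = np.random.rand(4, 4)

    # assumes ndarray has a .dot method (true in later NumPy versions)
    D1 = A.dot(B).dot(C)
    D2 = np.dot(np.dot(A, B), C)   # the nested form Alan wants to avoid
    assert np.allclose(D1, D2)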
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 2:01 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: >> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: On Fri, Jun 5, 2009 at 12:53 PM, wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. >> What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition >> points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) obs2 = obs - 1 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) np.all(re == trans) > True > trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == h) > True There's no way my list method can beat that. But by adding import psyco psyco.full() I get a total speed up of a factor of 15 when obs is length 1. 
>>> >>> Actually, it is faster: >>> >>> histogram: >>> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) > timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, > range=[[0,5],[0,5]]) >>> 100 loops, best of 3: 4.14 ms per loop >>> >>> lists: >>> > timeit test(obs3, T3) >>> 1000 loops, best of 3: 1.32 ms per loop >>> ___ >>> Numpy-discussion mailing list >>> Numpy-discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> here's a go: >> >> import numpy as np >> import random >> from itertools import groupby >> >> def test1(obs, T): >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1][o2] += 1 >> return T >> >> >> def test2(obs, T): >> s = zip(obs[:-1], obs[1:]) >> for idx, g in groupby(sorted(s)): >> T[idx] = len(list(g)) >> return T >> >> obs = [random.randint(0, 5) for z in range(1)] >> >> print test2(obs, np.zeros((6, 6))) >> print test1(obs, np.zeros((6, 6))) >> >> >> ## >> >> In [10]: timeit test1(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 18.8 ms per loop >> >> In [11]: timeit test2(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 6.91 ms per loop > > Wait, you tested the list method with an array. Try > > timeit test1(obs, np.zeros((6, 6)).tolist()) > > Probably best to
Re: [Numpy-discussion] Maturing the Matrix class in NumPy
well, it sounded like a good idea. Oh, well. On Fri, Jun 5, 2009 at 5:28 PM, Robert Kern wrote: > On Fri, Jun 5, 2009 at 16:24, Chris Colbert wrote: > > How about just introducing a slightly different syntax which tells numpy > to > > handle the array like a matrix: > > > > Some thing along the lines of this: > > > > A = array[[..]] > > B = array[[..]] > > > > elementwise multipication (as it currently is): > > > > C = A * B > > > > matrix multiplication: > > > > C = {A} * {B} > > > > or > > > > C = [A] * [B] > > > > or any other brace we decide on > > > > Essentially, the brace tells numpy to handle the array objects like > > matrices, but with no need for a specific matrix type. > > We don't control the Python language. We cannot make these kinds of > changes to the syntax. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
I'll caution anyone from using Atlas from the repos in Ubuntu 9.04 as the package is broken: https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/363510 just build Atlas yourself, you get better performance AND threading. Building it is not the nightmare it sounds like. I think i've done it a total of four times now, both 32-bit and 64-bit builds. If you need help with it, just email me off list. Cheers, Chris On Fri, Jun 5, 2009 at 2:46 PM, Matthieu Brucher wrote: > 2009/6/5 David Cournapeau : > > Eric Firing wrote: > >> > >> David, > >> > >> The eigen web site indicates that eigen achieves high performance > >> without all the compilation difficulty of atlas. Does eigen have enough > >> functionality to replace atlas in numpy? > > > > No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't > > know if that's even a goal of eigen (it started as a project for KDE, to > > support high performance core computations for things like spreadsheet > > and co). > > > > But even then, it would be a huge undertaking. For all its flaws, LAPACK > > is old, tested code, with a very stable language (F77). Eigen is: > >- not mature. > >- heavily expression-template-based C++, meaning compilation takes > > ages + esoteric, impossible to decypher compilation errors. We have > > enough build problems already :) > >- SSE dependency harcoded, since it is setup at build time. That's > > going backward IMHO - I would rather see a numpy/scipy which can load > > the optimized code at runtime. > > I would add that it relies on C++ compiler extensions (the restrict > keyword) as does blitz. You unfortunately can't expect every compiler > to support it unless the C++ committee finally adds it to the > standard. > > Matthieu > -- > Information System Engineer, Ph.D. > Website: http://matthieu-brucher.developpez.com/ > Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 > LinkedIn: http://www.linkedin.com/in/matthieubrucher > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] IndexExpression bug?
>>> np.array([0,1,2,3])[1:-1] array([1, 2]) but >>> np.array([0,1,2,3])[np.s_[1:-1]] array([1, 2, 3]) >>> np.array([0,1,2,3])[np.index_exp[1:-1]] array([1, 2, 3]) Possible fix: class IndexExpression(object): ... def __len__(self): return 0 (Presently this returns sys.maxint) Does this break anything (I can't find any coverage tests)? If not, I will submit a ticket. Michael. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Maturing the Matrix class in NumPy
On Fri, Jun 5, 2009 at 16:24, Chris Colbert wrote: > How about just introducing a slightly different syntax which tells numpy to > handle the array like a matrix: > > Some thing along the lines of this: > > A = array[[..]] > B = array[[..]] > > elementwise multipication (as it currently is): > > C = A * B > > matrix multiplication: > > C = {A} * {B} > > or > > C = [A] * [B] > > or any other brace we decide on > > Essentially, the brace tells numpy to handle the array objects like > matrices, but with no need for a specific matrix type. We don't control the Python language. We cannot make these kinds of changes to the syntax. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Maturing the Matrix class in NumPy
How about just introducing a slightly different syntax which tells numpy to handle the array like a matrix: Some thing along the lines of this: A = array[[..]] B = array[[..]] elementwise multipication (as it currently is): C = A * B matrix multiplication: C = {A} * {B} or C = [A] * [B] or any other brace we decide on Essentially, the brace tells numpy to handle the array objects like matrices, but with no need for a specific matrix type. Since textbook math typical has some kind of bracket around matrix variable, I think this would jive well just my .02 Chris On Fri, Jun 5, 2009 at 5:14 PM, Alan G Isaac wrote: > On 6/5/2009 3:49 PM Stéfan van der Walt apparently wrote: > > If the Matrix class is to remain, we need to take the steps > > necessary to integrate it into NumPy properly. > > I think this requires a list of current problems. > Many of the problems for NumPy have been addressed over time. > I believe the remaining problems center more on SciPy rather than NumPy. > This requires that users report difficulties. > > For example, Jason Rennie says he ran into problems with > scipy.optimize.fmin_cg, although I do not recall him reporting > these (I do recall an optimization problem he reported using > ndarrays). Has he filed a bug report detailing his problem? > > > > To get going we'll need a list of changes required (i.e. "in an ideal > > world, how would matrices work?"). > > The key anomaly concerning matrices comes with indexing. > See the introduction here: http://www.scipy.org/MatrixIndexing > > However changing this for the current matrix object was rejected > in the last (exhausting) go round. > > > > There should be a set protocol for > > all numpy functions that guarantees compatibility with ndarrays, > > matrices and other derived classes. > > My impression was that this was resolved as follows: > handle all ndarray based objects as arrays (using asarray) > in any NumPy function, but return the subclass when possible. > (E.g., using asmatrix, return a matrix output for a matrix input.) > This seems fine to me. > > > > Being one of the most vocal proponents of the Matrix class, would you > > be prepared to develop your Matrix Proposal at > > http://scipy.org/NewMatrixSpec further? > > I consider my proposal to have the following status: rejected. > I consider the core reason to be: creates a backwards incompatibility. > That was a very long and exhausting discussion that was productive > in laying out the issues, but I do not think we can progress in that > direction. > > The existing matrix object is very usable. > It's primary problem is some indexing anomalies, > http://www.scipy.org/MatrixIndexing > and not everyone saw those as problems. > In terms of NumPy functions, I think the asarray/asmatrix > protocol fits the bill. (Altho perhaps I am overlooking > something as a user that is obvious to a developer.) > > Cheers, > Alan > > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Maturing the Matrix class in NumPy
On 6/5/2009 3:49 PM Stéfan van der Walt apparently wrote: > If the Matrix class is to remain, we need to take the steps > necessary to integrate it into NumPy properly. I think this requires a list of current problems. Many of the problems for NumPy have been addressed over time. I believe the remaining problems center more on SciPy rather than NumPy. This requires that users report difficulties. For example, Jason Rennie says he ran into problems with scipy.optimize.fmin_cg, although I do not recall him reporting these (I do recall an optimization problem he reported using ndarrays). Has he filed a bug report detailing his problem? > To get going we'll need a list of changes required (i.e. "in an ideal > world, how would matrices work?"). The key anomaly concerning matrices comes with indexing. See the introduction here: http://www.scipy.org/MatrixIndexing However changing this for the current matrix object was rejected in the last (exhausting) go round. > There should be a set protocol for > all numpy functions that guarantees compatibility with ndarrays, > matrices and other derived classes. My impression was that this was resolved as follows: handle all ndarray based objects as arrays (using asarray) in any NumPy function, but return the subclass when possible. (E.g., using asmatrix, return a matrix output for a matrix input.) This seems fine to me. > Being one of the most vocal proponents of the Matrix class, would you > be prepared to develop your Matrix Proposal at > http://scipy.org/NewMatrixSpec further? I consider my proposal to have the following status: rejected. I consider the core reason to be: creates a backwards incompatibility. That was a very long and exhausting discussion that was productive in laying out the issues, but I do not think we can progress in that direction. The existing matrix object is very usable. It's primary problem is some indexing anomalies, http://www.scipy.org/MatrixIndexing and not everyone saw those as problems. In terms of NumPy functions, I think the asarray/asmatrix protocol fits the bill. (Altho perhaps I am overlooking something as a user that is obvious to a developer.) Cheers, Alan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
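A sketch of the asarray/asmatrix round trip Alan describes, with a made-up function name; real NumPy functions more often achieve the same effect via asanyarray or __array_wrap__:

    import numpy as np

    def scale_rows(x, factors):
        # work on a plain ndarray internally ...
        a = np.asarray(x)
        out = a * np.asarray(factors, dtype=float).reshape(-1, 1)
        # ... but return the caller's subclass when we can
        if isinstance(x, np.matrix):
            return np.asmatrix(out)
        return out

    m = np.matrix([[1., 2.], [3., 4.]])
    print type(scale_rows(m, [10, 100]))               # matrix in, matrix out
    print type(scale_rows(np.asarray(m), [10, 100]))   # ndarray in, ndarray out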
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: > On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 12:53 PM, wrote: On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: > Hello, > I have a vectorizing problem that I don't see an obvious way to solve. > What > I have is a vector like: > obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > and a matrix > T=zeros((6,6)) > and what I want in T is a count of all of the transitions in obs, e.g. > T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the > sequence 3-4 only happens once, etc... I can do it unvectorized like: > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1,o2]+=1 > > which gives the correct answer from above, which is: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > > but I thought there would be a better way. I tried: > o1=obs[:-1] > o2=obs[1:] > T[o1,o2]+=1 > but this doesn't give a count, it just yields 1's at the transition > points, > like: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 1., 0., 0., 1.], > [ 0., 1., 0., 1., 0., 0.], > [ 0., 0., 1., 0., 1., 0.], > [ 0., 0., 0., 1., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > Is there a clever way to do this? I could write a quick Cython solution, > but I wanted to keep this as an all-numpy implementation if I can. > histogram2d or its imitation, there was a discussion on histogram2d a short time ago >>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> obs2 = obs - 1 >>> trans = >>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>> re = np.array([[ 0., 0., 0., 0., 0., 0.], ... [ 0., 0., 3., 0., 0., 1.], ... [ 0., 3., 0., 1., 0., 0.], ... [ 0., 0., 2., 0., 1., 0.], ... [ 0., 0., 0., 2., 0., 0.], ... [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == trans) True >>> trans array([[0, 0, 0, 0, 0, 0], [0, 0, 3, 0, 0, 1], [0, 3, 0, 1, 0, 0], [0, 0, 2, 0, 1, 0], [0, 0, 0, 2, 0, 0], [0, 0, 0, 0, 1, 0]]) or >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) >>> re array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> h array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == h) True >>> >>> There's no way my list method can beat that. But by adding >>> >>> import psyco >>> psyco.full() >>> >>> I get a total speed up of a factor of 15 when obs is length 1. 
>> >> Actually, it is faster: >> >> histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >> 100 loops, best of 3: 4.14 ms per loop >> >> lists: >> timeit test(obs3, T3) >> 1000 loops, best of 3: 1.32 ms per loop >> ___ >> Numpy-discussion mailing list >> Numpy-discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > here's a go: > > import numpy as np > import random > from itertools import groupby > > def test1(obs, T): > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1][o2] += 1 > return T > > > def test2(obs, T): > s = zip(obs[:-1], obs[1:]) > for idx, g in groupby(sorted(s)): > T[idx] = len(list(g)) > return T > > obs = [random.randint(0, 5) for z in range(1)] > > print test2(obs, np.zeros((6, 6))) > print test1(obs, np.zeros((6, 6))) > > > ## > > In [10]: timeit test1(obs, np.zeros((6, 6))) > 100 loops, best of 3: 18.8 ms per loop > > In [11]: timeit test2(obs, np.zeros((6, 6))) > 100 loops, best of 3: 6.91 ms per loop Wait, you tested the list method with an array. Try timeit test1(obs, np.zeros((6, 6)).tolist()) Probably best to move the array/list creation out of the timeit loop. Then my method won't have to pay the cost of converting to a list :) ___ Numpy-discussion mailing list Numpy-discus
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 4:35 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:31 PM, wrote: >> On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 12:53 PM, wrote: >>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: Hello, I have a vectorizing problem that I don't see an obvious way to solve. What I have is a vector like: obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) and a matrix T=zeros((6,6)) and what I want in T is a count of all of the transitions in obs, e.g. T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the sequence 3-4 only happens once, etc... I can do it unvectorized like: for o1,o2 in zip(obs[:-1],obs[1:]): T[o1,o2]+=1 which gives the correct answer from above, which is: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) but I thought there would be a better way. I tried: o1=obs[:-1] o2=obs[1:] T[o1,o2]+=1 but this doesn't give a count, it just yields 1's at the transition points, like: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 1., 0., 0., 1.], [ 0., 1., 0., 1., 0., 0.], [ 0., 0., 1., 0., 1., 0.], [ 0., 0., 0., 1., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) Is there a clever way to do this? I could write a quick Cython solution, but I wanted to keep this as an all-numpy implementation if I can. >>> >>> histogram2d or its imitation, there was a discussion on histogram2d a >>> short time ago >>> >> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> obs2 = obs - 1 >> trans = >> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>> ... [ 0., 0., 3., 0., 0., 1.], >>> ... [ 0., 3., 0., 1., 0., 0.], >>> ... [ 0., 0., 2., 0., 1., 0.], >>> ... [ 0., 0., 0., 2., 0., 0.], >>> ... [ 0., 0., 0., 0., 1., 0.]]) >> np.all(re == trans) >>> True >>> >> trans >>> array([[0, 0, 0, 0, 0, 0], >>> [0, 0, 3, 0, 0, 1], >>> [0, 3, 0, 1, 0, 0], >>> [0, 0, 2, 0, 1, 0], >>> [0, 0, 0, 2, 0, 0], >>> [0, 0, 0, 0, 1, 0]]) >>> >>> >>> or >>> >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) >> re >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >> h >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >> np.all(re == h) >>> True >> >> There's no way my list method can beat that. But by adding >> >> import psyco >> psyco.full() >> >> I get a total speed up of a factor of 15 when obs is length 1. 
> > Actually, it is faster: > > histogram: > >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) >>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) > 100 loops, best of 3: 4.14 ms per loop > > lists: > >>> timeit test(obs3, T3) > 1000 loops, best of 3: 1.32 ms per loop > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > here's a go: import numpy as np import random from itertools import groupby def test1(obs, T): for o1,o2 in zip(obs[:-1],obs[1:]): T[o1][o2] += 1 return T def test2(obs, T): s = zip(obs[:-1], obs[1:]) for idx, g in groupby(sorted(s)): T[idx] = len(list(g)) return T obs = [random.randint(0, 5) for z in range(1)] >>>
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:31 PM, wrote: > On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: >>> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 12:53 PM, wrote: >> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >>> Hello, >>> I have a vectorizing problem that I don't see an obvious way to solve. >>> What >>> I have is a vector like: >>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> and a matrix >>> T=zeros((6,6)) >>> and what I want in T is a count of all of the transitions in obs, e.g. >>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1,o2]+=1 >>> >>> which gives the correct answer from above, which is: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> >>> but I thought there would be a better way. I tried: >>> o1=obs[:-1] >>> o2=obs[1:] >>> T[o1,o2]+=1 >>> but this doesn't give a count, it just yields 1's at the transition >>> points, >>> like: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 1., 0., 0., 1.], >>> [ 0., 1., 0., 1., 0., 0.], >>> [ 0., 0., 1., 0., 1., 0.], >>> [ 0., 0., 0., 1., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> Is there a clever way to do this? I could write a quick Cython >>> solution, >>> but I wanted to keep this as an all-numpy implementation if I can. >>> >> >> histogram2d or its imitation, there was a discussion on histogram2d a >> short time ago >> > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > obs2 = obs - 1 > trans = > np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) > re = np.array([[ 0., 0., 0., 0., 0., 0.], >> ... [ 0., 0., 3., 0., 0., 1.], >> ... [ 0., 3., 0., 1., 0., 0.], >> ... [ 0., 0., 2., 0., 1., 0.], >> ... [ 0., 0., 0., 2., 0., 0.], >> ... [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == trans) >> True >> > trans >> array([[0, 0, 0, 0, 0, 0], >> [0, 0, 3, 0, 0, 1], >> [0, 3, 0, 1, 0, 0], >> [0, 0, 2, 0, 1, 0], >> [0, 0, 0, 2, 0, 0], >> [0, 0, 0, 0, 1, 0]]) >> >> >> or >> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, > range=[[0,5],[0,5]]) > re >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) > h >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> > np.all(re == h) >> True > > There's no way my list method can beat that. But by adding > > import psyco > psyco.full() > > I get a total speed up of a factor of 15 when obs is length 1. 
Actually, it is faster: histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) >> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) 100 loops, best of 3: 4.14 ms per loop lists: >> timeit test(obs3, T3) 1000 loops, best of 3: 1.32 ms per loop ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> here's a go: >>> >>> import numpy as np >>> import random >>> from itertools import groupby >>> >>> def test1(obs, T): >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1][o2] += 1 >>> return T >>> >>> >>> def test2(obs, T): >>> s = zip(obs[:-1], obs[1:]) >>> for idx, g in groupby(sorted(s)): >>> T[idx] = len(list(g)) >>> return T >>> >>> obs = [random.randint(0, 5) for z in range(1)] >>> >>> print test2(obs, np.zeros((6, 6))) >>> print test1(obs, np.zeros((6, 6))) >>> >>> >>> ## >>> >>> In [10]: timeit test1(obs, np.zeros((6, 6))) >>> 100 loops, best of 3: 18.8 ms per loop >>> >>> In
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:27 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: >> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: On Fri, Jun 5, 2009 at 12:53 PM, wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. >> What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition >> points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) obs2 = obs - 1 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) np.all(re == trans) > True > trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == h) > True There's no way my list method can beat that. But by adding import psyco psyco.full() I get a total speed up of a factor of 15 when obs is length 1. 
>>> >>> Actually, it is faster: >>> >>> histogram: >>> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) > timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, > range=[[0,5],[0,5]]) >>> 100 loops, best of 3: 4.14 ms per loop >>> >>> lists: >>> > timeit test(obs3, T3) >>> 1000 loops, best of 3: 1.32 ms per loop >>> ___ >>> Numpy-discussion mailing list >>> Numpy-discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> here's a go: >> >> import numpy as np >> import random >> from itertools import groupby >> >> def test1(obs, T): >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1][o2] += 1 >> return T >> >> >> def test2(obs, T): >> s = zip(obs[:-1], obs[1:]) >> for idx, g in groupby(sorted(s)): >> T[idx] = len(list(g)) >> return T >> >> obs = [random.randint(0, 5) for z in range(1)] >> >> print test2(obs, np.zeros((6, 6))) >> print test1(obs, np.zeros((6, 6))) >> >> >> ## >> >> In [10]: timeit test1(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 18.8 ms per loop >> >> In [11]: timeit test2(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 6.91 ms per loop > > Nice! > > Try adding > > import psyco > psyco.full() > > to test1. Or is that cheating? it is if you're running 64bit. :
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: >> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: On Fri, Jun 5, 2009 at 12:53 PM, wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. >> What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition >> points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) obs2 = obs - 1 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) np.all(re == trans) > True > trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == h) > True There's no way my list method can beat that. But by adding import psyco psyco.full() I get a total speed up of a factor of 15 when obs is length 1. 
>>> >>> Actually, it is faster: >>> >>> histogram: >>> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) > timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, > range=[[0,5],[0,5]]) >>> 100 loops, best of 3: 4.14 ms per loop >>> >>> lists: >>> > timeit test(obs3, T3) >>> 1000 loops, best of 3: 1.32 ms per loop >>> ___ >>> Numpy-discussion mailing list >>> Numpy-discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> here's a go: >> >> import numpy as np >> import random >> from itertools import groupby >> >> def test1(obs, T): >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1][o2] += 1 >> return T >> >> >> def test2(obs, T): >> s = zip(obs[:-1], obs[1:]) >> for idx, g in groupby(sorted(s)): >> T[idx] = len(list(g)) >> return T >> >> obs = [random.randint(0, 5) for z in range(1)] >> >> print test2(obs, np.zeros((6, 6))) >> print test1(obs, np.zeros((6, 6))) >> >> >> ## >> >> In [10]: timeit test1(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 18.8 ms per loop >> >> In [11]: timeit test2(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 6.91 ms per loop > > Nice! > > Try adding > > import psyco > psyco.full() > > to test1. Or is that cheating? > __
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen wrote: > On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: >>> On Fri, Jun 5, 2009 at 12:53 PM, wrote: On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: > Hello, > I have a vectorizing problem that I don't see an obvious way to solve. > What > I have is a vector like: > obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > and a matrix > T=zeros((6,6)) > and what I want in T is a count of all of the transitions in obs, e.g. > T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the > sequence 3-4 only happens once, etc... I can do it unvectorized like: > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1,o2]+=1 > > which gives the correct answer from above, which is: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > > but I thought there would be a better way. I tried: > o1=obs[:-1] > o2=obs[1:] > T[o1,o2]+=1 > but this doesn't give a count, it just yields 1's at the transition > points, > like: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 1., 0., 0., 1.], > [ 0., 1., 0., 1., 0., 0.], > [ 0., 0., 1., 0., 1., 0.], > [ 0., 0., 0., 1., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > Is there a clever way to do this? I could write a quick Cython solution, > but I wanted to keep this as an all-numpy implementation if I can. > histogram2d or its imitation, there was a discussion on histogram2d a short time ago >>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> obs2 = obs - 1 >>> trans = >>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>> re = np.array([[ 0., 0., 0., 0., 0., 0.], ... [ 0., 0., 3., 0., 0., 1.], ... [ 0., 3., 0., 1., 0., 0.], ... [ 0., 0., 2., 0., 1., 0.], ... [ 0., 0., 0., 2., 0., 0.], ... [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == trans) True >>> trans array([[0, 0, 0, 0, 0, 0], [0, 0, 3, 0, 0, 1], [0, 3, 0, 1, 0, 0], [0, 0, 2, 0, 1, 0], [0, 0, 0, 2, 0, 0], [0, 0, 0, 0, 1, 0]]) or >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) >>> re array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> h array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == h) True >>> >>> There's no way my list method can beat that. But by adding >>> >>> import psyco >>> psyco.full() >>> >>> I get a total speed up of a factor of 15 when obs is length 1. 
>> >> Actually, it is faster: >> >> histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >> 100 loops, best of 3: 4.14 ms per loop >> >> lists: >> timeit test(obs3, T3) >> 1000 loops, best of 3: 1.32 ms per loop >> ___ >> Numpy-discussion mailing list >> Numpy-discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > here's a go: > > import numpy as np > import random > from itertools import groupby > > def test1(obs, T): > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1][o2] += 1 > return T > > > def test2(obs, T): > s = zip(obs[:-1], obs[1:]) > for idx, g in groupby(sorted(s)): > T[idx] = len(list(g)) > return T > > obs = [random.randint(0, 5) for z in range(1)] > > print test2(obs, np.zeros((6, 6))) > print test1(obs, np.zeros((6, 6))) > > > ## > > In [10]: timeit test1(obs, np.zeros((6, 6))) > 100 loops, best of 3: 18.8 ms per loop > > In [11]: timeit test2(obs, np.zeros((6, 6))) > 100 loops, best of 3: 6.91 ms per loop Nice! Try adding import psyco psyco.full() to test1. Or is that cheating? ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: >> On Fri, Jun 5, 2009 at 12:53 PM, wrote: >>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: Hello, I have a vectorizing problem that I don't see an obvious way to solve. What I have is a vector like: obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) and a matrix T=zeros((6,6)) and what I want in T is a count of all of the transitions in obs, e.g. T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the sequence 3-4 only happens once, etc... I can do it unvectorized like: for o1,o2 in zip(obs[:-1],obs[1:]): T[o1,o2]+=1 which gives the correct answer from above, which is: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) but I thought there would be a better way. I tried: o1=obs[:-1] o2=obs[1:] T[o1,o2]+=1 but this doesn't give a count, it just yields 1's at the transition points, like: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 1., 0., 0., 1.], [ 0., 1., 0., 1., 0., 0.], [ 0., 0., 1., 0., 1., 0.], [ 0., 0., 0., 1., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) Is there a clever way to do this? I could write a quick Cython solution, but I wanted to keep this as an all-numpy implementation if I can. >>> >>> histogram2d or its imitation, there was a discussion on histogram2d a >>> short time ago >>> >> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> obs2 = obs - 1 >> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>> ... [ 0., 0., 3., 0., 0., 1.], >>> ... [ 0., 3., 0., 1., 0., 0.], >>> ... [ 0., 0., 2., 0., 1., 0.], >>> ... [ 0., 0., 0., 2., 0., 0.], >>> ... [ 0., 0., 0., 0., 1., 0.]]) >> np.all(re == trans) >>> True >>> >> trans >>> array([[0, 0, 0, 0, 0, 0], >>> [0, 0, 3, 0, 0, 1], >>> [0, 3, 0, 1, 0, 0], >>> [0, 0, 2, 0, 1, 0], >>> [0, 0, 0, 2, 0, 0], >>> [0, 0, 0, 0, 1, 0]]) >>> >>> >>> or >>> >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) >> re >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >> h >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >> np.all(re == h) >>> True >> >> There's no way my list method can beat that. But by adding >> >> import psyco >> psyco.full() >> >> I get a total speed up of a factor of 15 when obs is length 1. 
> > Actually, it is faster: > > histogram: > >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>> range=[[0,5],[0,5]]) > 100 loops, best of 3: 4.14 ms per loop > > lists: > >>> timeit test(obs3, T3) > 1000 loops, best of 3: 1.32 ms per loop > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > here's a go: import numpy as np import random from itertools import groupby def test1(obs, T): for o1,o2 in zip(obs[:-1],obs[1:]): T[o1][o2] += 1 return T def test2(obs, T): s = zip(obs[:-1], obs[1:]) for idx, g in groupby(sorted(s)): T[idx] = len(list(g)) return T obs = [random.randint(0, 5) for z in range(1)] print test2(obs, np.zeros((6, 6))) print test1(obs, np.zeros((6, 6))) ## In [10]: timeit test1(obs, np.zeros((6, 6))) 100 loops, best of 3: 18.8 ms per loop In [11]: timeit test2(obs, np.zeros((6, 6))) 100 loops, best of 3: 6.91 ms per loop ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman wrote: > On Fri, Jun 5, 2009 at 12:53 PM, wrote: >> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >>> Hello, >>> I have a vectorizing problem that I don't see an obvious way to solve. What >>> I have is a vector like: >>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> and a matrix >>> T=zeros((6,6)) >>> and what I want in T is a count of all of the transitions in obs, e.g. >>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>> for o1,o2 in zip(obs[:-1],obs[1:]): >>> T[o1,o2]+=1 >>> >>> which gives the correct answer from above, which is: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 3., 0., 0., 1.], >>> [ 0., 3., 0., 1., 0., 0.], >>> [ 0., 0., 2., 0., 1., 0.], >>> [ 0., 0., 0., 2., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> >>> but I thought there would be a better way. I tried: >>> o1=obs[:-1] >>> o2=obs[1:] >>> T[o1,o2]+=1 >>> but this doesn't give a count, it just yields 1's at the transition points, >>> like: >>> array([[ 0., 0., 0., 0., 0., 0.], >>> [ 0., 0., 1., 0., 0., 1.], >>> [ 0., 1., 0., 1., 0., 0.], >>> [ 0., 0., 1., 0., 1., 0.], >>> [ 0., 0., 0., 1., 0., 0.], >>> [ 0., 0., 0., 0., 1., 0.]]) >>> >>> Is there a clever way to do this? I could write a quick Cython solution, >>> but I wanted to keep this as an all-numpy implementation if I can. >>> >> >> histogram2d or its imitation, there was a discussion on histogram2d a >> short time ago >> > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > obs2 = obs - 1 > trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) > re = np.array([[ 0., 0., 0., 0., 0., 0.], >> ... [ 0., 0., 3., 0., 0., 1.], >> ... [ 0., 3., 0., 1., 0., 0.], >> ... [ 0., 0., 2., 0., 1., 0.], >> ... [ 0., 0., 0., 2., 0., 0.], >> ... [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == trans) >> True >> > trans >> array([[0, 0, 0, 0, 0, 0], >> [0, 0, 3, 0, 0, 1], >> [0, 3, 0, 1, 0, 0], >> [0, 0, 2, 0, 1, 0], >> [0, 0, 0, 2, 0, 0], >> [0, 0, 0, 0, 1, 0]]) >> >> >> or >> > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) > re >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) > h >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> > np.all(re == h) >> True > > There's no way my list method can beat that. But by adding > > import psyco > psyco.full() > > I get a total speed up of a factor of 15 when obs is length 1. Actually, it is faster: histogram: >> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >> range=[[0,5],[0,5]]) 100 loops, best of 3: 4.14 ms per loop lists: >> timeit test(obs3, T3) 1000 loops, best of 3: 1.32 ms per loop ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 3:53 PM, wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) obs2 = obs - 1 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) I don't think this is correct, there are problems for the first and last position if (1,1) and (5,5) are possible. I don't remember how I did it before. The histogram2d version still looks correct for this case. histogram2d uses bincount under the hood in a similar way. Josef re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) np.all(re == trans) > True > trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == h) > True > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
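A sketch of how the flat-index/bincount idea can be made independent of which transitions happen to occur (it assumes, as in the example, that the states are the integers 0 through 5, so every transition fits in a 6x6 table); this is not the code from the thread, just one way to make the padding and alignment explicit:

    import numpy as np

    obs = np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
    idx = obs[:-1] * 6 + obs[1:]      # flat (row-major) position of each transition in a 6x6 matrix
    counts = np.bincount(idx)         # length depends on the largest transition actually seen
    T = np.zeros(36, dtype=int)
    T[:len(counts)] = counts          # explicit padding instead of hstack'ing a fixed number of zeros
    T = T.reshape(6, 6)

Because the padding is done against a fixed-size zero array, the reshape works whether or not transitions like (1,1) or (5,5) show up in the data.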
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 12:53 PM, wrote: > On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: >> Hello, >> I have a vectorizing problem that I don't see an obvious way to solve. What >> I have is a vector like: >> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >> and a matrix >> T=zeros((6,6)) >> and what I want in T is a count of all of the transitions in obs, e.g. >> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >> sequence 3-4 only happens once, etc... I can do it unvectorized like: >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1,o2]+=1 >> >> which gives the correct answer from above, which is: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 3., 0., 0., 1.], >> [ 0., 3., 0., 1., 0., 0.], >> [ 0., 0., 2., 0., 1., 0.], >> [ 0., 0., 0., 2., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> >> but I thought there would be a better way. I tried: >> o1=obs[:-1] >> o2=obs[1:] >> T[o1,o2]+=1 >> but this doesn't give a count, it just yields 1's at the transition points, >> like: >> array([[ 0., 0., 0., 0., 0., 0.], >> [ 0., 0., 1., 0., 0., 1.], >> [ 0., 1., 0., 1., 0., 0.], >> [ 0., 0., 1., 0., 1., 0.], >> [ 0., 0., 0., 1., 0., 0.], >> [ 0., 0., 0., 0., 1., 0.]]) >> >> Is there a clever way to do this? I could write a quick Cython solution, >> but I wanted to keep this as an all-numpy implementation if I can. >> > > histogram2d or its imitation, there was a discussion on histogram2d a > short time ago > obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) obs2 = obs - 1 trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) re = np.array([[ 0., 0., 0., 0., 0., 0.], > ... [ 0., 0., 3., 0., 0., 1.], > ... [ 0., 3., 0., 1., 0., 0.], > ... [ 0., 0., 2., 0., 1., 0.], > ... [ 0., 0., 0., 2., 0., 0.], > ... [ 0., 0., 0., 0., 1., 0.]]) np.all(re == trans) > True > trans > array([[0, 0, 0, 0, 0, 0], > [0, 0, 3, 0, 0, 1], > [0, 3, 0, 1, 0, 0], > [0, 0, 2, 0, 1, 0], > [0, 0, 0, 2, 0, 0], > [0, 0, 0, 0, 1, 0]]) > > > or > h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) re > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) h > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > np.all(re == h) > True There's no way my list method can beat that. But by adding import psyco psyco.full() I get a total speed up of a factor of 15 when obs is length 1. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Maturing the Matrix class in NumPy
Hi Alan 2009/6/5 Alan G Isaac : > You should take it very seriously when someone > reports to you that the matrix object is a crucial to them, > e.g., as a teaching tool. Even if you do not find > personally persuasive an example like > http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html You make a good point and the example you gave earlier shows how the Matrix object can be used to write expressions that correspond clearly to those in a textbook. Christopher is right that most of the developers don't use the Matrix class much, but that is no excuse to neglect it as we are currently doing. If the Matrix class is to remain, we need to take the steps necessary to integrate it into NumPy properly. To get going we'll need a list of changes required (i.e. "in an ideal world, how would matrices work?"). There should be a set protocol for all numpy functions that guarantees compatibility with ndarrays, matrices and other derived classes. This also ties in with a suggestion by Darren Dale on ufuncs that hasn't been getting the attention it deserves. Indexing issues need to be cleared up, with a chapter on matrix use written for the guide. Being one of the most vocal proponents of the Matrix class, would you be prepared to develop your Matrix Proposal at http://scipy.org/NewMatrixSpec further? We need to get to a point where we can assign tasks, and start writing code. The keyphrase here should be "rough consensus" to avoid the bike shedding we've run into in the past. Thanks for following this through, Regards Stéfan ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais wrote: > Hello, > I have a vectorizing problem that I don't see an obvious way to solve. What > I have is a vector like: > obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > and a matrix > T=zeros((6,6)) > and what I want in T is a count of all of the transitions in obs, e.g. > T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the > sequence 3-4 only happens once, etc... I can do it unvectorized like: > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1,o2]+=1 > > which gives the correct answer from above, which is: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > > but I thought there would be a better way. I tried: > o1=obs[:-1] > o2=obs[1:] > T[o1,o2]+=1 > but this doesn't give a count, it just yields 1's at the transition points, > like: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 1., 0., 0., 1.], > [ 0., 1., 0., 1., 0., 0.], > [ 0., 0., 1., 0., 1., 0.], > [ 0., 0., 0., 1., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > Is there a clever way to do this? I could write a quick Cython solution, > but I wanted to keep this as an all-numpy implementation if I can. > histogram2d or its imitation, there was a discussion on histogram2d a short time ago >>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>> obs2 = obs - 1 >>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>> re = np.array([[ 0., 0., 0., 0., 0., 0.], ... [ 0., 0., 3., 0., 0., 1.], ... [ 0., 3., 0., 1., 0., 0.], ... [ 0., 0., 2., 0., 1., 0.], ... [ 0., 0., 0., 2., 0., 0.], ... [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == trans) True >>> trans array([[0, 0, 0, 0, 0, 0], [0, 0, 3, 0, 0, 1], [0, 3, 0, 1, 0, 0], [0, 0, 2, 0, 1, 0], [0, 0, 0, 2, 0, 0], [0, 0, 0, 0, 1, 0]]) or >>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>> re array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> h array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) >>> np.all(re == h) True ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
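The bins=6, range=[[0,5],[0,5]] combination works here because each integer state happens to land in its own bin; passing explicit integer bin edges expresses the same thing a bit more directly. A small variation (not from the thread) on the same histogram2d call:

    import numpy as np

    obs = np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
    edges = np.arange(7)    # edges 0..6 give one unit-wide bin per state
    T, _, _ = np.histogram2d(obs[:-1], obs[1:], bins=[edges, edges])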
Re: [Numpy-discussion] vectorizing
On Fri, Jun 5, 2009 at 11:07 AM, Brian Blais wrote: > Hello, > I have a vectorizing problem that I don't see an obvious way to solve. What > I have is a vector like: > obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) > and a matrix > T=zeros((6,6)) > and what I want in T is a count of all of the transitions in obs, e.g. > T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the > sequence 3-4 only happens once, etc... I can do it unvectorized like: > for o1,o2 in zip(obs[:-1],obs[1:]): > T[o1,o2]+=1 > > which gives the correct answer from above, which is: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 3., 0., 0., 1.], > [ 0., 3., 0., 1., 0., 0.], > [ 0., 0., 2., 0., 1., 0.], > [ 0., 0., 0., 2., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > > but I thought there would be a better way. I tried: > o1=obs[:-1] > o2=obs[1:] > T[o1,o2]+=1 > but this doesn't give a count, it just yields 1's at the transition points, > like: > array([[ 0., 0., 0., 0., 0., 0.], > [ 0., 0., 1., 0., 0., 1.], > [ 0., 1., 0., 1., 0., 0.], > [ 0., 0., 1., 0., 1., 0.], > [ 0., 0., 0., 1., 0., 0.], > [ 0., 0., 0., 0., 1., 0.]]) > > Is there a clever way to do this? I could write a quick Cython solution, > but I wanted to keep this as an all-numpy implementation if I can. It's a little faster (8.5% for me when obs is length 1) if you do T = np.zeros((6,6), dtype=np.int) But it more than 5 times faster if you use lists for T and obs. You're just storing information here, so there is no reason to pay for the overhead of arrays. import random import numpy as np T = [[0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0]] obs = [random.randint(0, 5) for z in range(1)] def test(obs, T): for o1,o2 in zip(obs[:-1],obs[1:]): T[o1][o2] += 1 return T ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] vectorizing
Hello, I have a vectorizing problem that I don't see an obvious way to solve. What I have is a vector like: obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) and a matrix T=zeros((6,6)) and what I want in T is a count of all of the transitions in obs, e.g. T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the sequence 3-4 only happens once, etc... I can do it unvectorized like: for o1,o2 in zip(obs[:-1],obs[1:]): T[o1,o2]+=1 which gives the correct answer from above, which is: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 3., 0., 0., 1.], [ 0., 3., 0., 1., 0., 0.], [ 0., 0., 2., 0., 1., 0.], [ 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) but I thought there would be a better way. I tried: o1=obs[:-1] o2=obs[1:] T[o1,o2]+=1 but this doesn't give a count, it just yields 1's at the transition points, like: array([[ 0., 0., 0., 0., 0., 0.], [ 0., 0., 1., 0., 0., 1.], [ 0., 1., 0., 1., 0., 0.], [ 0., 0., 1., 0., 1., 0.], [ 0., 0., 0., 1., 0., 0.], [ 0., 0., 0., 0., 1., 0.]]) Is there a clever way to do this? I could write a quick Cython solution, but I wanted to keep this as an all-numpy implementation if I can. thanks, Brian Blais -- Brian Blais bbl...@bryant.edu http://web.bryant.edu/~bblais ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
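The reason the vectorized attempt leaves only ones is that fancy-index assignment with += reads the selected elements once, adds one, and writes them back, so repeated index pairs are counted a single time. A minimal illustration of that behaviour, unrelated to this thread's particular data:

    import numpy as np

    a = np.zeros(3)
    idx = np.array([0, 0, 1])   # index 0 appears twice
    a[idx] += 1
    print(a)                    # [ 1.  1.  0.]  -- not [ 2.  1.  0.]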
Re: [Numpy-discussion] matrix default to column vector?
As someone who is very used to thinking in terms of matrices and who just went through the adjustment of translating matlab-like code to numpy, I found the current matrix module to be confusing. Its poor integration with the rest of numpy/scipy (in particular, scipy.optimize.fmin_cg) made it more difficult to use than it was worth. I'd rather have "matrix" and/or "matrix multiplication" sections of the documentation explain how to do typical, basic matrix operations with nparray, dot, T, and arr[None,:]. I think a matrix class would still be worthwhile for findability, but it should simply serve as documentation for how to do matrix stuff with nparray. Cheers, Jason -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
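For reference, the kind of recipes such a documentation section might collect; the example values are purely illustrative and this is not an existing numpy doc page:

    import numpy as np

    A = np.array([[1., 2.], [3., 4.]])
    x = np.array([1., 0.])

    np.dot(A, x)              # matrix-vector product A x
    np.dot(A.T, A)            # A' A
    np.dot(x, np.dot(A, x))   # quadratic form x' A x, returned as a plain scalar
    xcol = x[:, None]         # explicit (2, 1) column vector when a 2-d shape is needed
    np.linalg.solve(A, x)     # A^{-1} x without forming the inverse explicitly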
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
2009/6/5 David Cournapeau : > Eric Firing wrote: >> >> David, >> >> The eigen web site indicates that eigen achieves high performance >> without all the compilation difficulty of atlas. Does eigen have enough >> functionality to replace atlas in numpy? > > No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't > know if that's even a goal of eigen (it started as a project for KDE, to > support high performance core computations for things like spreadsheet > and co). > > But even then, it would be a huge undertaking. For all its flaws, LAPACK > is old, tested code, with a very stable language (F77). Eigen is: > - not mature. > - heavily expression-template-based C++, meaning compilation takes > ages + esoteric, impossible to decypher compilation errors. We have > enough build problems already :) > - SSE dependency harcoded, since it is setup at build time. That's > going backward IMHO - I would rather see a numpy/scipy which can load > the optimized code at runtime. I would add that it relies on C++ compiler extensions (the restrict keyword) as does blitz. You unfortunately can't expect every compiler to support it unless the C++ committee finally adds it to the standard. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Eric Firing wrote: > > David, > > The eigen web site indicates that eigen achieves high performance > without all the compilation difficulty of atlas. Does eigen have enough > functionality to replace atlas in numpy? No, eigen does not provide a (complete) BLAS/LAPACK interface. I don't know if that's even a goal of eigen (it started as a project for KDE, to support high performance core computations for things like spreadsheet and co). But even then, it would be a huge undertaking. For all its flaws, LAPACK is old, tested code, with a very stable language (F77). Eigen is: - not mature. - heavily expression-template-based C++, meaning compilation takes ages + esoteric, impossible to decypher compilation errors. We have enough build problems already :) - SSE dependency harcoded, since it is setup at build time. That's going backward IMHO - I would rather see a numpy/scipy which can load the optimized code at runtime. cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix default to column vector?
On Fri, Jun 5, 2009 at 1:33 PM, Geoffrey Ely wrote: > On Jun 5, 2009, at 10:18 AM, josef.p...@gmail.com wrote: >> I'm only using arrays for consistency, but my econometrics code is >> filled with arr[np.newaxis,:] . > > > arr[None,:] is a lot cleaner in my opinion, especially when using > "numpy" as the namespace. It took me a long time to figure out that None and np.newaxis do the same thing. I find newaxis more descriptive and mostly stick to it. When I started looking at numpy, I was always wondering what these "None"s were doing in the code. just one more useful trick to preserve dimension, using slices instead of index avoids arr[None,:], but requires more thinking. Josef >>> X matrix([[1, 0], [1, 1]]) >>> Y array([[1, 0], [1, 1]]) >>> X[0,:].shape == Y[0:1,:].shape True >>> X[0,:] == Y[0:1,:] matrix([[ True, True]], dtype=bool) > -Geoff ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
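Both spellings really are the same object, which is why they behave identically; a quick check, together with the slice-based trick mentioned above (illustrative, for a plain 2-d array):

    import numpy as np

    np.newaxis is None          # True -- np.newaxis is just a readable alias for None

    Y = np.array([[1, 0], [1, 1]])
    Y[0, :].shape               # (2,)   -- plain indexing drops the axis
    Y[0:1, :].shape             # (1, 2) -- slicing keeps it
    Y.mean(0)[None, :].shape    # (1, 2) -- same effect after a reduction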
Re: [Numpy-discussion] matrix default to column vector?
On Jun 5, 2009, at 10:18 AM, josef.p...@gmail.com wrote: > I'm only using arrays for consistency, but my econometrics code is > filled with arr[np.newaxis,:] . arr[None,:] is a lot cleaner in my opinion, especially when using "numpy" as the namespace. -Geoff ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix default to column vector?
On 6/5/2009 11:38 AM Olivier Verdier apparently wrote: > I think matrices can be pretty tricky when used for > teaching. For instance, you have to explain that all the > operators work component-wise, except the multiplication! > Another caveat is that since matrices are always 2x2, the > "scalar product" of two column vectors computed as " x.T > * y" will not be a scalar, but a 2x2 matrix. There is also > the fact that you must cast all your vectors to column/raw > matrices (as in matlab). For all these reasons, I prefer > to use arrays and dot for teaching, and I have never had > any complaints. I do not understand this "argument". You should take it very seriously when someone reports to you that the matrix object is a crucial to them, e.g., as a teaching tool. Even if you do not find personally persuasive an example like http://mail.scipy.org/pipermail/numpy-discussion/2009-June/043001.html I have told you: this is important for my students. Reporting that your students do not complain about using arrays instead of matrices does not change this one bit. Student backgrounds differ by domain of application. In economics, matrices are in *very* wide use, and multidimensional arrays get almost no use. Textbooks in econometrics (a huge and important field, even outside of economics) are full of proofs using matrix algebra. A close match to what the students see is crucial. When working with multiplication or exponentiation, matrices do what they expect, and 2d arrays do not. One more point. As Python users we get used to installing a package here and a package there to add functionality. But this is not how most people looking for a matrix language see the world. Removing the matrix object from NumPy will raise the barrier to adoption by social scientists, and there should be a strongly persuasive reason before taking such a step. Separately from all that, does anyone doubt that there is code that depends on the matrix object? The core objection to a past proposal for useful change was that it could break extant code. I would hope that nobody who took that position would subsequently propose removing the matrix object altogether. Cheers, Alan Isaac PS If x and y are "column vectors" (i.e., matrices), then x.T * y *should* be a 1×1 matrix. Since the * operator is doing matrix multiplication, this is the correct result, not an anomaly. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
David Cournapeau wrote: > > It really depends on the CPU, compiler, how atlas was compiled, etc... > it can be slightly faster to 10 times faster (if you use a very poorly > optimized ATLAS). > > For some recent benchmarks: > > http://eigen.tuxfamily.org/index.php?title=Benchmark > David, The eigen web site indicates that eigen achieves high performance without all the compilation difficulty of atlas. Does eigen have enough functionality to replace atlas in numpy? Presumably it would need C compatibility wrappers to emulate the blas functions. Would that kill its performance? Or be very difficult? (I'm asking from curiosity combined with complete ignorance. Until yesterday I had never even heard of eigen.) Eric > cheers, > > David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix default to column vector?
On Fri, Jun 5, 2009 at 11:38 AM, Olivier Verdier wrote: > I agree. It would be a good idea to have matrices out of numpy as a > standalone package. > Indeed, having matrices in the numpy core comes at a pedagogical cost. > Newcomers (as I once was) do not know which to use. Matrix or array? It > turns out that the vast majority of numpy/scipy modules use arrays, so > arrays is the preferred way to go. > It would thus be clearer to have arrays in numpy and matrices available as > an external package. > Besides, I think matrices can be pretty tricky when used for teaching. For > instance, you have to explain that all the operators work component-wise, > except the multiplication! Another caveat is that since matrices are always > 2x2, the "scalar product" of two column vectors computed as " x.T * y" will > not be a scalar, but a 2x2 matrix. There is also the fact that you must cast > all your vectors to column/raw matrices (as in matlab). For all these > reasons, I prefer to use arrays and dot for teaching, and I have never had > any complaints. For anyone with an econometrics background in gauss or matlab, numpy arrays have a huge adjustment cost. I'm only using arrays for consistency, but my econometrics code is filled with arr[np.newaxis,:] . I wouldn't recommend my friends switching from gauss to numpy/scipy when they want to write or develop some econometrics algorithm. gauss feels like copying the formula from the notes or a book. With numpy arrays I always have to recover one dimension, and I have to verify the shape of any array more often. In econometrics the default vector is often a 2d vector so x.T*x and x*x.T have very different meanings. python/numpy/scipy have a lot of other advantages compared to gauss or matlab, but it's not the linear algebra syntax. So, I don't see any sense in removing matrices, you can just ignore them and tell your students to do so. It increases the initial switching cost, even if users, that get more into it, then switch to arrays. Josef (BTW: I think scipy.stats will soon become matrix friendly) >>> X = np.matrix([[1,0],[1,1]]) >>> X.mean() 0.75 >>> X.mean(0) matrix([[ 1. , 0.5]]) >>> X-X.mean(0) matrix([[ 0. , -0.5], [ 0. , 0.5]]) >>> Y = np.array([[1,0],[1,1]]) >>> Y.mean(0).shape (2,) >>> Y - Y.mean(0)[np.newaxis,:] array([[ 0. , -0.5], [ 0. , 0.5]]) >>> np.matrix([[1,0],[1,1]])**2 matrix([[1, 0], [2, 1]]) >>> np.array([[1,0],[1,1]])**2 array([[1, 0], [1, 1]]) >>> np.matrix([[1,0],[1,1]])**(-1) matrix([[ 1., 0.], [-1., 1.]]) >>> np.array([[1,0],[1,1]])**(-1) array([[ 1, -2147483648], [ 1, 1]]) >>> x = np.matrix([[1],[0]]) >>> x.T*x matrix([[1]]) >>> (x.T*x).shape (1, 1) >>> (x*x.T) matrix([[1, 0], [0, 0]]) >>> (x*x.T).shape (2, 2) >>> y = np.array([1,0]) >>> np.dot(y,y) 1 >>> np.dot(y,y.T) 1 >>> np.dot(y[:,np.newaxis], y[np.newaxis,:]) array([[1, 0], [0, 0]]) ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Import fails of scipy.factorial wheninstalled as a zipped egg
> I don't think numpy/scipy are zip safe, the numpy packageloader uses os.path to find files. Evidently! :-) But the strange thing is that all this worked fine with Scipy 0.6.0 - it's only since 0.7.0 was released that this started going wrong. Sal ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] matrix default to column vector?
I agree. It would be a good idea to have matrices out of numpy as a standalone package. Indeed, having matrices in the numpy core comes at a pedagogical cost. Newcomers (as I once was) do not know which to use. Matrix or array? It turns out that the vast majority of numpy/scipy modules use arrays, so arrays is the preferred way to go. It would thus be clearer to have arrays in numpy and matrices available as an external package. Besides, I think matrices can be pretty tricky when used for teaching. For instance, you have to explain that all the operators work component-wise, except the multiplication! Another caveat is that since matrices are always 2x2, the "scalar product" of two column vectors computed as " x.T * y" will not be a scalar, but a 2x2 matrix. There is also the fact that you must cast all your vectors to column/raw matrices (as in matlab). For all these reasons, I prefer to use arrays and dot for teaching, and I have never had any complaints. == Olivier 2009/6/4 Tommy Grav > > On Jun 4, 2009, at 5:41 PM, Alan G Isaac wrote: > > On 6/4/2009 5:27 PM Tommy Grav apparently wrote: > >> Or the core development team split the matrices out of numpy and > >> make it > >> as separate package that the people that use them could pick up and > >> run with. > > > > > > This too would be a mistake, I believe. > > But it depends on whether a goal is to > > have more people use NumPy. I believe > > the community will gain from growth. > > > > In sum, my argument is this: > > Keeping a matrix object in NumPy has substantial > > benefits in encouraging growth of the NumPy > > community, and as far as I can tell, it is > > imposing few costs. Therefore I think there is > > a very substantial burden on people who propose > > removing the matrix object to demonstrate > > just how the NumPy community will benefit from > > this change. > > This is a perfectly valid argument. I am actually quite happy with the > numpy package as it is (I work in astronomy), I was just pointing out > that if there are few of the core numpy people interested in maintaing > or upgrading the matrix class one solution might be to make it a > scipy-like package that easily can be installed on top of numpy, but > where the code base might be more accessible to those that are > interested in matrices, but feel that numpy is a daunting beast to > tackle. > Some sense of ownership of a matrixpy package might encourage more > people to contribute. > > Just an idea ;-) > Tommy > > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
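A compact way to show students the caveat about operators (everything elementwise except *), using a throwaway 2x2 example rather than anything from the thread:

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    M = np.matrix(A)

    A + A          # addition is elementwise for arrays...
    M + M          # ...and for matrices
    A * A          # elementwise product: [[ 1  4], [ 9 16]]
    M * M          # matrix product:      [[ 7 10], [15 22]]
    np.dot(A, A)   # the array spelling of the matrix product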
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Hi, Thanks for the suggestion. Unfortunately I'm using university managed machines here, so I have no control over the distribution, not even root access. However, I just downloaded the latest Enthought distribution, which uses numpy 1.3, and now numpy is only 30% to 60% slower than matlab, instead of 5 times slower. I can live with that. (whether it uses atlas now or not, I don't know). Cheers David Quoting Jason Rennie : > Hi David, > > Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty), > which was released two months ago. It sounds like you are effectively using > release 5 of RedHat Linux which was originally released May 2007. There > have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly, > RedHat updates are more focused on fixing bugs and security issues rather > than improving functionality. Ubuntu does a full, new release every 6 > months so you don't have to wait as long to see improvements. Ubuntu also > has a tremendously better package management system. You generally > shouldn't be installing packages by hand as it sounds like you are doing. > > This post suggests that the latest version of Ubuntu is up-to-date wrt > ATLAS: > > http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html > > Jason > > On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert < > d.p.reich...@sms.ed.ac.uk> wrote: > >> Thanks for the replies so far. >> >> I had already tested using an already transposed matrix in the loop, >> it didn't make any difference. Oh and btw, I'm on (Scientific) Linux. >> >> I used the Enthought distribution, but I guess I'll have to get >> my hands dirty and try to get that Atlas thing working (I'm not >> a Linux expert though). My simulations pretty much consist of >> matrix multiplications, so if I don't get rid of that factor 5, >> I pretty much have to get back to Matlab. >> >> When you said Atlas is going to be optimized for my system, does >> that mean I should compile everything on each machine separately? >> I.e. I have a not-so-great desktop machine and one of those bigger >> multicore things available... >> >> Cheers >> >> David >> > > -- > Jason Rennie > Research Scientist, ITA Software > 617-714-2645 > http://www.itasoftware.com/ > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Import fails of scipy.factorial when installed as a zipped egg
On Fri, Jun 5, 2009 at 9:25 AM, Fadhley Salim wrote: > I've noticed that with scipy 0.7.0 + numpy 1.2.1, an importing the factorial > function from the scipy module always seems to fail when scipy is installed > as a zipped ",egg" file. When the project is installed as an unzipped > directory it works fine. > > Is there any reason why this function should not be egg-safe? > > My test to verify this was pretty simple: I just installed my scipy egg > (made by extracting the Windows, Python 2.4 Superpack) with the easy_install > command. Whenever I install it with the "-Z" option (to uncompress) it works > fine. With the "-z" option it always fails. > I don't think numpy/scipy are zip safe, the numpy packageloader uses os.path to find files. I would expect that you are not able to import anything, factorial might be just the first function that is loaded. easy_install usually does a check for zipsafe, and if it unpacks the egg it usually means it's not zipsafe, for example because of the use of __file__. That's my guess, Josef ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
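A hypothetical package module illustrating the pattern referred to here; the file and function names are made up, but the failure mode is the generic one for zipped eggs:

    import os

    _data_dir = os.path.dirname(__file__)    # inside a zipped egg this points into
                                             # .../scipy-0.7.0-py2.4.egg/..., not a real directory

    def load_table(name):
        # open() needs an actual file on disk, so this raises IOError when the
        # package lives inside a zip archive, even though the import itself worked
        return open(os.path.join(_data_dir, name)).read()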
[Numpy-discussion] Import fails of scipy.factorial when installed as a zipped egg
I've noticed that with scipy 0.7.0 + numpy 1.2.1, importing the factorial function from the scipy module always seems to fail when scipy is installed as a zipped ".egg" file. When the project is installed as an unzipped directory it works fine. Is there any reason why this function should not be egg-safe? My test to verify this was pretty simple: I just installed my scipy egg (made by extracting the Windows, Python 2.4 Superpack) with the easy_install command. Whenever I install it with the "-Z" option (to uncompress) it works fine. With the "-z" option it always fails. Thanks! Sal ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Hi David, Let me suggest that you try the latest version of Ubuntu (9.04/Jaunty), which was released two months ago. It sounds like you are effectively using release 5 of RedHat Linux which was originally released May 2007. There have been updates (5.1, 5.2, 5.3), but, if my memory serves me correctly, RedHat updates are more focused on fixing bugs and security issues rather than improving functionality. Ubuntu does a full, new release every 6 months so you don't have to wait as long to see improvements. Ubuntu also has a tremendously better package management system. You generally shouldn't be installing packages by hand as it sounds like you are doing. This post suggests that the latest version of Ubuntu is up-to-date wrt ATLAS: http://www.mail-archive.com/numpy-discussion@scipy.org/msg13102.html Jason On Fri, Jun 5, 2009 at 5:44 AM, David Paul Reichert < d.p.reich...@sms.ed.ac.uk> wrote: > Thanks for the replies so far. > > I had already tested using an already transposed matrix in the loop, > it didn't make any difference. Oh and btw, I'm on (Scientific) Linux. > > I used the Enthought distribution, but I guess I'll have to get > my hands dirty and try to get that Atlas thing working (I'm not > a Linux expert though). My simulations pretty much consist of > matrix multiplications, so if I don't get rid of that factor 5, > I pretty much have to get back to Matlab. > > When you said Atlas is going to be optimized for my system, does > that mean I should compile everything on each machine separately? > I.e. I have a not-so-great desktop machine and one of those bigger > multicore things available... > > Cheers > > David > -- Jason Rennie Research Scientist, ITA Software 617-714-2645 http://www.itasoftware.com/ ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] scipy 0.7.1rc2 released
Hi, The RC2 for 0.7.1 scipy release has just been tagged. This is a bug-fixes only release, see below for the release notes. More information can also be found on the trac website: http://projects.scipy.org/scipy/milestone/0.7.1 The only code change compared to the RC1 is one fix which is essential for mac os x/python 2.6 combination. Tarballs, windows and mac os x binaries are available. Please test it ! I am particularly interested in results for scipy binaries on mac os x (do they work on ppc). The scipy developers -- = SciPy 0.7.1 Release Notes = .. contents:: SciPy 0.7.1 is a bug-fix release with no new features compared to 0.7.0. scipy.io Bugs fixed: - Several fixes in Matlab file IO scipy.odr = Bugs fixed: - Work around a failure with Python 2.6 scipy.signal Memory leak in lfilter have been fixed, as well as support for array object Bugs fixed: - #880, #925: lfilter fixes - #871: bicgstab fails on Win32 scipy.sparse Bugs fixed: - #883: scipy.io.mmread with scipy.sparse.lil_matrix broken - lil_matrix and csc_matrix reject now unexpected sequences, cf. http://thread.gmane.org/gmane.comp.python.scientific.user/19996 scipy.special = Several bugs of varying severity were fixed in the special functions: - #503, #640: iv: problems at large arguments fixed by new implementation - #623: jv: fix errors at large arguments - #679: struve: fix wrong output for v < 0 - #803: pbdv produces invalid output - #804: lqmn: fix crashes on some input - #823: betainc: fix documentation - #834: exp1 strange behavior near negative integer values - #852: jn_zeros: more accurate results for large s, also in jnp/yn/ynp_zeros - #853: jv, yv, iv: invalid results for non-integer v < 0, complex x - #854: jv, yv, iv, kv: return nan more consistently when out-of-domain - #927: ellipj: fix segfault on Windows - #946: ellpj: fix segfault on Mac OS X/python 2.6 combination. - ive, jve, yve, kv, kve: with real-valued input, return nan for out-of-domain instead of returning only the real part of the result. Also, when ``scipy.special.errprint(1)`` has been enabled, warning messages are now issued as Python warnings instead of printing them to stderr. scipy.stats === - linregress, mannwhitneyu, describe: errors fixed - kstwobign, norm, expon, exponweib, exponpow, frechet, genexpon, rdist, truncexpon, planck: improvements to numerical accuracy in distributions Windows binaries for python 2.6 === python 2.6 binaries for windows are now included. The binary for python 2.5 requires numpy 1.2.0 or above, and and the one for python 2.6 requires numpy 1.3.0 or above. Universal build for scipy = Mac OS X binary installer is now a proper universal build, and does not depend on gfortran anymore (libgfortran is statically linked). The python 2.5 version of scipy requires numpy 1.2.0 or above, the python 2.6 version requires numpy 1.3.0 or above. Checksums = 08cdf8d344535fcb5407dafd9f120b9b release/installers/scipy-0.7.1rc2.tar.gz 93595ca9f0b5690a6592c9fc43e9253d release/installers/scipy-0.7.1rc2-py2.6-macosx10.5.dmg fc8f434a9b4d76f1b38b7025f425127b release/installers/scipy-0.7.1rc2.zip 8cdc2472f3282f08a703cdcca5c92952 release/installers/scipy-0.7.1rc2-win32-superpack-python2.5.exe 15c4c45de931bd7f13e4ce24bd59579e release/installers/scipy-0.7.1rc2-win32-superpack-python2.6.exe e42853e39b3b4f590824e3a262863ef6 release/installers/scipy-0.7.1rc2-py2.5-macosx10.5.dmg ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Sebastian Walter wrote: > On Fri, Jun 5, 2009 at 11:58 AM, David > Cournapeau wrote: > >> Sebastian Walter wrote: >> >>> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote: >>> >>> I should update after reading the thread Sebastian linked: The current 1.3 version of numpy (don't know about previous versions) uses the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with these libraries. I've verified this on linux only, thought it shouldnt be any different on windows AFAIK. >>> in the best of all possible worlds this would be done by a package >>> maintainer >>> >>> >> Numpy packages on windows do use ATLAS, so I am not sure what you are >> referring to ? >> > I'm on debian unstable and my numpy (version 1.2.1) uses an unoptimized blas. > Yes, it is because the package on Linux are not well done in that respect (to their defense, numpy build is far from being packaging friendly, and is both fragile and obscure). > I had the impression that most ppl that use numpy are on linux. Sourceforge numbers tell a different story at least. I think most users on the ML use linux, and certainly almost every developer use linux or mac os x. But ML already filter most windows users - only geeks read ML :) I am pretty sure a vast majority of numpy users never even bother to look for the ML. >> On a side note, correctly packaging ATLAS is almost >> inherently impossible, since the build method of ATLAS can never produce >> the same binary (even on the same machine), and the binary is optimized >> for the machine it was built on. So if you want the best speed, you >> should build atlas by yourself - which is painful on windows (you need >> cygwin). >> > in the debian repositories there are different builds of atlas so > there could be different builds for numpy, too. > But there aren't > There are several problems: - packagers (rightfully) hate to have many versions of the same software - as for now, if ATLAS is detected, numpy is built differently than if it is linked against conventional blas/lapack - numpy on debian is not built with atlas support But there is certainly no need to build one numpy version for every atlas: the linux loader can load the most appropriate library depending on your architecture, the so called hwcap flag. If your CPU supports SSE2, and you have ATLAS installed for SSE2, then the loader will automatically load the libraries there instead of the one in /usr/lib by default. But because ATLAS is such a pain to support in a binary form, only ancient versions of ATLAS are packaged anyways (3.6.*). So if you care so much, you should build your own. >> On windows, if you really care about speed, you should try linking >> against the Intel MKL. That's what Matlab uses internally on recent >> versions, so you would get the same speed. But that's rather involved. >> > > It really depends on the CPU, compiler, how atlas was compiled, etc... it can be slightly faster to 10 times faster (if you use a very poorly optimized ATLAS). For some recent benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
On Fri, Jun 5, 2009 at 11:58 AM, David Cournapeau wrote: > Sebastian Walter wrote: >> On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote: >> >>> I should update after reading the thread Sebastian linked: >>> >>> The current 1.3 version of numpy (don't know about previous versions) uses >>> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with >>> these libraries. I've verified this on linux only, thought it shouldnt be >>> any different on windows AFAIK. >>> >> >> in the best of all possible worlds this would be done by a package >> maintainer >> > > Numpy packages on windows do use ATLAS, so I am not sure what you are > referring to ? I'm on debian unstable and my numpy (version 1.2.1) uses an unoptimized blas. I had the impression that most ppl that use numpy are on linux. But apparently this is a misconception. >On a side note, correctly packaging ATLAS is almost > inherently impossible, since the build method of ATLAS can never produce > the same binary (even on the same machine), and the binary is optimized > for the machine it was built on. So if you want the best speed, you > should build atlas by yourself - which is painful on windows (you need > cygwin). in the debian repositories there are different builds of atlas so there could be different builds for numpy, too. But there aren't > > On windows, if you really care about speed, you should try linking > against the Intel MKL. That's what Matlab uses internally on recent > versions, so you would get the same speed. But that's rather involved. How much faster is MKL than ATLAS? > > cheers, > > David > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
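One way to see what a given numpy build is actually linked against: np.show_config() is the standard query, and on 1.2/1.3-era numpy the optional _dotblas extension is what routes dot() through an optimized BLAS (the second check below is specific to that era):

    import numpy as np

    np.show_config()    # lists blas/lapack sections; an ATLAS build shows atlas_*_info entries

    try:
        from numpy.core import _dotblas   # built only when a CBLAS (e.g. ATLAS) was found
        print("dot() goes through an optimized BLAS")
    except ImportError:
        print("dot() falls back to numpy's unoptimized default")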
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Sebastian Walter wrote: > On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote: > >> I should update after reading the thread Sebastian linked: >> >> The current 1.3 version of numpy (don't know about previous versions) uses >> the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with >> these libraries. I've verified this on linux only, thought it shouldnt be >> any different on windows AFAIK. >> > > in the best of all possible worlds this would be done by a package > maintainer > Numpy packages on windows do use ATLAS, so I am not sure what you are referring to ? On a side note, correctly packaging ATLAS is almost inherently impossible, since the build method of ATLAS can never produce the same binary (even on the same machine), and the binary is optimized for the machine it was built on. So if you want the best speed, you should build atlas by yourself - which is painful on windows (you need cygwin). On windows, if you really care about speed, you should try linking against the Intel MKL. That's what Matlab uses internally on recent versions, so you would get the same speed. But that's rather involved. cheers, David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
On Thu, Jun 4, 2009 at 10:56 PM, Chris Colbert wrote: > I should update after reading the thread Sebastian linked: > > The current 1.3 version of numpy (don't know about previous versions) uses > the optimized Atlas BLAS routines for numpy.dot() if numpy was compiled with > these libraries. I've verified this on linux only, thought it shouldnt be > any different on windows AFAIK. in the best of all possible worlds this would be done by a package maintainer > > chris > > On Thu, Jun 4, 2009 at 4:54 PM, Chris Colbert wrote: >> >> Sebastian is right. >> >> Since Matlab r2007 (i think that's the version) it has included support >> for multi-core architecture. On my core2 Quad here at the office, r2008b has >> no problem utilizing 100% cpu for large matrix multiplications. >> >> >> If you download and build atlas and lapack from source and enable >> parrallel threads in atlas, then compile numpy against these libraries, you >> should achieve similar if not better performance (since the atlas routines >> will be tuned to your system). >> >> If you're on Windows, you need to do some trickery to get threading to >> work (the instructions are on the atlas website). >> >> Chris >> >> > > > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] performance matrix multiplication vs. matlab
Thanks for the replies so far. I had already tested using an already transposed matrix in the loop, it didn't make any difference. Oh and btw, I'm on (Scientific) Linux. I used the Enthought distribution, but I guess I'll have to get my hands dirty and try to get that Atlas thing working (I'm not a Linux expert though). My simulations pretty much consist of matrix multiplications, so if I don't get rid of that factor 5, I pretty much have to get back to Matlab. When you said Atlas is going to be optimized for my system, does that mean I should compile everything on each machine separately? I.e. I have a not-so-great desktop machine and one of those bigger multicore things available... Cheers David Quoting David Cournapeau : > David Warde-Farley wrote: >> On 4-Jun-09, at 5:03 PM, Anne Archibald wrote: >> >> >>> Apart from the implementation issues people have chimed in about >>> already, it's worth noting that the speed of matrix multiplication >>> depends on the memory layout of the matrices. So generating B instead >>> directly as a 100 by 500 matrix might affect the speed substantially >>> (I'm not sure in which direction). If MATLAB's matrices have a >>> different memory order, that might be a factor as well. >>> >> >> AFAIK Matlab matrices are always Fortran ordered. >> >> Does anyone know if the defaults on Mac OS X (vecLib/Accelerate) >> support multicore? Is there any sense in compiling ATLAS on OS X (I >> know it can be done)? >> > > It may be worthwhile if you use a recent gcc and recent ATLAS. > Multithread support is supposed to be much better in 3.9.* compared to > 3.6.* (which is likely the version used on vecLib/Accelerate). The main > issue I could foresee is clashes between vecLib/Accelerate and Atlas if > you mix softwares which use one or the other together. > > For the OP question: recent matlab versions use the MKL, which is likely > to give higher performances than ATLAS, specially on windows (compilers > on that platform are ancient, as building atlas with native compilers on > windows requires super-human patience). > > David > ___ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
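For comparing builds or machines, a crude timing of the operation that dominates such simulations is usually enough; the size and repeat count here are arbitrary:

    import time
    import numpy as np

    n = 1000
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    t0 = time.time()
    for _ in range(10):
        np.dot(a, b)
    elapsed = (time.time() - t0) / 10
    print("%.3f s per %dx%d dot()" % (elapsed, n, n))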
Re: [Numpy-discussion] extract elements of an array that are contained in another array?
On 4-Jun-09, at 4:38 PM, Anne Archibald wrote: > It seems to me that this is the basic source of the problem. Perhaps > this can be addressed? I realize maintaining compatibility with the > current behaviour is necessary, so how about a multistage deprecation: > > 1. add a keyword argument to intersect1d "assume_unique"; if it is not > present, check for uniqueness and emit a warning if not unique > 2. change the warning to an exception > Optionally: > 3. change the meaning of the function to that of intersect1d_nu if the > keyword argument is not present > > One could do something similar with setmember1d. +1 on this idea. I've been bitten by the non-unique stuff in the past, especially with setmember1d, not realizing that both need to be unique. David ___ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
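A sketch of what stage 1 of that deprecation could look like, written as a standalone wrapper rather than the real numpy function; the keyword name follows the suggestion above:

    import warnings
    import numpy as np

    def intersect1d_checked(ar1, ar2, assume_unique=False):
        # Stage 1: same result as intersect1d, but warn when the inputs
        # contain duplicates and the caller did not promise uniqueness.
        if not assume_unique:
            if len(np.unique(ar1)) != len(ar1) or len(np.unique(ar2)) != len(ar2):
                warnings.warn("inputs contain duplicates; intersect1d assumes unique elements")
        return np.intersect1d(ar1, ar2)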