On Fri, Jun 5, 2009 at 4:27 PM, Keith Goodman <kwgood...@gmail.com> wrote: > On Fri, Jun 5, 2009 at 1:22 PM, Brent Pedersen <bpede...@gmail.com> wrote: >> On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman<kwgood...@gmail.com> wrote: >>> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgood...@gmail.com> wrote: >>>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.p...@gmail.com> wrote: >>>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bbl...@bryant.edu> wrote: >>>>>> Hello, >>>>>> I have a vectorizing problem that I don't see an obvious way to solve. >>>>>> What >>>>>> I have is a vector like: >>>>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>> and a matrix >>>>>> T=zeros((6,6)) >>>>>> and what I want in T is a count of all of the transitions in obs, e.g. >>>>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the >>>>>> sequence 3-4 only happens once, etc... I can do it unvectorized like: >>>>>> for o1,o2 in zip(obs[:-1],obs[1:]): >>>>>> T[o1,o2]+=1 >>>>>> >>>>>> which gives the correct answer from above, which is: >>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>> [ 0., 0., 3., 0., 0., 1.], >>>>>> [ 0., 3., 0., 1., 0., 0.], >>>>>> [ 0., 0., 2., 0., 1., 0.], >>>>>> [ 0., 0., 0., 2., 0., 0.], >>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>> >>>>>> >>>>>> but I thought there would be a better way. I tried: >>>>>> o1=obs[:-1] >>>>>> o2=obs[1:] >>>>>> T[o1,o2]+=1 >>>>>> but this doesn't give a count, it just yields 1's at the transition >>>>>> points, >>>>>> like: >>>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>>> [ 0., 0., 1., 0., 0., 1.], >>>>>> [ 0., 1., 0., 1., 0., 0.], >>>>>> [ 0., 0., 1., 0., 1., 0.], >>>>>> [ 0., 0., 0., 1., 0., 0.], >>>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>> >>>>>> Is there a clever way to do this? I could write a quick Cython solution, >>>>>> but I wanted to keep this as an all-numpy implementation if I can. >>>>>> >>>>> >>>>> histogram2d or its imitation, there was a discussion on histogram2d a >>>>> short time ago >>>>> >>>>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2]) >>>>>>>> obs2 = obs - 1 >>>>>>>> trans = >>>>>>>> np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6) >>>>>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.], >>>>> ... [ 0., 0., 3., 0., 0., 1.], >>>>> ... [ 0., 3., 0., 1., 0., 0.], >>>>> ... [ 0., 0., 2., 0., 1., 0.], >>>>> ... [ 0., 0., 0., 2., 0., 0.], >>>>> ... [ 0., 0., 0., 0., 1., 0.]]) >>>>>>>> np.all(re == trans) >>>>> True >>>>> >>>>>>>> trans >>>>> array([[0, 0, 0, 0, 0, 0], >>>>> [0, 0, 3, 0, 0, 1], >>>>> [0, 3, 0, 1, 0, 0], >>>>> [0, 0, 2, 0, 1, 0], >>>>> [0, 0, 0, 2, 0, 0], >>>>> [0, 0, 0, 0, 1, 0]]) >>>>> >>>>> >>>>> or >>>>> >>>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>>>>> range=[[0,5],[0,5]]) >>>>>>>> re >>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>> [ 0., 0., 3., 0., 0., 1.], >>>>> [ 0., 3., 0., 1., 0., 0.], >>>>> [ 0., 0., 2., 0., 1., 0.], >>>>> [ 0., 0., 0., 2., 0., 0.], >>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>>>>> h >>>>> array([[ 0., 0., 0., 0., 0., 0.], >>>>> [ 0., 0., 3., 0., 0., 1.], >>>>> [ 0., 3., 0., 1., 0., 0.], >>>>> [ 0., 0., 2., 0., 1., 0.], >>>>> [ 0., 0., 0., 2., 0., 0.], >>>>> [ 0., 0., 0., 0., 1., 0.]]) >>>>> >>>>>>>> np.all(re == h) >>>>> True >>>> >>>> There's no way my list method can beat that. But by adding >>>> >>>> import psyco >>>> psyco.full() >>>> >>>> I get a total speed up of a factor of 15 when obs is length 10000. >>> >>> Actually, it is faster: >>> >>> histogram: >>> >>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]]) >>>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, >>>>> range=[[0,5],[0,5]]) >>> 100 loops, best of 3: 4.14 ms per loop >>> >>> lists: >>> >>>>> timeit test(obs3, T3) >>> 1000 loops, best of 3: 1.32 ms per loop >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> here's a go: >> >> import numpy as np >> import random >> from itertools import groupby >> >> def test1(obs, T): >> for o1,o2 in zip(obs[:-1],obs[1:]): >> T[o1][o2] += 1 >> return T >> >> >> def test2(obs, T): >> s = zip(obs[:-1], obs[1:]) >> for idx, g in groupby(sorted(s)): >> T[idx] = len(list(g)) >> return T >> >> obs = [random.randint(0, 5) for z in range(10000)] >> >> print test2(obs, np.zeros((6, 6))) >> print test1(obs, np.zeros((6, 6))) >> >> >> ############## >> >> In [10]: timeit test1(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 18.8 ms per loop >> >> In [11]: timeit test2(obs, np.zeros((6, 6))) >> 100 loops, best of 3: 6.91 ms per loop > > Nice! > > Try adding > > import psyco > psyco.full() > > to test1. Or is that cheating? > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
how about from scipy import ndimage ndimage.histogram((obs[:-1])*6 +obs[1:], 0, 35, 36 ).reshape(6,6) (bincount doesn't take the limits into account if they don't show up in the data, so first and last position would need special casing) Josef _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion