Travis Oliphant wrote:
> Albert Strasheim wrote:
>
>> In [571]: x1 = N.random.randn(2000,39)
>>
>> In [572]: y1 = N.random.randn(64,39)
>>
>> In [574]: %timeit z1=x1[...,N.newaxis,...]-y1
>> 10 loops, best of 3: 703 ms per loop
>>
>> In [575]: z1.shape
>> Out[575]: (2000, 64, 39)
>>
>> As far as I can figure, this operation is doing 2000*64*39 subtractions.
>> Doing this straight up yields the following:
>>
>> In [576]: x2 = N.random.randn(2000,64,39)
>>
>> In [577]: y2 = N.random.randn(2000,64,39)
>>
>> In [578]: %timeit z2 = x2-y2
>> 10 loops, best of 3: 108 ms per loop
>>
>> Does anybody have any ideas on why this is so much faster? Hopefully I
>> didn't mess up somewhere...
>
> I suspect I know why, although the difference seems rather large. There
> is a code optimization being taken advantage of in the second case. If
> you have contiguous arrays (no broadcasting needed), then a single
> C loop is used for the subtraction (your second case).
>
> In the first case you are using broadcasting to generate the larger
> array. This requires more complicated looping constructs under the
> covers, which causes your overhead. Basically, you will have 64*39 1-d
> loops of 2000 elements each in the first example, with a bit of
> calculation overhead to reset the pointers before each loop.
>
> In the ufunc code, compare the ONE_UFUNCLOOP case with the
> NOBUFFER_UFUNCLOOP case. If you want to be sure which one is running,
> un-comment the fprintf statements so you can tell.
>
> I'm surprised the overhead of adjusting pointers is so high, but then
> again you are probably getting a lot of cache misses in the first case,
> so there is more to it than that; the loops may run more slowly too.

I suspect that Travis is basically right about why your simple subtraction runs much faster than your test case.
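To make the two code paths Travis describes a bit more concrete, here is a small illustrative sketch (mine, not from either original post) that builds both cases from the same data. The pre-expanded version only exists to show why the all-contiguous subtraction hits the fast path; the repeats themselves cost time and memory, so it is not a recommended fix. The x_full/y_full names are just placeholders, and ONE_UFUNCLOOP / NOBUFFER_UFUNCLOOP are the internal names from Travis's message.

import numpy as N

x1 = N.random.randn(2000, 39)
y1 = N.random.randn(64, 39)

# Broadcast form: the ufunc machinery walks many short 1-d inner loops,
# resetting pointers between them (the NOBUFFER_UFUNCLOOP path).
z_broadcast = x1[:, N.newaxis, :] - y1

# Pre-expanded form: both operands are contiguous arrays with the final
# (2000, 64, 39) shape, so one contiguous C loop does all the work
# (the ONE_UFUNCLOOP path), just like Albert's x2 - y2 case above.
x_full = N.repeat(x1[:, N.newaxis, :], 64, axis=1)   # (2000, 64, 39)
y_full = N.repeat(y1[N.newaxis, :, :], 2000, axis=0) # (2000, 64, 39)
z_full = x_full - y_full

assert z_broadcast.shape == z_full.shape == (2000, 64, 39)
assert N.alltrue(z_broadcast == z_full)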
However, that doesn't mean you can't do better than at present. When dealing with large, multidimensional arrays, my experience has been that unrolling all of the for loops is frequently counterproductive. I chalk this up to two factors: first, you tend to end up generating large temporary arrays, and this in turn leads to cache misses. Second, you lose flexibility in how you perform the calculation, which in turn limits other possible optimizations. I just spent a while playing with this, and assuming I've correctly translated your original intent, I've come up with two alternative looping versions that run, respectively, 2 and 3 times faster. I've a feeling that kmean3, the fastest one, still has a little more room to be sped up, but I'm out of time now. Code is below.

-tim

import numpy as N

data = N.random.randn(2000, 39)

# Direct translation of the original broadcast expression.
def kmean0(data):
    nclusters = 64
    code = data[0:nclusters,:]
    return N.sum((data[...,N.newaxis,...]-code)**2, 2).argmin(axis=1)

# Same computation, broken into steps so the temporaries are explicit.
def kmean1(data):
    nclusters = 64
    code = data[0:nclusters,:]
    z = data[:,N.newaxis,:]
    z = z-code
    z = z**2
    z = N.sum(z, 2)
    return z.argmin(axis=1)

# Loop over the data points; each iteration only touches a (64, 39) block.
def kmean2(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[0:nclusters,:]
    data = data[:, N.newaxis]
    allz = N.zeros([len(data)])
    for i, x in enumerate(data):
        z = (x - code)
        z **= 2
        allz[i] = z.sum(-1).argmin(0)
    return allz

# Loop over clusters and axes, accumulating squared distances per cluster.
def kmean3(data):
    nclusters = 64
    naxes = data.shape[-1]
    code = data[0:nclusters]
    totals = N.zeros([nclusters, len(data)], float)
    transdata = data.transpose().copy()
    for cluster, tot in zip(code, totals):
        for di, ci in zip(transdata, cluster):
            delta = di - ci
            delta **= 2
            tot += delta
    return totals.argmin(axis=0)

if __name__ == '__main__':
    assert N.alltrue(kmean0(data) == kmean1(data))
    assert N.alltrue(kmean0(data) == kmean2(data))
    assert N.alltrue(kmean0(data) == kmean3(data))

    # Timing harness; assumes this file is saved as scratch.py.
    from timeit import Timer
    print Timer('kmean0(data)', 'from scratch import kmean0, data').timeit(3)
    print Timer('kmean1(data)', 'from scratch import kmean1, data').timeit(3)
    print Timer('kmean2(data)', 'from scratch import kmean2, data').timeit(3)
    print Timer('kmean3(data)', 'from scratch import kmean3, data').timeit(3)