Hi, Bill Baxter proposed a version of this problem some months ago on this ML. I use it regularly and it is fast enough for me.
Matthieu 2008/5/21 Emanuele Olivetti <[EMAIL PROTECTED]>: > Dear all, > > I need to speed up this function (a little example follows): > ------ > import numpy as N > def distance_matrix(data1,data2,weights): > rows = data1.shape[0] > columns = data2.shape[0] > dm = N.zeros((rows,columns)) > for i in range(rows): > for j in range(columns): > dm[i,j] = ((data1[i,:]-data2[j,:])**2*weights).sum() > pass > pass > return dm > > size1 = 4 > size2 = 3 > dimensions = 2 > data1 = N.random.rand(size1,dimensions) > data2 = N.random.rand(size2,dimensions) > weights = N.random.rand(dimensions) > dm = distance_matrix(data1,data2,weights) > print dm > ------------------ > The distance_matrix function computes the weighted (squared) euclidean > distances between each pair of vectors from two sets (data1, data2). > The previous naive algorithm is extremely slow for my standard use, > i.e., when size1 and size2 are in the order of 1000 or more. It can be > improved using N.subtract.outer: > > def distance_matrix_faster(data1,data2,weights): > rows = data1.shape[0] > columns = data2.shape[0] > dm = N.zeros((rows,columns)) > for i in range(data1.shape[1]): > dm += N.subtract.outer(data1[:,i],data2[:,i])**2*weights[i] > pass > return dm > > This algorithm becomes slow when dimensions (i.e., data1.shape[1]) is > big (i.e., >1000), due to the Python loop. In order to speed it up, I guess > that N.subtract.outer could be used on the full matrices instead of one > column at a time. But then there is a memory issue: 'outer' allocates > too much memory since it stores all possible combinations along all > dimensions. This is clearly unnecessary. > > Is there a NumPy way to avoid all Python loops and without wasting > too much memory? As a comparison I coded the same algorithm in > C through weave (inline): it is _much_ faster and requires just > the memory to store the result. But I'd prefer not using C or weave > if possible. > > Thanks in advance for any help, > > > Emanuele > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion@scipy.org > http://projects.scipy.org/mailman/listinfo/numpy-discussion > -- French PhD student Website : http://matthieu-brucher.developpez.com/ Blogs : http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn : http://www.linkedin.com/in/matthieubrucher
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion