Sounds like it could be a good match for `scipy.spatial.cKDTree`. It can handle single-element queries...
>>> import numpy
>>> import scipy.spatial
>>> element = numpy.arange(1, 8)
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> tree = scipy.spatial.cKDTree(targets)
>>> distance, index = tree.query(element)
>>> targets[index]
array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
        6.46823904,  6.12957534])

Or even multi-element queries (shown here searching for 3 elements in one call)...

>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> elements
array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
       [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
       [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
>>> distances, indices = tree.query(elements)
>>> targets[indices]
array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
         2.80559459,  4.43030939],
       [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
         4.06619636,  4.7183393 ],
       [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
         6.54570121,  7.59566404]])

Richard Hattersley

On 2 May 2012 19:06, Moroney, Catherine M (388D)
<catherine.m.moro...@jpl.nasa.gov> wrote:

> Hello,
>
> Can somebody give me some hints as to how to code up this function
> in pure Python, rather than dropping down to Fortran?
>
> I will want to compare a 7-element vector (called "element") to a large
> list of similarly-dimensioned vectors (called "target"), and pick out the
> vector in "target" that is the closest to "element" (determined by
> minimizing the Euclidean distance).
>
> For instance, in (slow) brute-force form it would look like:
>
> element = numpy.array([1, 2, 3, 4, 5, 6, 7])
> target = numpy.array(range(0, 49)).reshape(7, 7)*0.1
>
> min_length = 9999.0
> min_index = -1
> for i in xrange(0, 7):
>     distance = (element - target[i, :])**2
>     distance = numpy.sqrt(distance.sum())
>     if distance < min_length:
>         min_length = distance
>         min_index = i
>
> Now of course, the actual problem will be of a much larger scale.  I will
> have an array of elements, and a large number of potential targets.
>
> I was thinking of having "element" be an array where each entry is itself
> a numpy.ndarray, and then vectorizing the code above so that as output I
> would have arrays of the "min_index" and "min_length" values.
>
> I can get the following simple test to work, so I may be on the right
> track:
>
> import numpy
>
> dtype = [("x", numpy.ndarray)]
>
> def single(data):
>     return data[0].min()
>
> multiple = numpy.vectorize(single)
>
> if __name__ == "__main__":
>
>     a = numpy.arange(0, 16).reshape(4, 4)
>     b = numpy.recarray((4,), dtype=dtype)
>     for i in xrange(0, b.shape[0]):
>         b[i]["x"] = a[i, :]
>
>     print a
>     print b
>
>     x = multiple(b)
>     print x
>
> What is the best way of constructing "b" from "a"?  I tried
> b = numpy.recarray((4,), dtype=dtype, buf=a), but I get a segmentation
> fault when I try to print b.
>
> Is there a way to perform this larger task efficiently with record arrays
> and vectorization, or am I off on the wrong track completely?  How can I
> do this efficiently without dropping down to Fortran?
>
> Thanks for any advice,
>
> Catherine
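If pulling in SciPy is not an option, the brute-force search can also be fully vectorised with plain NumPy broadcasting, with no record arrays or numpy.vectorize required. The session below is only a sketch that reuses the `elements` and `targets` arrays from the examples above; note that it materialises an (n, m, 7) intermediate array, so it trades memory for simplicity compared with the kd-tree.

>>> import numpy
>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> # Broadcast (n, 1, 7) against (1, m, 7) to get all pairwise differences.
>>> diffs = elements[:, numpy.newaxis, :] - targets[numpy.newaxis, :, :]
>>> dists = numpy.sqrt((diffs ** 2).sum(axis=-1))   # (n, m) distance matrix
>>> min_indices = dists.argmin(axis=1)              # closest target per element
>>> min_lengths = dists[numpy.arange(elements.shape[0]), min_indices]
>>> targets[min_indices].shape
(3, 7)

For a very large number of targets, the kd-tree above (or scipy.spatial.distance.cdist, which builds the same (n, m) distance matrix in one call) will generally be the better choice.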