Re: [Numpy-discussion] record arrays and vectorizing
Sounds like it could be a good match for `scipy.spatial.cKDTree`.

It can handle single-element queries...

    >>> import numpy
    >>> import scipy.spatial
    >>> element = numpy.arange(1, 8)
    >>> targets = numpy.random.uniform(0, 8, (1000, 7))
    >>> tree = scipy.spatial.cKDTree(targets)
    >>> distance, index = tree.query(element)
    >>> targets[index]
    array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
            6.46823904,  6.12957534])

Or even multi-element queries (shown here searching for 3 elements in one call)...

    >>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
    >>> elements
    array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
           [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
           [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
    >>> distances, indices = tree.query(elements)
    >>> targets[indices]
    array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
             2.80559459,  4.43030939],
           [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
             4.06619636,  4.7183393 ],
           [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
             6.54570121,  7.59566404]])

Richard Hattersley

On 2 May 2012 19:06, Moroney, Catherine M (388D) catherine.m.moro...@jpl.nasa.gov wrote:
> Can somebody give me some hints as to how to code up this function
> in pure python, rather than dropping down to Fortran?
>
> I will want to compare a 7-element vector (called element) to a large list
> of similarly-dimensioned vectors (called target), and pick out the vector in
> target that is the closest to element (determined by minimizing the
> Euclidean distance).

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
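The cKDTree queries in the reply above can be cross-checked against a plain brute-force search. The following is only a sketch under stated assumptions: the fixed seed and the names `brute_indices` and `brute_distances` are illustrative and do not come from the thread, and ties between equally distant targets are assumed not to occur with random data.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)            # fixed seed, chosen only for repeatability
targets = rng.uniform(0, 8, (1000, 7))    # same shape as in the reply
elements = np.linspace(1, 8, 21).reshape((3, 7))

# Tree-based search: one nearest neighbour per row of `elements`
tree = cKDTree(targets)
distances, indices = tree.query(elements)

# Brute-force reference: all pairwise Euclidean distances, shape (3, 1000)
diff = elements[:, np.newaxis, :] - targets[np.newaxis, :, :]
pairwise = np.sqrt((diff ** 2).sum(axis=2))
brute_indices = pairwise.argmin(axis=1)
brute_distances = pairwise.min(axis=1)
```

Both approaches should pick the same rows of `targets`. The tree costs some time to build, so it tends to pay off when many elements are queried against the same set of targets.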
[Numpy-discussion] record arrays and vectorizing
Hello,

Can somebody give me some hints as to how to code up this function in pure python, rather than dropping down to Fortran?

I will want to compare a 7-element vector (called element) to a large list of similarly-dimensioned vectors (called target), and pick out the vector in target that is the closest to element (determined by minimizing the Euclidean distance).

For instance, in (slow) brute force form it would look like:

    import numpy

    element = numpy.array([1, 2, 3, 4, 5, 6, 7])
    target = numpy.array(range(0, 49)).reshape(7,7)*0.1

    min_length = numpy.inf   # sentinel: nothing found yet
    min_index = -1           # sentinel: nothing found yet
    for i in range(0, 7):
        # compare element against the i-th target vector only
        distance = (element-target[i])**2
        distance = numpy.sqrt(distance.sum())
        if (distance < min_length):
            min_length = distance
            min_index = i

Now of course, the actual problem will be of a much larger scale. I will have an array of elements, and a large number of potential targets.

I was thinking of having element be an array where each element itself is a numpy.ndarray, and then vectorizing the code above so as an output I would have an array of the min_index and min_length values.

I can get the following simple test to work so I may be on the right track:

    import numpy

    dtype = [("x", numpy.ndarray)]

    def single(data):
        return data[0].min()

    multiple = numpy.vectorize(single)

    if __name__ == "__main__":
        a = numpy.arange(0, 16).reshape(4,4)
        b = numpy.recarray((4), dtype=dtype)
        for i in range(0, b.shape[0]):
            b[i]["x"] = a[i,:]

        print(a)
        print(b)

        x = multiple(b)
        print(x)

What is the best way of constructing b from a? I tried

    b = numpy.recarray((4), dtype=dtype, buf=a)

but I get a segmentation fault when I try to print b.

Is there a way to perform this larger task efficiently with record arrays and vectorization, or am I off on the wrong track completely? How can I do this efficiently without dropping down to Fortran?

Thanks for any advice,

Catherine
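On the question of constructing b from a, one possible approach is sketched below. It is not taken from the thread and rests on two assumptions: that `a` is C-contiguous, and that a structured dtype with a sub-array field (here named `row_dtype`, a made-up name) is an acceptable stand-in for the object-typed field in the test above. It also shows that the row minima can be had from the plain 2-D array without `numpy.vectorize`.

```python
import numpy as np

a = np.arange(0, 16).reshape(4, 4)

# Row minima straight from the 2-D array: no record array, no vectorize
row_min = a.min(axis=1)

# Record-style view of the same memory: each record holds one row of `a`
# in a sub-array field named "x" (field name as in the original post).
row_dtype = np.dtype([("x", a.dtype, (a.shape[1],))])
b = a.view(row_dtype).reshape(a.shape[0])   # view has shape (4, 1); flatten to (4,)

# The field access gives back a (4, 4) array, so the same reduction applies
row_min_from_records = b["x"].min(axis=1)
```

Because `b` is a view, no data is copied, and `b["x"]` exposes the rows as an ordinary 2-D array that supports axis-wise reductions.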
Re: [Numpy-discussion] record arrays and vectorizing
On Wed, May 2, 2012 at 11:06 AM, Moroney, Catherine M (388D) catherine.m.moro...@jpl.nasa.gov wrote:
> I will want to compare a 7-element vector (called element) to a large list
> of similarly-dimensioned vectors (called target), and pick out the vector in
> target that is the closest to element (determined by minimizing the
> Euclidean distance).

It's not entirely clear what you mean from the description above. In the code example, you return a single index, but from the description it sounds like you want to pick out a vector?

If you need multiple answers, one for each element, then you probably need to do broadcasting as shown in the NumPy medkit:

http://mentat.za.net/numpy/numpy_advanced_slides/

Stéfan
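The broadcasting idea mentioned above, with one answer per element, might look like the following. This is a sketch only: the function name `nearest_targets` and the second test element are made up for illustration, and the small target array is the one from the original post.

```python
import numpy as np

def nearest_targets(elements, targets):
    """Return (indices, distances) of the closest target row for each element row."""
    elements = np.asarray(elements, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # (M, 1, D) - (1, N, D) broadcasts to (M, N, D)
    diff = elements[:, np.newaxis, :] - targets[np.newaxis, :, :]
    dist_sq = (diff ** 2).sum(axis=2)             # squared distances, shape (M, N)
    indices = dist_sq.argmin(axis=1)              # best target per element, shape (M,)
    distances = np.sqrt(dist_sq[np.arange(len(indices)), indices])
    return indices, distances

targets = np.arange(49).reshape(7, 7) * 0.1       # target array from the original post
elements = np.array([[1, 2, 3, 4, 5, 6, 7],
                     [0, 0, 0, 0, 0, 0, 0]], dtype=float)
indices, distances = nearest_targets(elements, targets)
```

The intermediate array holds M * N * D values, so for many elements and many targets it may be necessary to process the elements in chunks to keep memory use bounded.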
Re: [Numpy-discussion] record arrays and vectorizing
On Wed, May 2, 2012 at 1:06 PM, Moroney, Catherine M (388D) catherine.m.moro...@jpl.nasa.gov wrote:
> Hello,
>
> Can somebody give me some hints as to how to code up this function
> in pure python, rather than dropping down to Fortran?
>
> I will want to compare a 7-element vector (called element) to a large list
> of similarly-dimensioned vectors (called target), and pick out the vector in
> target that is the closest to element (determined by minimizing the
> Euclidean distance).
>
> For instance, in (slow) brute force form it would look like:
>
>     element = numpy.array([1, 2, 3, 4, 5, 6, 7])
>     target = numpy.array(range(0, 49)).reshape(7,7)*0.1
>
>     min_length = numpy.inf   # sentinel: nothing found yet
>     min_index = -1           # sentinel: nothing found yet
>     for i in range(0, 7):
>         # compare element against the i-th target vector only
>         distance = (element-target[i])**2
>         distance = numpy.sqrt(distance.sum())
>         if (distance < min_length):
>             min_length = distance
>             min_index = i

If you are just trying to find the index of the vector in target that is closest to element, then I think the default broadcasting would work fine. Here is an example that should work (the broadcasting is done for the subtraction element-targets):

    In [39]: element = np.arange(1,8)

    In [40]: targets = np.random.uniform(0,8,(1000,7))

    In [41]: distance_squared = ((element-targets)**2).sum(1)

    In [42]: min_index = distance_squared.argmin()

    In [43]: element
    Out[43]: array([1, 2, 3, 4, 5, 6, 7])

    In [44]: targets[min_index,:]
    Out[44]:
    array([ 1.93625981,  2.56137284,  2.23395169,  4.15215253,  3.96478248,
            5.21829915,  5.13049489])

Note - depending on the number of vectors in targets, it might be better to have everything transposed if you are really worried about the timing; you'd need to try that for your particular case.

Hope that helps,

Aronne
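The note about transposing can be tried out along the following lines. This is a sketch under assumptions: the fixed seed and the names `targets_t`, `index_rows` and `index_cols` are illustrative, and no claim is made here about which layout is faster; that would need timing on the actual data.

```python
import numpy as np

rng = np.random.RandomState(0)               # fixed seed, only for repeatability
element = np.arange(1, 8)
targets = rng.uniform(0, 8, (1000, 7))       # one target per row, as in the reply

# Row layout: (7,) broadcasts against (1000, 7); reduce over axis 1
index_rows = ((element - targets) ** 2).sum(axis=1).argmin()

# Transposed layout: one target per column, stored contiguously as (7, 1000)
targets_t = np.ascontiguousarray(targets.T)
index_cols = ((element[:, np.newaxis] - targets_t) ** 2).sum(axis=0).argmin()

# Euclidean distance to the winning target (the min_length of the original post)
min_length = np.sqrt(((element - targets[index_rows]) ** 2).sum())
```

Both layouts should select the same target; only the memory access pattern differs. The square root is taken once at the end, since the ordering of squared distances is the same as the ordering of distances.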