Re: [Numpy-discussion] record arrays and vectorizing

2012-05-03 Thread Richard Hattersley
Sounds like it could be a good match for `scipy.spatial.cKDTree`.

It can handle single-element queries...

>>> import numpy, scipy.spatial
>>> element = numpy.arange(1, 8)
>>> targets = numpy.random.uniform(0, 8, (1000, 7))
>>> tree = scipy.spatial.cKDTree(targets)
>>> distance, index = tree.query(element)
>>> targets[index]
array([ 1.68457267,  4.26370212,  3.14837617,  4.67616512,  5.80572286,
        6.46823904,  6.12957534])

Or even multi-element queries (shown here searching for 3 elements in one call):

>>> elements = numpy.linspace(1, 8, 21).reshape((3, 7))
>>> elements
array([[ 1.  ,  1.35,  1.7 ,  2.05,  2.4 ,  2.75,  3.1 ],
       [ 3.45,  3.8 ,  4.15,  4.5 ,  4.85,  5.2 ,  5.55],
       [ 5.9 ,  6.25,  6.6 ,  6.95,  7.3 ,  7.65,  8.  ]])
>>> distances, indices = tree.query(elements)
>>> targets[indices]
array([[ 0.24314961,  2.77933521,  2.00092505,  3.25180563,  2.05392726,
         2.80559459,  4.43030939],
       [ 4.19270199,  2.89257994,  3.91366449,  3.29262138,  3.6779851 ,
         4.06619636,  4.7183393 ],
       [ 6.58055518,  6.59232922,  7.00473346,  5.22612494,  7.07170015,
         6.54570121,  7.59566404]])

Richard Hattersley

On 2 May 2012 19:06, Moroney, Catherine M (388D) wrote:


NumPy-Discussion mailing list

[Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Moroney, Catherine M (388D)

Can somebody give me some hints as to how to code up this function
in pure Python, rather than dropping down to Fortran?

I will want to compare a 7-element vector (called element) to a large list of
similarly-dimensioned vectors (called target), and pick out the vector in
target that is the closest to element (determined by minimizing the
Euclidean distance).

For instance, in (slow) brute-force form it would look like:

import numpy

element = numpy.array([1, 2, 3, 4, 5, 6, 7])
target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

min_length = numpy.inf   # larger than any real distance
min_index  = -1          # no target chosen yet
for i in range(target.shape[0]):           # one target vector per row
    distance = (element - target[i, :])**2
    distance = numpy.sqrt(distance.sum())  # Euclidean distance
    if distance < min_length:
        min_length = distance
        min_index  = i

Of course, the actual problem will be on a much larger scale: I will have
an array of elements, and a large number of potential targets.

I was thinking of making element an array in which each entry is itself
a numpy.ndarray, and then vectorizing the code above so that the output
is an array of the min_index and min_length values.
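
The "array of elements, array of results" goal can be met without record arrays by plain broadcasting over a 2-D array of elements. The following is a minimal sketch, assuming elements is an (m, 7) array and targets an (n, 7) array. The helper name nearest_targets is hypothetical. It builds the full (m, n, 7) difference array, so memory grows with m*n.

```python
import numpy

def nearest_targets(elements, targets):
    """Return (min_index, min_length) for each row of `elements`.

    elements: (m, d) array, targets: (n, d) array.
    Builds an (m, n, d) difference array via broadcasting, so it is
    only suitable when m * n * d fits comfortably in memory.
    """
    diff = elements[:, numpy.newaxis, :] - targets[numpy.newaxis, :, :]
    dist_sq = (diff ** 2).sum(axis=-1)        # shape (m, n)
    min_index = dist_sq.argmin(axis=1)        # best target per element
    rows = numpy.arange(len(elements))
    min_length = numpy.sqrt(dist_sq[rows, min_index])
    return min_index, min_length

if __name__ == "__main__":
    element = numpy.array([1, 2, 3, 4, 5, 6, 7])
    target = numpy.array(range(0, 49)).reshape(7, 7) * 0.1
    idx, length = nearest_targets(element[numpy.newaxis, :], target)
    print(idx, length)   # index 5, as in the brute-force loop
```

For very large m, one option is to call the helper on slices of elements so that each temporary stays bounded.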

I can get the following simple test to work, so I may be on the right track:

import numpy

# numpy.ndarray here is treated as a generic Python-object field
dtype = [('x', numpy.ndarray)]

def single(data):
    return data[0].min()   # minimum of the array stored in field 'x'

multiple = numpy.vectorize(single)

if __name__ == '__main__':

    a = numpy.arange(0, 16).reshape(4,4)
    b = numpy.recarray((4), dtype=dtype)
    for i in range(0, b.shape[0]):
        b[i]['x'] = a[i,:]   # store one row of a per record

    print(a)
    print(b)

    x = multiple(b)
    print(x)

What is the best way of constructing b from a?  I tried
`b = numpy.recarray((4), dtype=dtype, buf=a)`, but I get a segmentation
fault when I try to print b.
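
A likely cause of the segfault is worth stating, though this thread does not confirm it. A field declared as numpy.ndarray is stored as a generic Python-object (pointer) field, so buf=a asks NumPy to reinterpret the integers in a as object pointers, which then crash when printed. One alternative is sketched below, assuming each row of a should become one record: give the field a fixed subarray shape and view a's buffer directly. Only the field name 'x' comes from the test above; row_dtype is an illustrative name.

```python
import numpy

a = numpy.arange(0, 16).reshape(4, 4)

# Each record holds one row of `a` as a fixed-length subarray of a's dtype.
row_dtype = numpy.dtype([('x', a.dtype, (a.shape[1],))])

# Reinterpret a's contiguous rows as records: the view has shape (4, 1),
# so reshape to one record per row. No data is copied.
b = numpy.ascontiguousarray(a).view(row_dtype).reshape(a.shape[0])

print(b['x'])              # the original 4x4 data
print(b['x'].min(axis=1))  # per-record minimum, no numpy.vectorize needed
```

Because b['x'] is just the 4x4 data, ordinary reductions such as .min(axis=1) can replace numpy.vectorize, which is essentially a Python-level loop.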

Is there a way to perform this larger task efficiently with record arrays and
vectorization, or am I off on the wrong track completely?  How can I do this
efficiently without dropping down to Fortran?

Thanks for any advice,


Re: [Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Stéfan van der Walt
On Wed, May 2, 2012 at 11:06 AM, Moroney, Catherine M (388D) wrote:
 I will want to compare a 7-element vector (called element) to a large list
 of similarly-dimensioned vectors (called target), and pick out the vector
 in target that is the closest to element (determined by minimizing the
 Euclidean distance).

It's not entirely clear what you mean from the description above.  In
the code example, you return a single index, but from the description
it sounds like you want to pick out a vector?

If you need multiple answers, one for each element, then you probably
need to do broadcasting as shown in the NumPy medkit:
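
Separately from that reference, a broadcasting variant that avoids the (m, n, d) difference array is sketched below. It uses the expansion |e - t|^2 = |e|^2 - 2 e.t + |t|^2, a standard identity swapped in here rather than something taken from this thread. The helper name nearest_by_expansion is hypothetical.

```python
import numpy

def nearest_by_expansion(elements, targets):
    """Nearest target index and Euclidean distance for each element.

    Uses |e - t|^2 = |e|^2 - 2 e.t + |t|^2, so the largest temporary
    is the (m, n) table rather than an (m, n, d) difference array.
    """
    elements = numpy.asarray(elements, dtype=float)
    targets = numpy.asarray(targets, dtype=float)
    e_sq = (elements ** 2).sum(axis=1)[:, numpy.newaxis]   # (m, 1)
    t_sq = (targets ** 2).sum(axis=1)[numpy.newaxis, :]    # (1, n)
    dist_sq = e_sq - 2.0 * elements.dot(targets.T) + t_sq  # (m, n) by broadcasting
    # Rounding can push tiny squared distances slightly below zero.
    numpy.maximum(dist_sq, 0.0, out=dist_sq)
    min_index = dist_sq.argmin(axis=1)
    rows = numpy.arange(len(elements))
    return min_index, numpy.sqrt(dist_sq[rows, min_index])
```

The trade-off is numerical. Subtracting large, nearly equal terms loses some precision, so near-ties may resolve differently than with the direct difference.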


Re: [Numpy-discussion] record arrays and vectorizing

2012-05-02 Thread Aronne Merrelli
On Wed, May 2, 2012 at 1:06 PM, Moroney, Catherine M (388D) wrote:

 Can somebody give me some hints as to how to code up this function
 in pure Python, rather than dropping down to Fortran?

 I will want to compare a 7-element vector (called element) to a large list
 of similarly-dimensioned vectors (called target), and pick out the vector
 in target that is the closest to element (determined by minimizing the
 Euclidean distance).

 For instance, in (slow) brute-force form it would look like:

 element = numpy.array([1, 2, 3, 4, 5, 6, 7])
 target  = numpy.array(range(0, 49)).reshape(7,7)*0.1

 min_length = numpy.inf
 min_index  = -1
 for i in range(target.shape[0]):
     distance = (element - target[i, :])**2
     distance = numpy.sqrt(distance.sum())
     if distance < min_length:
         min_length = distance
         min_index  = i

If you are just trying to find the index of the vector in target
that is closest to element, then I think the default broadcasting
would work fine. Here is an example that should work (the broadcasting
happens in the subtraction element-targets):

In [39]: element = np.arange(1,8)
In [40]: targets = np.random.uniform(0,8,(1000,7))
In [41]: distance_squared = ((element-targets)**2).sum(1)
In [42]: min_index = distance_squared.argmin()
In [43]: element
Out[43]: array([1, 2, 3, 4, 5, 6, 7])
In [44]: targets[min_index,:]
Out[44]:
array([ 1.93625981,  2.56137284,  2.23395169,  4.15215253,  3.96478248,
        5.21829915,  5.13049489])
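
The same steps also yield the distance the original loop tracked, and they answer both readings of Stéfan's question (index or vector). The following small sketch reuses the brute-force example's data. The function name closest is hypothetical.

```python
import numpy as np

def closest(element, targets):
    """Return (min_index, closest vector, Euclidean distance) for one element."""
    distance_squared = ((element - targets) ** 2).sum(1)  # broadcast over rows
    min_index = distance_squared.argmin()
    return min_index, targets[min_index, :], np.sqrt(distance_squared[min_index])

element = np.arange(1, 8)
target = np.array(range(0, 49)).reshape(7, 7) * 0.1
min_index, vector, min_length = closest(element, target)
print(min_index, min_length)   # 5 and about 4.79, matching the brute-force loop
```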

Note: depending on the number of vectors in targets, it might be faster
to store everything transposed if you are really worried about the timing;
you would need to test that for your particular case.
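
Whether the transposed layout helps can be checked with a quick timing harness using the standard timeit module. The following is a minimal sketch, assuming 100000 random targets (an arbitrary size). The helper names argmin_rows and argmin_cols are hypothetical. Which layout wins depends on the machine and NumPy version, so no result is claimed here.

```python
import timeit
import numpy as np

def argmin_rows(element, targets):
    # targets stored as (n, 7): each candidate vector is a row
    return ((element - targets) ** 2).sum(1).argmin()

def argmin_cols(element, targets_t):
    # targets stored as (7, n): each candidate vector is a column
    return ((element[:, np.newaxis] - targets_t) ** 2).sum(0).argmin()

element = np.arange(1, 8)
targets = np.random.uniform(0, 8, (100000, 7))
targets_t = np.ascontiguousarray(targets.T)   # transposed copy in C order

# Both layouts must agree before their timings are worth comparing.
assert argmin_rows(element, targets) == argmin_cols(element, targets_t)

for name, fn, data in [("rows", argmin_rows, targets),
                       ("cols", argmin_cols, targets_t)]:
    seconds = timeit.timeit(lambda: fn(element, data), number=20)
    print(name, seconds)
```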

Hope that helps,