Re: [Numpy-discussion] numpy where function on different sized arrays
On 25 Nov 2012, at 00:29, numpy-discussion-requ...@scipy.org wrote: Message: 3 Date: Sat, 24 Nov 2012 23:23:36 +0100 From: Da?id davidmen...@gmail.com Subject: Re: [Numpy-discussion] numpy where function on different sized arrays To: Discussion of Numerical Python numpy-discussion@scipy.org Message-ID: CAJhcF=2nmveequo5q45oucqk+7knq_u0tvzuyosxgpqblsu...@mail.gmail.com Content-Type: text/plain; charset=ISO-8859-1 A pure Python approach could be: for i, x in enumerate(a): for j, y in enumerate(x): if y in b: idx.append((i,j)) Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. Thanks for all the answers. In that particular case speed is not important (A is 360x720 and b and c is lower than 10 in terms of dimension). However, I stumbled across similar comparison problems in IDL a couple of times where speed was crucial. My own solution or attempt was this: == def fu(A, b, c): for x, y in zip(b,c): indx = np.where(A == x) A[indx] = y return A == The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy where function on different sized arrays
M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like import numpy as np from itertools import izip, groupby from operator import itemgetter id1, id2, id3 = np.where(A[..., np.newaxis] == B) order = np.argsort(id3) triples_iter = izip(id3[order], id1[order], id2[order]) grouped = groupby(triples_iter, itemgetter(0)) d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in grouped) Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.ukwrote: Hi all This must have been answered in the past but my google search capabilities are not the best. Given an array A say of dimension 40x60 and given another array/vector B of dimension 20 (the values in B occur only once). What I would like to do is the following which of course does not work (by the way doesn't work in IDL either): indx=where(A == B) I understand A and B are both of different dimensions. So my question: what would the fastest or proper way to accomplish this (I found a solution but think is rather awkward and not very scipy/numpy-tonic tough). Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy where function on different sized arrays
A pure Python approach could be: for i, x in enumerate(a): for j, y in enumerate(x): if y in b: idx.append((i,j)) Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. David. On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley d.warde.far...@gmail.com wrote: M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like import numpy as np from itertools import izip, groupby from operator import itemgetter id1, id2, id3 = np.where(A[..., np.newaxis] == B) order = np.argsort(id3) triples_iter = izip(id3[order], id1[order], id2[order]) grouped = groupby(triples_iter, itemgetter(0)) d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in grouped) Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.uk wrote: Hi all This must have been answered in the past but my google search capabilities are not the best. Given an array A say of dimension 40x60 and given another array/vector B of dimension 20 (the values in B occur only once). What I would like to do is the following which of course does not work (by the way doesn't work in IDL either): indx=where(A == B) I understand A and B are both of different dimensions. So my question: what would the fastest or proper way to accomplish this (I found a solution but think is rather awkward and not very scipy/numpy-tonic tough). Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy where function on different sized arrays
I think that would lose information as to which value in B was at each position. I think you want: On Sat, Nov 24, 2012 at 5:23 PM, Daπid davidmen...@gmail.com wrote: A pure Python approach could be: for i, x in enumerate(a): for j, y in enumerate(x): if y in b: idx.append((i,j)) Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. David. On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley d.warde.far...@gmail.com wrote: M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like import numpy as np from itertools import izip, groupby from operator import itemgetter id1, id2, id3 = np.where(A[..., np.newaxis] == B) order = np.argsort(id3) triples_iter = izip(id3[order], id1[order], id2[order]) grouped = groupby(triples_iter, itemgetter(0)) d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in grouped) Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.uk wrote: Hi all This must have been answered in the past but my google search capabilities are not the best. Given an array A say of dimension 40x60 and given another array/vector B of dimension 20 (the values in B occur only once). What I would like to do is the following which of course does not work (by the way doesn't work in IDL either): indx=where(A == B) I understand A and B are both of different dimensions. So my question: what would the fastest or proper way to accomplish this (I found a solution but think is rather awkward and not very scipy/numpy-tonic tough). Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] numpy where function on different sized arrays
On Sat, Nov 24, 2012 at 7:08 PM, David Warde-Farley d.warde.far...@gmail.com wrote: I think that would lose information as to which value in B was at each position. I think you want: (premature send, stupid Gmail...) idx = {} for i, x in enumerate(a): for j, y in enumerate(x): if y in B: idx.setdefault(y, []).append((i,j)) On the problem size the OP specified, this is about 4x slower than the NumPy version I posted above. However with a small modification: idx = {} set_b = set(B) # makes 'if y in B' lookups much faster for i, x in enumerate(a): for j, y in enumerate(x): if y in set_b: idx.setdefault(y, []).append((i,j)) It actually beats my solution. With inputs: np.random.seed(0); A = np.random.random_integers(40, 59, size=(40, 60)); B = np.arange(40, 60) In [115]: timeit foo_py_orig(A, B) 100 loops, best of 3: 16.5 ms per loop In [116]: timeit foo_py(A, B) 100 loops, best of 3: 2.5 ms per loop In [117]: timeit foo_numpy(A, B) 100 loops, best of 3: 4.15 ms per loop Depending on the specifics of the inputs, a collections.DefaultDict could also help things. On Sat, Nov 24, 2012 at 5:23 PM, Daπid davidmen...@gmail.com wrote: A pure Python approach could be: for i, x in enumerate(a): for j, y in enumerate(x): if y in b: idx.append((i,j)) Of course, it is slow if the arrays are large, but it is very readable, and probably very fast if cythonised. David. On Sat, Nov 24, 2012 at 10:19 PM, David Warde-Farley d.warde.far...@gmail.com wrote: M = A[..., np.newaxis] == B will give you a 40x60x20 boolean 3d-array where M[..., i] gives you a boolean mask for all the occurrences of B[i] in A. If you wanted all the (i, j) pairs for each value in B, you could do something like import numpy as np from itertools import izip, groupby from operator import itemgetter id1, id2, id3 = np.where(A[..., np.newaxis] == B) order = np.argsort(id3) triples_iter = izip(id3[order], id1[order], id2[order]) grouped = groupby(triples_iter, itemgetter(0)) d = dict((b_value, [idx[1:] for idx in indices]) for b_value, indices in grouped) Then d[value] is a list of all the (i, j) pairs where A[i, j] == value, and the keys of d are every value in B. On Sat, Nov 24, 2012 at 3:36 PM, Siegfried Gonzi sgo...@staffmail.ed.ac.uk wrote: Hi all This must have been answered in the past but my google search capabilities are not the best. Given an array A say of dimension 40x60 and given another array/vector B of dimension 20 (the values in B occur only once). What I would like to do is the following which of course does not work (by the way doesn't work in IDL either): indx=where(A == B) I understand A and B are both of different dimensions. So my question: what would the fastest or proper way to accomplish this (I found a solution but think is rather awkward and not very scipy/numpy-tonic tough). Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion