On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac <ais...@american.edu> wrote:
>> On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac <ais...@american.edu> wrote:
>>> a[(a==b[:,None]).sum(axis=0,dtype=bool)]
>
>
> On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
>> If b is large this creates a huge intermediate array
>
>
> True enough, but one could then use fromiter:
> setb = set(b)
> itr = (ai for ai in a if ai in setb)
> out = np.fromiter(itr, dtype=a.dtype)
>
> I suspect (?) that b would have to be pretty
> big relative to a for the repeated testing
> to be more costly than sorting a.
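To make the two quoted variants concrete, here is a minimal sketch on small made-up arrays (the values of a and b are assumptions chosen only for illustration). It shows where the intermediate array comes from and that both variants select the same elements of a.

```python
import numpy as np

# Hypothetical sample data, not from the thread.
a = np.array([5, 1, 3, 1, 7, 3])
b = np.array([3, 1, 9])

# Broadcast variant: a == b[:, None] compares every element of b with
# every element of a, so the intermediate boolean array has shape
# (len(b), len(a)). This is the array that gets huge when b is large.
pairwise = (a == b[:, None])
mask = pairwise.sum(axis=0, dtype=bool)   # True where a[i] occurs in b
out_broadcast = a[mask]

# Iterator variant: one set lookup per element of a, with no
# (len(b), len(a)) intermediate, but the loop runs in Python.
setb = set(b)
itr = (ai for ai in a if ai in setb)
out_iter = np.fromiter(itr, dtype=a.dtype)

print(pairwise.shape)
print(out_broadcast)
print(out_iter)
```

Both variants keep the order and the duplicates of a; they differ only in memory use and in where the loop runs.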

I didn't look at this case very closely for speed. setmember1d and
setmember1d_nu return a boolean array that can be used for indexing,
not the actual elements. Your iterator is in Python and could be
pretty slow, but I only ran the performance script attached to the
ticket, and the speed differences between the different ways of doing
it were pretty big for large arrays.

>
> Or if a stable order is not important (I don't
> recall if the OP specified), one could just
> np.intersect1d(a, np.unique(b))

This also requires that `a` has only unique elements.
intersect1d_nu doesn't require unique elements.

>
> On a different note, I think a name change
> is needed for your function. (Compare
> intersect1d_nu to see the potential
> confusion. And btw, what is the use case
> for intersect1d, which gives neither a
> set intersection nor a multiset intersection?)

intersect1d gives the set intersection if both arrays have only unique
elements (i.e. are sets). I thought the naming was pretty clear:

intersect1d(a,b)
    set intersection for a and b with unique elements
intersect1d_nu(a,b)
    set intersection for a and b with non-unique elements
setmember1d(a,b)
    boolean index array for a of the set intersection,
    for a and b with unique elements
setmember1d_nu(a,b)
    boolean index array for a of the set intersection,
    for a and b with non-unique elements

The new docs
http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
are a bit clearer.

However, I haven't used any of these functions much, and none of
them are *my* functions. Of the arraysetops functions, I use unique1d
most (because of the return index). I just keep track of these
functions because of their use for categorical and dummy variables.
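
As a rough illustration of the difference between a boolean index array and the actual elements, here is a small sketch. As far as I know, the functions named above (setmember1d, setmember1d_nu, intersect1d_nu, unique1d) are not available in current NumPy releases, so the sketch assumes np.isin, np.intersect1d and np.unique as stand-ins. The sample arrays are made up.

```python
import numpy as np

# Hypothetical sample data with repeated values in both arrays.
a = np.array([4, 2, 2, 7, 4, 9])
b = np.array([2, 9, 9, 5])

# Boolean index array for a: the role setmember1d_nu plays in this
# thread. np.isin is an assumed modern stand-in, not a name used above.
mask = np.isin(a, b)
elements = a[mask]          # keeps the order and duplicates of a

# Set intersection as a sorted array of unique values. In current
# NumPy the inputs do not have to be unique unless assume_unique=True.
common = np.intersect1d(a, b)

# Unique values plus the index of each first occurrence in a
# (the "return index" mentioned above, here via np.unique).
values, first_idx = np.unique(a, return_index=True)

print(mask)
print(elements)
print(common)
print(values, first_idx)
```

The mask has the same length as a and can be reused to index any other array aligned with a, which is the reason for returning it instead of the elements.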

Josef

>
> Cheers,
> Alan Isaac
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>