On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac <ais...@american.edu> wrote:
>> On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac <ais...@american.edu> wrote:
>>> a[(a==b[:,None]).sum(axis=0,dtype=bool)]
>
>
> On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
>> If b is large this creates a huge intermediate array
>
>
> True enough, but one could then use fromiter:
> setb = set(b)
> itr = (ai for ai in a if ai in setb)
> out = np.fromiter(itr, dtype=a.dtype)
>
> I suspect (?) that b would have to be pretty
> big relative to a for the repeated testing
> to be more costly than sorting a.

I didn't look at this case very closely for speed. setmember1d and
setmember1d_nu return a boolean array that can be used for indexing,
not the actual elements.
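
To make the mask-versus-elements distinction concrete, here is a minimal
sketch. It uses np.isin, which comes from later NumPy releases, as a
stand-in for setmember1d_nu (an assumption; it is not the function
discussed in this thread), and the sample arrays are made up.

```python
import numpy as np

# Made-up sample data; both arrays contain repeated values.
a = np.array([3, 1, 2, 3, 5])
b = np.array([3, 3, 2, 7])

# Boolean index array: one flag per element of `a`, True where the
# element also occurs in `b`. np.isin is a stand-in for setmember1d_nu.
mask = np.isin(a, b)

# Indexing with the mask yields the actual elements, in the order of `a`.
elements = a[mask]

# The pure-Python iterator quoted above should select the same elements.
setb = set(b.tolist())
via_iter = np.fromiter((ai for ai in a if ai in setb), dtype=a.dtype)
```

The mask keeps the shape of `a`, so it can also be reused to index other
arrays aligned with `a`, which the element array alone cannot do.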

Your iterator is in Python and could be pretty slow. I only ran the
performance script attached to the ticket, though, and the speed
differences between the different ways of doing it were pretty big for
large arrays.

>
> Or if a stable order is not important (I don't
> recall if the OP specified), one could just
> np.intersect1d(a, np.unique(b))

This also requires that `a` has only unique elements.
intersect1d_nu doesn't require unique elements.
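
A minimal sketch of why uniqueness matters, assuming the intersection is
computed by concatenating, sorting, and keeping values equal to their
right-hand neighbour. The helper name intersect_sorted_concat is
hypothetical and only illustrates that technique; it is not presented as
the library code.

```python
import numpy as np

def intersect_sorted_concat(a, b):
    """Hypothetical helper: sort-and-compare-adjacent intersection.

    Only correct when `a` and `b` each contain unique elements.
    """
    aux = np.concatenate((a, b))
    aux.sort()
    # Keep every value that equals its right-hand neighbour.
    return aux[:-1][aux[1:] == aux[:-1]]

# Unique inputs: adjacent duplicates can only come from both arrays.
unique_case = intersect_sorted_concat(np.array([1, 2, 3]), np.array([2, 3, 4]))

# Repeated value in `a` alone: it is reported even though `b` lacks it.
repeated_case = intersect_sorted_concat(np.array([1, 1, 2]), np.array([3]))

# Deduplicating both inputs first restores a true set intersection.
deduped_case = intersect_sorted_concat(np.unique([1, 1, 2]), np.unique([3]))
```

Under this assumption, a duplicate inside one input is indistinguishable
from a value shared by both inputs, which is why the inputs have to be
made unique first.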

>
> On a different note, I think a name change
> is needed for your function. (Compare
> intersect1d_nu to see the potential
> confusion. And btw, what is the use case
> for intersect1d, which gives neither a
> set intersection nor a multiset intersection?)

intersect1d gives the set intersection if both arrays have only unique
elements (i.e. are sets).
I thought the naming was pretty clear:

intersect1d(a,b)      set intersection if a and b have unique elements
intersect1d_nu(a,b)   set intersection if a and b have non-unique elements
setmember1d(a,b)      boolean index array for a of the set intersection,
                      if a and b have unique elements
setmember1d_nu(a,b)   boolean index array for a of the set intersection,
                      if a and b have non-unique elements

The new docs http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
are a bit clearer.

However, I haven't used either of these functions much, and none of
them are *my* functions.
Of the arraysetops functions, I use unique1d most (because of the
return index).
I just keep track of these functions because of their use for
categorical and dummy variables.

Josef

>
> Cheers,
> Alan Isaac
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
