Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-03 Thread josef . pktd
On Wed, Jun 3, 2009 at 8:29 PM, Ning Sean  wrote:
> Hi, I want to extract elements of an array (say, a) that are contained in
> another array (say, b). That is, if a=array([1,1,2,3,3,4]), b=array([1,4]),
> then I want array([1,1,4]).
>
> I did the following but the speed is very slow (maybe because a is very
> long):
>
> c=array([])
> for x in b:
>    c=append(c,a[a==x])
>
> any way to speed it up?
>
> Thanks!
> -Ning
>


It's waiting in Trac for inclusion in numpy
http://projects.scipy.org/numpy/ticket/1036
The current version only handles arrays with unique elements.

You can copy the ticket attachment, the version there is very fast.

Josef
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-03 Thread Ning Sean
Thanks! Tried it and it is about twice as fast as my approach.

-Ning



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
a[(a==b[:,None]).sum(axis=0,dtype=bool)]

hth,
Alan Isaac
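
For reference, a minimal runnable sketch of the broadcasting one-liner above, applied to the arrays from the original question. It uses `.any(axis=0)` where the one-liner uses `.sum(axis=0, dtype=bool)`; treating the two as equivalent is an assumption of this sketch.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

# b[:, None] has shape (len(b), 1), so the comparison broadcasts to a
# (len(b), len(a)) boolean table: row i marks where a equals b[i].
table = a == b[:, None]

# Collapse the rows: an element of a is kept if any row matched it.
mask = table.any(axis=0)
out = a[mask]
print(out.tolist())  # [1, 1, 4]
```

The order of `a` is preserved, because the mask only selects elements and never rearranges them.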



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac  wrote:
> a[(a==b[:,None]).sum(axis=0,dtype=bool)]

this is my preferred way when b is small and has unique elements.
if the elements in b are not unique, then b can be replaced by np.unique(b)
If b is large this creates a huge intermediate array

The advantage of the new setmember1d_nu is that it handles large b
very efficiently. My try on it was more than 10 times slower than the
proposed solution for larger arrays.
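
The ticket attachment is not reproduced in this thread. As a rough illustration of how a sort-based test can avoid the (len(b), len(a)) intermediate array, here is a sketch built on `np.unique` and `np.searchsorted`. The helper name `member_mask` is hypothetical, and this is not claimed to be the code from the ticket.

```python
import numpy as np

def member_mask(a, b):
    """Hypothetical helper: boolean mask over 1-d a, True where a[i] occurs in b."""
    a = np.asarray(a)
    b_sorted = np.unique(b)  # sorted copy of b with duplicates removed
    if b_sorted.size == 0:
        return np.zeros(a.shape, dtype=bool)
    # For each element of a, find where it would be inserted into b_sorted.
    pos = np.searchsorted(b_sorted, a)
    # Elements larger than max(b) get position len(b_sorted); point them at
    # index 0 instead, where they cannot compare equal.
    pos[pos == b_sorted.size] = 0
    return b_sorted[pos] == a

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])
print(a[member_mask(a, b)].tolist())  # [1, 1, 4]
```

Memory use here grows with len(a) + len(b) rather than with their product.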

Josef

> hth,
> Alan Isaac


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
> On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac  wrote:
>> a[(a==b[:,None]).sum(axis=0,dtype=bool)]


On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
> If b is large this creates a huge intermediate array


True enough, but one could then use fromiter:
setb = set(b)
itr = (ai for ai in a if ai in setb)
out = np.fromiter(itr, dtype=a.dtype)

I suspect (?) that b would have to be pretty
big relative to a for the repeated testing
to be more costly than sorting a.
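
A self-contained version of the fromiter idea above, on the arrays from the original question. Converting both arrays with `.tolist()` first is an addition of this sketch, so that the set holds plain Python integers.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

setb = set(b.tolist())  # hashed once; each membership test is O(1) on average
# Lazily walk a in its original order, keeping only members of b.
itr = (ai for ai in a.tolist() if ai in setb)
out = np.fromiter(itr, dtype=a.dtype)
print(out.tolist())  # [1, 1, 4]
```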

Or if a stable order is not important (I don't
recall if the OP specified), one could just
np.intersect1d(a, np.unique(b))

On a different note, I think a name change
is needed for your function. (Compare
intersect1d_nu to see the potential
confusion. And btw, what is the use case
for intersect1d, which gives neither a
set intersection nor a multiset intersection?)

Cheers,
Alan Isaac



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac  wrote:
>> On Thu, Jun 4, 2009 at 8:23 AM, Alan G Isaac  wrote:
>>> a[(a==b[:,None]).sum(axis=0,dtype=bool)]
>
>
> On 6/4/2009 8:35 AM josef.p...@gmail.com apparently wrote:
>> If b is large this creates a huge intermediate array
>
>
> True enough, but one could then use fromiter:
> setb = set(b)
> itr = (ai for ai in a if ai in setb)
> out = np.fromiter(itr, dtype=a.dtype)
>
> I suspect (?) that b would have to be pretty
> big relative to a for the repeated testing
> to be more costly than sorting a.

I didn't look at this case very closely for speed. Note that setmember1d
and setmember1d_nu return a boolean array that can be used for indexing,
not the actual elements.
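
To make the mask-versus-elements distinction concrete, here is a small sketch. It uses `np.isin`, a membership function available in later NumPy releases, as a stand-in; setmember1d_nu itself is not shown in this thread.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])

mask = np.isin(a, b)      # boolean array, one entry per element of a
print(mask.tolist())      # [True, True, False, False, False, True]
print(a[mask].tolist())   # [1, 1, 4]  (indexing recovers the elements)
```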

Your iterator is in python and could be pretty slow, but I only ran
the performance script attached to the ticket and the speed
differences for different ways of doing it were pretty big for large
arrays.

>
> Or if a stable order is not important (I don't
> recall if the OP specified), one could just
> np.intersect1d(a, np.unique(b))

This requires that also `a` has only unique elements.
intersect1d_nu doesn't require unique elements.

>
> On a different note, I think a name change
> is needed for your function. (Compare
> intersect1d_nu to see the potential
> confusion. And btw, what is the use case
> for intersect1d, which gives neither a
> set intersection nor a multiset intersection?)

intersect1d gives set intersection if both arrays have only unique
elements (i.e. are sets).
I thought the naming is pretty clear:

intersect1d(a,b)      set intersection if a and b with unique elements
intersect1d_nu(a,b)   set intersection if a and b with non-unique elements
setmember1d(a,b)      boolean index array for a of set intersection if a
                      and b with unique elements
setmember1d_nu(a,b)   boolean index array for a of set intersection if
                      a and b with non-unique elements

The new docs http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
are a bit clearer.

However, I haven't used either of these functions much, and none of
them are *my* functions.
Of the arraysetops functions, I use unique1d most (because of the
return index).
I just keep track of these functions because of the use for
categorical and dummy variables.

Josef

>
> Cheers,
> Alan Isaac
>


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
> On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac  wrote:
>> Or if a stable order is not important (I don't
>> recall if the OP specified), one could just
>> np.intersect1d(a, np.unique(b))

On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
> This requires that also `a` has only unique elements.
> intersect1d_nu doesn't require unique elements.


>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4])
>>> np.intersect1d(a, np.unique(b))
array([1, 1, 3, 4])

(And thus my question about intersect1d...)
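
One way to account for the output above is a concatenate, sort, compare-neighbours scheme. The sketch below is an assumption about how such a function could work, not the source of intersect1d, which the thread does not show; the name `sorted_neighbor_intersect` is hypothetical.

```python
import numpy as np

def sorted_neighbor_intersect(ar1, ar2):
    """Hypothetical sketch: correct only if each input has unique elements."""
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    # With unique inputs, a value appears twice in aux exactly when it is
    # in both arrays. With repeats inside one input, adjacent equal values
    # are also reported, which yields spurious or duplicated results.
    return aux[:-1][aux[1:] == aux[:-1]]

print(sorted_neighbor_intersect([1, 2, 3, 4], [1, 4]).tolist())        # [1, 4]
print(sorted_neighbor_intersect([1, 1, 2, 3, 3, 4], [1, 4]).tolist())  # [1, 1, 3, 4]
```

Under this assumption, the 3 in the second result comes from the repeated 3 in `a` alone, which would explain why it appears although it is not in `b`.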

Cheers,
Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 11:12 AM, Alan G Isaac  wrote:
>> On Thu, Jun 4, 2009 at 10:13 AM, Alan G Isaac  wrote:
>>> Or if a stable order is not important (I don't
>>> recall if the OP specified), one could just
>>> np.intersect1d(a, np.unique(b))
>
> On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
>> This requires that also `a` has only unique elements.
>> intersect1d_nu doesn't require unique elements.
>
>
> >>> a
> array([1, 1, 2, 3, 3, 4])
> >>> b
> array([1, 4])
> >>> np.intersect1d(a, np.unique(b))
> array([1, 1, 3, 4])
>
> (And thus my question about intersect1d...)

Yes, I know, and in my current numpy help file this is the only
example there is, which is very misleading for its intended use.

>>> a = np.array([1, 1, 2, 3, 3, 4])
>>> b = np.array([1, 4, 5])
>>> np.intersect1d(np.unique(a), np.unique(b))
array([1, 4])

>>> np.intersect1d_nu(a,b)
array([1, 4])

Josef

>
> Cheers,
> Alan
>


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
> intersect1d gives set intersection if both arrays have 
> only unique elements (i.e. are sets).  I thought the 
> naming is pretty clear:

> intersect1d(a,b)   set intersection if a and b with unique elements 
> intersect1d_nu(a,b)   set intersection if a and b with non-unique elements 
> setmember1d(a,b)  boolean index array for a of set intersection if a 
> and b with unique elements 
> setmember1d_nu(a,b)  boolean index array for a of set intersection if 
> a and b with non-unique elements 


>>> a
array([1, 1, 2, 3, 3, 4])
>>> b
array([1, 4, 4, 4])
>>> np.intersect1d_nu(a,b)
array([1, 4])

That is, intersect1d_nu is the actual set intersection
function.  (I.e., intersect1d and intersect1d_nu would most
naturally have swapped names.)  That is why the appended _nu
will not communicate what was intended.  (I.e.,
setmember1d_nu will not be a match for intersect1d_nu.)

Cheers,
Alan Isaac




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Alan G Isaac wrote:
> On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
>> intersect1d gives set intersection if both arrays have 
>> only unique elements (i.e. are sets).  I thought the 
>> naming is pretty clear:
> 
>> intersect1d(a,b)   set intersection if a and b with unique elements 
>> intersect1d_nu(a,b)   set intersection if a and b with non-unique elements 
>> setmember1d(a,b)  boolean index array for a of set intersection if a 
>> and b with unique elements 
>> setmember1d_nu(a,b)  boolean index array for a of set intersection if 
>> a and b with non-unique elements 
> 
> 
> >>> a
> array([1, 1, 2, 3, 3, 4])
> >>> b
> array([1, 4, 4, 4])
> >>> np.intersect1d_nu(a,b)
> array([1, 4])
> 
> That is, intersect1d_nu is the actual set intersection
> function.  (I.e., intersect1d and intersect1d_nu would most
> naturally have swapped names.)  That is why the appended _nu
> will not communicate what was intended.  (I.e.,
> setmember1d_nu will not be a match for intersect1d_nu.)

The naming should express this: intersect1d expects its arguments are 
sets, intersect1d_nu does not. A set has unique elements by definition.

cheers,
r.


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 11:19 AM, Alan G Isaac  wrote:
> On 6/4/2009 10:50 AM josef.p...@gmail.com apparently wrote:
>> intersect1d gives set intersection if both arrays have
>> only unique elements (i.e. are sets).  I thought the
>> naming is pretty clear:
>
>> intersect1d(a,b)   set intersection if a and b with unique elements
>> intersect1d_nu(a,b)   set intersection if a and b with non-unique elements
>> setmember1d(a,b)  boolean index array for a of set intersection if a
>> and b with unique elements
>> setmember1d_nu(a,b)  boolean index array for a of set intersection if
>> a and b with non-unique elements
>
>
> >>> a
> array([1, 1, 2, 3, 3, 4])
> >>> b
> array([1, 4, 4, 4])
> >>> np.intersect1d_nu(a,b)
> array([1, 4])
>
> That is, intersect1d_nu is the actual set intersection
> function.  (I.e., intersect1d and intersect1d_nu would most
> naturally have swapped names.)  That is why the appended _nu
> will not communicate what was intended.  (I.e.,
> setmember1d_nu will not be a match for intersect1d_nu.)

intersect1d  is the intersection between sets (which are stored as
arrays), just like in the mathematical definition the two sets only
have unique elements

intersect1d_nu is the intersection between two arrays which can have
repeated elements. The result is a set, i.e. unique elements, stored
as an array

same for setmember1d, setmember1d_nu

so  postfix `_nu` only means that this function also works if the two
arrays are not really sets, i.e. are not required to have unique
elements to make sense.


intersect1d should throw a domain error if you give it arrays with
non-unique elements, which is not done for speed reasons
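
A sketch of what such a domain check could look like, and why it costs time. The wrapper name `intersect1d_checked` is hypothetical; `assume_unique` is a keyword of `np.intersect1d` in later NumPy releases and was not available when this thread was written.

```python
import numpy as np

def intersect1d_checked(ar1, ar2):
    """Hypothetical wrapper: refuse inputs that are not sets."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    for name, arr in (("ar1", ar1), ("ar2", ar2)):
        # np.unique sorts its input, so the check costs a sort per array.
        if np.unique(arr).size != arr.size:
            raise ValueError(name + " contains repeated elements")
    # Inputs are verified to be sets, so the fast path is safe.
    return np.intersect1d(ar1, ar2, assume_unique=True)

print(intersect1d_checked([1, 2, 3, 4], [4, 1]).tolist())  # [1, 4]
```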


> Cheers,
> Alan Isaac


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
> intersect1d  is the intersection between sets (which are stored as 
> arrays), just like in the mathematical definition the two sets only 
> have unique elements 

Hmmm. OK, I see you and Robert believe this.
But it does not match the documentation.
But indeed, I see that the documentation is incorrect.
E.g.,

>>> np.intersect1d([1,1,2,3,3,4],[1,4])
array([1, 1, 3, 4])

Is this a bug or a documentation bug?



> intersect1d_nu is the intersection between two arrays which can have 
> repeated elements. The result is a set, i.e. unique elements, stored 
> as an array 

> same for setmember1d, setmember1d_nu 

I cannot understand this.
Following your proposed reasoning,
I expect a[setmember1d_nu(a,b)]
to return the same as
intersect1d_nu(a, b).
It does not.



> so  postfix `_nu` only means that this function also works 
> if the two arrays are not really sets

But that just begs the question: what does 'works' mean?
See my previous comment (above).



> intersect1d should throw a domain error if you give it arrays with 
> non-unique elements, which is not done for speed reasons 

*If* intersect1d behaved *exactly* as documented,
the example
intersect1d(a, np.unique(b))
shows that the documented behavior can be useful.
And indeed, this would be the match to
a[setmember1d_nu(a,b)]

Cheers,
Alan Isaac




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 12:32 PM, Alan G Isaac  wrote:
> On 6/4/2009 11:29 AM josef.p...@gmail.com apparently wrote:
>> intersect1d  is the intersection between sets (which are stored as
>> arrays), just like in the mathematical definition the two sets only
>> have unique elements
>
> Hmmm. OK, I see you and Robert believe this.
> But it does not match the documentation.
> But indeed, I see that the documentation is incorrect.
> E.g.,
>
> >>> np.intersect1d([1,1,2,3,3,4],[1,4])
> array([1, 1, 3, 4])
>
> Is this a bug or a documentation bug?
>
>
>
>> intersect1d_nu is the intersection between two arrays which can have
>> repeated elements. The result is a set, i.e. unique elements, stored
>> as an array
>
>> same for setmember1d, setmember1d_nu
>
> I cannot understand this.
> Following your proposed reasoning,
> I expect a[setmember1d_nu(a,b)]
> to return the same as
> intersect1d_nu(a, b).
> It does not.

I don't have setmember1d_nu available right now, but from my reading
we should have

 intersect1d_nu(a, b) == np.unique(a[setmember1d_nu(a,b)])
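
The identity can be checked with stand-ins from later NumPy releases. This assumes that `np.intersect1d` with its default arguments de-duplicates both inputs (playing the role of intersect1d_nu) and that `np.isin` plays the role of setmember1d_nu.

```python
import numpy as np

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4, 4, 4])

lhs = np.intersect1d(a, b)         # sorted, unique common values
rhs = np.unique(a[np.isin(a, b)])  # mask a, then de-duplicate
print(lhs.tolist(), rhs.tolist())  # [1, 4] [1, 4]
```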


>
>
>
>> so  postfix `_nu` only means that this function also works
>> if the two arrays are not really sets
>
> But that just begs the question: what does 'works' mean?
> See my previous comment (above).
>
>
>
>> intersect1d should throw a domain error if you give it arrays with
>> non-unique elements, which is not done for speed reasons
>
> *If* intersect1d behaved *exactly* as documented,
> the example
> intersect1d(a, np.unique(b))
> shows that the documented behavior can be useful.
> And indeed, this would be the match to
> a[setmember1d_nu(a,b)]

I don't know if anyone looked at the behavior for "unintended" usage

intersect1d  rearranges, sorts
>>> np.intersect1d([4,1,3,3],[3,4])
array([3, 3, 4])

but it gives you the correct multiplicity
>>> np.intersect1d([4,4,4,1,3,3],np.unique([3,4,3,0]))
array([3, 3, 4, 4, 4])

so I guess, we have
np.intersect1d([4,4,4,1,3,3], np.unique([3,4,3,0])) ==
np.sort(a[setmember1d_nu(a,b)])
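
Here `a` and `b` presumably stand for the two lists in the call above. A sketch of the right-hand side, again with `np.isin` from later NumPy releases standing in for setmember1d_nu:

```python
import numpy as np

a = np.array([4, 4, 4, 1, 3, 3])
b = np.array([3, 4, 3, 0])

# Keep every occurrence in a of a value that is present in b, then sort.
out = np.sort(a[np.isin(a, b)])
print(out.tolist())  # [3, 3, 4, 4, 4]
```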

for the example from the help file I don't find any meaningful interpretation
>>> np.intersect1d([1,3,3],[3,1,1])
array([1, 1, 3, 3])


wrong answer
>>> np.setmember1d([4,1,1,3,3],[3,4])
array([ True,  True, False,  True,  True], dtype=bool)

Note: there are two versions of the docs for np.intersect1d, the
currently published docs which describe the actual behavior (for the
non-unique case), and the new docs on the doc editor
http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
that describe the "intended" usage of the functions, which also
corresponds closer to the original source docstring
(http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
). that's my interpretation

If you think that functions make sense also for the "unintended"
usage, then you could add an example to the new docs.

Josef


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Alan G Isaac
On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
> Note: there are two versions of the docs for np.intersect1d, the
> currently published docs which describe the actual behavior (for the
> non-unique case), and the new docs on the doc editor
> http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
> that describe the "intended" usage of the functions, which also
> corresponds closer to the original source docstring
> (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
> ). that's my interpretation


Again, the distributed docs do *not* describe the actual
behavior for the non-unique case.  E.g.,

>>> np.intersect1d([1,1,2,3,3,4], [1,4])
array([1, 1, 3, 4])

Might this be a better example of
failure than the one in the doc editor?

However the doc editor version states that the function
fails for the non-unique case, so it seems there was a
documentation bug that is in the process of being fixed.

Thanks,
Alan



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac  wrote:
> On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
>> Note: there are two versions of the docs for np.intersect1d, the
>> currently published docs which describe the actual behavior (for the
>> non-unique case), and the new docs on the doc editor
>> http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
>> that describe the "intended" usage of the functions, which also
>> corresponds closer to the original source docstring
>> (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
>> ). that's my interpretation
>
>
> Again, the distributed docs do *not* describe the actual
> behavior for the non-unique case.  E.g.,
>
> >>> np.intersect1d([1,1,2,3,3,4], [1,4])
> array([1, 1, 3, 4])
>
> Might this be a better example of
> failure than the one in the doc editor?

Thanks, that's a very clear example of a wrong answer,
and it removes the question whether the function makes any sense for
the non-unique case.
I changed the example in the doc editor to this one.

It will hopefully be merged with the source at the next update.

Josef


>
> However the doc editor version states that the function
> fails for the non-unique case, so it seems there was a
> documentation bug that is in the process of being fixed.

Yes

>
> Thanks,
> Alan
>


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Kim Hansen
Concerning the name setmember1d_nu, I personally find it quite verbose
and not the name I would expect as a non-insider coming to numpy and
not knowing all the names of the more special hidden-away functions
and not being a python-wiz either.

I think ain(a,b) would be the name I had expected as an array
equivalent of "a in b" (just as arange is the array version of range)
or I would have anticipated that an ndarray object would have an
"in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
which would return a boolean array of the same shape as a with
elements true if the equivalent a members were members in the iterable
b.
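
A sketch of the behaviour described above. The function name `ain` is the one proposed in this message and is not a NumPy function; the body relies on `np.isin` from later NumPy releases, which keeps the shape of its first argument.

```python
import numpy as np

def ain(a, b):
    """Hypothetical helper: elementwise 'a in b', same shape as a."""
    # list(b) lets b be any finite iterable (set, tuple, generator, array).
    return np.isin(np.asarray(a), list(b))

a = np.array([[1, 2], [3, 4]])
print(ain(a, {1, 4}).tolist())  # [[True, False], [False, True]]
```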

When I had a problem where I needed this function, I could not find
anything near that, and after looking around and also asking here I
got some hints to use the 1d functions, which gave me the idea to
implement the few-line, very simple proposal for "a in b", which is
now the proposal under review as the new function setmember1d_nu(a,b).
Whereas I see this function name is in line with the existing
functions, I really think the names are non-intuitive. I would
therefore propose that it was also aliased to a more intuitive name
such as ain(a,b) or perhaps better a.in(b)

Again, I am probably missing some important points here as a
non-experienced Python programmer and numpy user, I am just trying to
give some input from a beginner's point of view, if that can be of
any help.

Thank you,

Kim


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Gael Varoquaux
On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
> "in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
> which would return a boolean array of the same shape as a with
> elements true if the equivalent a members were members in the iterable
> b.

That would really be what I would be looking for.

Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Anne Archibald
2009/6/4  :

> intersect1d should throw a domain error if you give it arrays with
> non-unique elements, which is not done for speed reasons

It seems to me that this is the basic source of the problem. Perhaps
this can be addressed? I realize maintaining compatibility with the
current behaviour is necessary, so how about a multistage deprecation:

1. add a keyword argument to intersect1d "assume_unique"; if it is not
present, check for uniqueness and emit a warning if not unique
2. change the warning to an exception
Optionally:
3. change the meaning of the function to that of intersect1d_nu if the
keyword argument is not present

One could do something similar with setmember1d.

This would remove the pitfall of the 1d assumption and the wart of the
_nu names without hampering performance for people who know they have
unique arrays and are in a hurry.
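
Stage 1 of the proposal might look roughly like the sketch below. The wrapper name `intersect1d_staged`, the `None` default, and the warning text are all assumptions made for illustration; the fast path delegates to `np.intersect1d(..., assume_unique=True)` as found in later NumPy releases.

```python
import warnings
import numpy as np

def intersect1d_staged(ar1, ar2, assume_unique=None):
    """Hypothetical stage-1 wrapper: warn when the set assumption is violated."""
    ar1 = np.asarray(ar1)
    ar2 = np.asarray(ar2)
    if assume_unique is None:
        # Keyword not given: pay for a uniqueness check and warn on failure.
        if (np.unique(ar1).size != ar1.size
                or np.unique(ar2).size != ar2.size):
            warnings.warn(
                "intersect1d received non-unique input; the result may be wrong",
                DeprecationWarning,
                stacklevel=2,
            )
    # Callers who pass assume_unique=True skip the check entirely.
    return np.intersect1d(ar1, ar2, assume_unique=True)

print(intersect1d_staged([1, 2, 3, 4], [4, 1]).tolist())  # [1, 4]
```

Stage 2 would replace `warnings.warn` with a raised exception.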

Anne


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
 wrote:
> On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
>> "in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
>> which would return a boolean array of the same shape as a with
>> elements true if the equivalent a members were members in the iterable
>> b.
>
> That would really be what I would be looking for.
>

Just using "in" might promise more than it does, e.g. it works only for
one dimensional arrays, maybe "in1d". With "in", I would expect a
generic function as in python that works with many array types and
dimensions. (But I haven't checked whether it would work with a 1d
structured array or object array.)

I found arraysetops because of unique1d, but I didn't figure out what
the subpackage really does, because I was reading "arrayse-tops"
instead of "array-set-ops".

BTW, for the docs, I haven't found a counter example where
np.setdiff1d gives the wrong answer for non-unique arrays.

Josef


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Gael Varoquaux
On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote:
> Just using "in" might promise more than it does, eg. it works only for
> one dimensional arrays, maybe "in1d". With "in", 

Then 'in_1d'

> I found arraysetops because of unique1d, but I didn't figure out what
> the subpackage really does, because I was reading "arrayse-tops"
> instead of array-set-ops"

That's why I push people to use more underscores. IMHO PEP8 lacks a push
for underscores.

Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Thu, Jun 4, 2009 at 4:52 PM, Gael Varoquaux
 wrote:
> On Thu, Jun 04, 2009 at 04:43:39PM -0400, josef.p...@gmail.com wrote:
>> Just using "in" might promise more than it does, eg. it works only for
>> one dimensional arrays, maybe "in1d". With "in",
>
> Then 'in_1d'

No, if the breaks in a name are obvious, I still prefer names without
underscores. I don't think `1d` or `2d` needs to be separated from the
word, "in1d"
I always remember how to spell unique1d, but I usually have to check
how to spell at_least_2d, or maybe atleast_2d or even atleast2d.

how about

def setmember1d_nu(a, b):
...

#aliases
set_member_1d_but_it_does_not_really_have_to_be_a_set = setmember1d_nu
in1d = setmember1d_nu

Josef

>>> [f for f in dir(np) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'ediff1d', 'histogram2d', 'intersect1d',
'poly1d', 'setdiff1d', 'setmember1d', 'setxor1d', 'union1d',
'unique1d']

>>> [f for f in dir(scipy.signal) if f[-2:]=='1d' or f[-2:]=='2d']
['atleast_1d', 'atleast_2d', 'convolve2d', 'correlate2d', 'cspline1d',
'cspline2d', 'medfilt2d', 'qspline1d', 'qspline2d', 'sepfir2d']
>>>
>>> [f for f in dir(scipy.stats) if f[-2:]=='1d' or f[-2:]=='2d']
[]
>>>
>>> [f for f in dir(scipy.ndimage) if f[-2:]=='1d' or f[-2:]=='2d']
['convolve1d', 'correlate1d', 'gaussian_filter1d', 'generic_filter1d',
'maximum_filter1d', 'minimum_filter1d', 'spline_filter1d',
'uniform_filter1d']


>
>> I found arraysetops because of unique1d, but I didn't figure out what
>> the subpackage really does, because I was reading "arrayse-tops"
>> instead of array-set-ops"
>
> That's why I push people to use more underscores. IMHO PEP8 lacks a push
> for underscores.
>
> Gaël


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
josef.p...@gmail.com wrote:
> On Thu, Jun 4, 2009 at 2:58 PM, Alan G Isaac  wrote:
>> On 6/4/2009 1:27 PM josef.p...@gmail.com apparently wrote:
>>> Note: there are two versions of the docs for np.intersect1d, the
>>> currently published docs which describe the actual behavior (for the
>>> non-unique case), and the new docs on the doc editor
>>> http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/
>>> that describe the "intended" usage of the functions, which also
>>> corresponds closer to the original source docstring
>>> (http://docs.scipy.org/numpy/docs/numpy.lib.arraysetops.intersect1d/?revision=-227
>>> ). that's my interpretation
>>
>> Again, the distributed docs do *not* describe the actual
>> behavior for the non-unique case.  E.g.,
>>
>> >>> np.intersect1d([1,1,2,3,3,4], [1,4])
>> array([1, 1, 3, 4])
>>
>> Might this be a better example of
>> failure than the one in the doc editor?
> 
> Thanks, that's a very clear example of a wrong answer,
> and it removes the question whether the function makes any sense for
> the non-unique case.
> I changed the example in the doc editor to this one.
> 
> It will hopefully be merged with the source at the next update.

Thank you Josef!

r.


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Kim Hansen wrote:
> Concerning the name setmember1d_nu, I personally find it quite verbose
> and not the name I would expect as a non-insider coming to numpy and
> not knowing all the names of the more special hidden-away functions
> and not being a python-wiz either.

To explain the naming: those names are used in matlab for functions of 
similar functionality. If better names are found, I am not against.

What I particularly do not like is the _nu suffix (yes, blame me).

r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
Anne Archibald wrote:
> 2009/6/4  :
> 
>> intersect1d should throw a domain error if you give it arrays with
>> non-unique elements, which is not done for speed reasons
> 
> It seems to me that this is the basic source of the problem. Perhaps
> this can be addressed? I realize maintaining compatibility with the
> current behaviour is necessary, so how about a multistage deprecation:
> 
> 1. add a keyword argument to intersect1d "assume_unique"; if it is not
> present, check for uniqueness and emit a warning if not unique
> 2. change the warning to an exception
> Optionally:
> 3. change the meaning of the function to that of intersect1d_nu if the
> keyword argument is not present
> 
> One could do something similar with setmember1d.
> 
> This would remove the pitfall of the 1d assumption and the wart of the
> _nu names without hampering performance for people who know they have
> unique arrays and are in a hurry.

You mean something like:

def intersect1d(ar1, ar2, assume_unique=False):
    if not assume_unique:
        return intersect1d_nu(ar1, ar2)
    else:
        ... # the current code

intersect1d_nu could be still exported to numpy namespace, or not.
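
A runnable sketch of that dispatch, under two assumptions that are not confirmed by the thread: that "the current code" is a concatenate, sort, compare-neighbours routine, and that de-duplicating both inputs with `np.unique` first is an acceptable stand-in for intersect1d_nu.

```python
import numpy as np

def intersect1d(ar1, ar2, assume_unique=False):
    """Sketch only: shadows the name for illustration, not NumPy's source."""
    if not assume_unique:
        # Stand-in for intersect1d_nu: turn both inputs into sets first.
        ar1 = np.unique(ar1)
        ar2 = np.unique(ar2)
    # Assumed fast path: a value present in both sets shows up as an
    # adjacent equal pair after sorting the concatenation.
    aux = np.concatenate((ar1, ar2))
    aux.sort()
    return aux[:-1][aux[1:] == aux[:-1]]

print(intersect1d([1, 1, 2, 3, 3, 4], [1, 4, 4, 4]).tolist())          # [1, 4]
print(intersect1d([1, 2, 3, 4], [1, 4], assume_unique=True).tolist())  # [1, 4]
```

With this layout the safe behaviour is the default, and only callers who opt in with `assume_unique=True` skip the de-duplication cost.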

I like this. I do not understand, however, what you mean by "remove the
pitfall of the 1d assumption"?

cheers,
r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
josef.p...@gmail.com wrote:
> On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
>  wrote:
>> On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
>>> "in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
>>> which would return a boolean array of the same shape as a with
>>> elements true if the equivalent a members were members in the iterable
>>> b.
>> That would really by what I would be looking for.
>>
> 
> Just using "in" might promise more than it does, eg. it works only for
> one dimensional arrays, maybe "in1d". With "in", I would expect a
> generic function as in python that works with many array types and
> dimensions. (But I haven't checked whether it would work with a 1d
> structured array or object array.)
> 
> I found arraysetops because of unique1d, but I didn't figure out what
> the subpackage really does, because I was reading "arrayse-tops"
> instead of array-set-ops"

I am bad at choosing names, but note that numpy sub-modules usually do 
not use underscores, so array_set_ops would not fit well.

> BTW, for the docs, I haven't found a counter example where
> np.setdiff1d gives the wrong answer for non-unique arrays.

In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
Out[4]: array([ True, False,  True,  True,  True], dtype=bool)
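
Spelled out: 1 does not occur in the second array, so both leading entries should be False, yet the first comes back True. A small brute-force check shows where the outputs disagree. It uses plain Python membership only as a reference and makes no claim about how setmember1d works internally:

```python
import numpy as np

a = np.array([1, 1, 2, 4, 2])
b = np.array([3, 2, 4])

# Reference answer by brute force: test each element of a against b.
members = set(b.tolist())
expected = np.array([x in members for x in a.tolist()])

# Output copied from the session above.
reported = np.array([True, False, True, True, True])

# Positions where the reported mask differs from the reference.
print(np.flatnonzero(expected != reported))
```

Only position 0 differs, i.e. the first of the two repeated 1s.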

r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread josef . pktd
On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman  wrote:
> josef.p...@gmail.com wrote:
>> On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
>>  wrote:
>>> On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
>>>> "in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
>>>> which would return a boolean array of the same shape as a with
>>>> elements true if the equivalent a members were members in the iterable
>>>> b.
>>> That would really by what I would be looking for.
>>>
>>
>> Just using "in" might promise more than it does, eg. it works only for
>> one dimensional arrays, maybe "in1d". With "in", I would expect a
>> generic function as in python that works with many array types and
>> dimensions. (But I haven't checked whether it would work with a 1d
>> structured array or object array.)
>>
>> I found arraysetops because of unique1d, but I didn't figure out what
>> the subpackage really does, because I was reading "arrayse-tops"
>> instead of array-set-ops"
>
> I am bad in choosing names, but note that numpy sub-modules usually do
> not use underscores, so array_set_ops would not fit well.

I would have chosen something like setfun.  Since this is in numpy,
it should be implied that "sets" refers to arrays.

>
>> BTW, for the docs, I haven't found a counter example where
>> np.setdiff1d gives the wrong answer for non-unique arrays.
>
> In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
> Out[4]: array([ True, False,  True,  True,  True], dtype=bool)

setdiff1d: diff, not member
Looking at the source, I think setdiff always works, even for
non-unique arrays.

Josef

>
> r.
>


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-04 Thread Robert Cimrman
josef.p...@gmail.com wrote:
> On Fri, Jun 5, 2009 at 1:48 AM, Robert Cimrman  wrote:
>> josef.p...@gmail.com wrote:
>>> On Thu, Jun 4, 2009 at 4:30 PM, Gael Varoquaux
>>>  wrote:
>>>> On Thu, Jun 04, 2009 at 10:27:11PM +0200, Kim Hansen wrote:
>>>>> "in(b)" or "in_iterable(b)" method, such that you could do a.in(b)
>>>>> which would return a boolean array of the same shape as a with
>>>>> elements true if the equivalent a members were members in the iterable
>>>>> b.
>>>> That would really by what I would be looking for.
>>>>
>>> Just using "in" might promise more than it does, eg. it works only for
>>> one dimensional arrays, maybe "in1d". With "in", I would expect a
>>> generic function as in python that works with many array types and
>>> dimensions. (But I haven't checked whether it would work with a 1d
>>> structured array or object array.)
>>>
>>> I found arraysetops because of unique1d, but I didn't figure out what
>>> the subpackage really does, because I was reading "arrayse-tops"
>>> instead of array-set-ops"
>> I am bad in choosing names, but note that numpy sub-modules usually do
>> not use underscores, so array_set_ops would not fit well.
> 
> I would have chosen something like setfun.  Since this is in numpy
> that sets refers to arrays should be implied.

Yes, good idea. If people agree, I am not sure how best to proceed (the 
name contest is open!). What about adding setfun as an alias and 
deprecating the name arraysetops?

>>> BTW, for the docs, I haven't found a counter example where
>>> np.setdiff1d gives the wrong answer for non-unique arrays.
>> In [4]: np.setmember1d( [1, 1, 2, 4, 2], [3, 2, 4] )
>> Out[4]: array([ True, False,  True,  True,  True], dtype=bool)
> 
> setdiff1d: diff, not member
> Looking at the source, I think setdiff always works, even for
> non-unique arrays.

Whoops, sorry. setdiff1d seems really to work for non-unique arrays - it 
relies on the behaviour above though :) - there is always one correct 
False even for repeated entries in the first array.
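
A sketch of a setdiff1d that does not depend on that coincidence: deduplicate both inputs first, then keep the values of the first array that cannot be found in the second. The name setdiff1d_sketch is hypothetical and the approach is only one possible implementation:

```python
import numpy as np

def setdiff1d_sketch(ar1, ar2):
    """Sorted unique values of ar1 that do not occur in ar2."""
    ar1 = np.unique(ar1)          # sorted, duplicates removed
    ar2 = np.unique(ar2)
    if ar2.size == 0:
        return ar1
    # Locate each candidate in the sorted ar2; cap the index so that
    # values larger than everything in ar2 still index a valid slot.
    idx = np.minimum(np.searchsorted(ar2, ar1), ar2.size - 1)
    return ar1[ar2[idx] != ar1]

print(setdiff1d_sketch([1, 1, 2, 4, 2], [3, 2, 4]))  # [1]
```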

r.




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-05 Thread David Warde-Farley
On 4-Jun-09, at 4:38 PM, Anne Archibald wrote:

> It seems to me that this is the basic source of the problem. Perhaps
> this can be addressed? I realize maintaining compatibility with the
> current behaviour is necessary, so how about a multistage deprecation:
>
> 1. add a keyword argument to intersect1d "assume_unique"; if it is not
> present, check for uniqueness and emit a warning if not unique
> 2. change the warning to an exception
> Optionally:
> 3. change the meaning of the function to that of intersect1d_nu if the
> keyword argument is not present
>
> One could do something similar with setmember1d.

+1 on this idea. I've been bitten by the non-unique stuff in the past,  
especially with setmember1d, not realizing that both need to be unique.
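
To make stage 1 of the proposal concrete, here is a rough sketch of the uniqueness check with a warning. The helper name warn_if_not_unique and the message text are made up for illustration:

```python
import warnings
import numpy as np

def warn_if_not_unique(ar, name="array"):
    """Emit a warning when a 1-d array holds repeated values.

    Hypothetical helper; returns True when a warning was issued.
    """
    ar = np.asarray(ar).ravel()
    if np.unique(ar).size != ar.size:
        # Stage 2 of the proposal would raise an exception here instead.
        warnings.warn("%s has repeated elements" % name, UserWarning)
        return True
    return False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_not_unique([1, 1, 2, 4, 2], "ar1")   # duplicates: warns
    warn_if_not_unique([3, 2, 4], "ar2")         # unique: silent

print(len(caught))  # 1
```

The check costs an extra sort per argument, which is the overhead that assume_unique=True would let callers skip.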

David


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-06 Thread Neil Crighton
Robert Cimrman  ntc.zcu.cz> writes:

> Anne Archibald wrote:
>
> > 1. add a keyword argument to intersect1d "assume_unique"; if it is not
> > present, check for uniqueness and emit a warning if not unique
> > 2. change the warning to an exception
> > Optionally:
> > 3. change the meaning of the function to that of intersect1d_nu if the
> > keyword argument is not present
> > 
> You mean something like:
> 
> def intersect1d(ar1, ar2, assume_unique=False):
>     if not assume_unique:
>         return intersect1d_nu(ar1, ar2)
>     else:
>         ... # the current code
> 
> intersect1d_nu could be still exported to numpy namespace, or not.
> 

+1 - from the user's point of view there should just be intersect1d and
setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert suggests
can be used if speed is a problem.

I really like in1d (no underscore) as a new name for setmember1d_nu. inarray is
another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from 
readability, unlike the extra a in arange.
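
For concreteness, here is roughly what an in1d would give for the question that started the thread. This is only a sketch: in1d_sketch is a hypothetical name, and the searchsorted lookup is one possible implementation, not the code attached to the ticket:

```python
import numpy as np

def in1d_sketch(ar1, ar2):
    """Boolean mask, same length as ar1: True where the element is in ar2."""
    ar1 = np.asarray(ar1)
    ar2 = np.unique(ar2)                     # sorted lookup table
    if ar2.size == 0:
        return np.zeros(ar1.shape, dtype=bool)
    idx = np.searchsorted(ar2, ar1)          # insertion points into ar2
    idx[idx == ar2.size] = 0                 # past the end: use any valid slot
    return ar2[idx] == ar1

a = np.array([1, 1, 2, 3, 3, 4])
b = np.array([1, 4])
mask = in1d_sketch(a, b)
print(a[mask])   # [1 1 4]
```

Repeated elements in either argument are fine here, since the mask is computed per element of the first array.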

Can we summarise the discussion in this thread and write up a short proposal
about what we'd like to change in arraysetops, and how to make the changes? 
Then it's easy for other people to give their opinion on any changes. I can do
this if no one else has time.


Neil




Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-06 Thread josef . pktd
On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton  wrote:
> Robert Cimrman  ntc.zcu.cz> writes:
>
>> Anne Archibald wrote:
>>
>> > 1. add a keyword argument to intersect1d "assume_unique"; if it is not
>> > present, check for uniqueness and emit a warning if not unique
>> > 2. change the warning to an exception
>> > Optionally:
>> > 3. change the meaning of the function to that of intersect1d_nu if the
>> > keyword argument is not present
>> >

1. merge _nu version into one function
--------------------------------------

>> You mean something like:
>>
>> def intersect1d(ar1, ar2, assume_unique=False):
>>      if not assume_unique:
>>          return intersect1d_nu(ar1, ar2)
>>      else:
>>          ... # the current code
>>
>> intersect1d_nu could be still exported to numpy namespace, or not.
>>
>
> +1 - from the user's point of view there should just be intersect1d and
> setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert 
> suggests
> can be used if speed is a problem.

+ 1 on rolling the _nu versions this way into the plain version, this
would avoid a lot of the confusion.
It would not be a code breaking API change for existing correct usage
(but some speed regression without adding keyword)

deprecate intersect1d_nu
^^^^^^^^^^^^^^^^^^^^^^^^
> intersect1d_nu could be still exported to numpy namespace, or not.
I would say not, if they are the default branch of the non _nu version

+1 on deprecation


2. alias as "in"
----------------
>
> I really like in1d (no underscore) as a new name for setmember1d_nu. inarray 
> is
> another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
> readability, unlike the extra a in arange.
I don't like the extra "a"s either, once namespaces are commonly used

alias setmember1d_nu as `in1d` or `isin1d`, because the function is a
"in" and not a set operation
+1

>
> Can we summarise the discussion in this thread and write up a short proposal
> about what we'd like to change in arraysetops, and how to make the changes?
> Then it's easy for other people to give their opinion on any changes. I can do
> this if no one else has time.
>

 other points

3. behavior of other set functions
----------------------------------

guarantee that setdiff1d works for non-unique arrays (even when
implementation changes), and change documentation
+1

need to check other functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
union1d:  works for non-unique arrays, obvious from source

setxor1d: requires unique arrays
>>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
array([2, 4, 5, 6])
>>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
array([0, 3, 4, 5, 6])

setxor: add keyword option and call unique by default
+1 for symmetry

ediff1d and unique1d are defined for non-unique arrays
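
To illustrate why setxor1d needs unique inputs, here is a sort-based sketch that reproduces both outputs above. That it matches the numpy source line for line is an assumption; setxor1d_sorted is a name made up for this example:

```python
import numpy as np

def setxor1d_sorted(ar1, ar2):
    """Symmetric difference that is only correct for unique inputs."""
    aux = np.sort(np.concatenate((ar1, ar2)))
    if aux.size == 0:
        return aux
    # flag[i] is True where a run of equal values starts, with a True
    # sentinel added at each end.
    flag = np.concatenate(([True], aux[1:] != aux[:-1], [True]))
    # For unique inputs every run has length 1 or 2, so the two
    # neighbouring flags agree only for values that occur once.
    # A run of three or more (duplicated input) breaks this.
    return aux[flag[1:] == flag[:-1]]

a = [1, 2, 3, 3, 4, 5]
b = [0, 0, 1, 2, 2, 6]
print(setxor1d_sorted(a, b))                        # [2 4 5 6]
print(setxor1d_sorted(np.unique(a), np.unique(b)))  # [0 3 4 5 6]
```

The value 2 occurs three times in the concatenation, which is what lets it slip through in the first call.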


4. name of keyword
------------------

intersect1d(ar1, ar2, assume_unique=False)

alternative isunique=False  or just unique=False
+1 less to write


5. module name
--------------

rename arraysetops to something easier to read like setfun. I think it
would only affect internal changes since all functions are exported to
the main numpy name space
+1e-4  (I got used to arrayse_tops)


6. keep docs in sync with correct usage
---------------------------------------

obvious


That's my summary and opinions

Josef

>
> Neil
>
>


Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-08 Thread Robert Cimrman
Hi Josef,

thanks for the summary! I am responding below, later I will make an 
enhancement ticket.

josef.p...@gmail.com wrote:
> On Sat, Jun 6, 2009 at 4:42 AM, Neil Crighton  wrote:
>> Robert Cimrman  ntc.zcu.cz> writes:
>>
>>> Anne Archibald wrote:
>>>
>>>> 1. add a keyword argument to intersect1d "assume_unique"; if it is not
>>>> present, check for uniqueness and emit a warning if not unique
>>>> 2. change the warning to an exception
>>>> Optionally:
>>>> 3. change the meaning of the function to that of intersect1d_nu if the
>>>> keyword argument is not present

> 
> 1. merge _nu version into one function
> ---
> 
>>> You mean something like:
>>>
>>> def intersect1d(ar1, ar2, assume_unique=False):
>>>     if not assume_unique:
>>>         return intersect1d_nu(ar1, ar2)
>>>     else:
>>>         ... # the current code
>>>
>>> intersect1d_nu could be still exported to numpy namespace, or not.
>>>
>> +1 - from the user's point of view there should just be intersect1d and
>> setmember1d (i.e. no '_nu' versions). The assume_unique keyword Robert 
>> suggests
>> can be used if speed is a problem.
> 
> + 1 on rolling the _nu versions this way into the plain version, this
> would avoid a lot of the confusion.
> It would not be a code breaking API change for existing correct usage
> (but some speed regression without adding keyword)

+1

> depreciate intersect1d_nu
> ^^
>> intersect1d_nu could be still exported to numpy namespace, or not.
> I would say not, if they are the default branch of the non _nu version
> 
> +1 on depreciation

+0

> 2. alias as "in"
> -
>> I really like in1d (no underscore) as a new name for setmember1d_nu. inarray 
>> is
>> another possibility. I don't like 'ain'; 'a' in front of 'in' detracts from
>> readability, unlike the extra a in arange.
> I don't like the extra "a"s either, ones name spaces are commonly used
> 
> alias setmember1d_nu as `in1d` or `isin1d`, because the function is a
> "in" and not a set operation
> +1

+1

> 3. behavior of other set functions
> ---
> 
> guarantee that setdiff1d works for non-unique arrays (even when
> implementation changes), and change documentation
> +1

+1, it is useful for non-unique arrays.

> need to check other functions
> ^^
> union1d:  works for non-unique arrays, obvious from source

Yes.

> setxor1d: requires unique arrays
> >>> np.setxor1d([1,2,3,3,4,5], [0,0,1,2,2,6])
> array([2, 4, 5, 6])
> >>> np.setxor1d(np.unique([1,2,3,3,4,5]), np.unique([0,0,1,2,2,6]))
> array([0, 3, 4, 5, 6])
> 
> setxor: add keyword option and call unique by default
> +1 for symmetry

+1 - you mean np.setxor1d(np.unique(a), np.unique(b)) to become 
np.setxor1d(a, b, assume_unique=False), right?
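
In code, the equivalence being asked about would read as follows. This assumes a NumPy version in which setxor1d already accepts the assume_unique keyword; with the code discussed in this thread the keyword does not exist yet:

```python
import numpy as np

a = [1, 2, 3, 3, 4, 5]
b = [0, 0, 1, 2, 2, 6]

# Caller removes duplicates and says so, skipping the internal check.
explicit = np.setxor1d(np.unique(a), np.unique(b), assume_unique=True)

# Default call: assume_unique=False, duplicates are handled internally.
default = np.setxor1d(a, b)

print(np.array_equal(explicit, default))
```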

> ediff1d and unique1d are defined for non-unique arrays

yes

> 4. name of keyword
> 
> 
> intersect1d(ar1, ar2, assume_unique=False)
> 
> alternative isunique=False  or just unique=False
> +1 less to write

We should look at other functions in numpy (and/or scipy) to see what 
the common scheme is. -1e-1 to the proposed names: isunique is singular 
only, and unique=False does not clearly show the intent to me. What 
about ar1_unique=False, ar2_unique=False, to address each argument 
specifically?
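
A sketch of how the per-argument variant might look, so that only the input not known to be unique pays for the extra unique call. The function name intersect1d_flags is hypothetical, and the keyword names are just the ones floated above:

```python
import numpy as np

def intersect1d_flags(ar1, ar2, ar1_unique=False, ar2_unique=False):
    """Intersection where each input is deduplicated unless flagged unique."""
    ar1 = np.asarray(ar1) if ar1_unique else np.unique(ar1)
    ar2 = np.asarray(ar2) if ar2_unique else np.unique(ar2)
    # With both inputs unique, common values appear exactly twice
    # in the sorted concatenation.
    aux = np.sort(np.concatenate((ar1, ar2)))
    return aux[:-1][aux[1:] == aux[:-1]]

# Only the second array is known to be unique.
print(intersect1d_flags([1, 1, 2, 3], [2, 3, 5], ar2_unique=True))  # [2 3]
```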

> 5. module name
> ---
> 
> rename arraysetops to something easier to read like setfun. I think it
> would only affect internal changes since all functions are exported to
> the main numpy name space
> +1e-4  (I got used to arrayse_tops)

+0 (internal change only). Other numpy/scipy submodules containing a 
bunch of functions are called *pack (fftpack, arpack, lapack), *alg 
(linalg), *utils. *fun is used commonly in the matlab world.

> 6. keep docs in sync with correct usage
> ---------------------------------------
> 
> obvious

+1

thanks,
r.



Re: [Numpy-discussion] extract elements of an array that are contained in another array?

2009-06-08 Thread Robert Cimrman
Robert Cimrman wrote:
> Hi Josef,
> 
> thanks for the summary! I am responding below, later I will make an 
> enhancement ticket.

Done, see http://projects.scipy.org/numpy/ticket/1133
r.