Could you use a set of tuples?
>>> set([(1,2),(1,3),(1,2),(2,3)])
set([(1, 2), (1, 3), (2, 3)])
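Applied to the Numeric case below, a minimal sketch of that idea might look like this (written with numpy as a stand-in for the older Numeric package; the function name is just illustrative):

import numpy as np

def unique_rows(x):
    # Turn each length-2 row into a hashable tuple, let the set drop
    # duplicates, then rebuild a 2-column array (row order is arbitrary).
    return np.array(list(set(map(tuple, x))))

x = np.array([[1, 2], [1, 3], [1, 2], [2, 3]])
print(unique_rows(x))   # the three unique rows, in some order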
Matt
On 7/19/07, Alex Mont <[EMAIL PROTECTED]> wrote:
I have a 2-dimensional Numeric array with the shape (N, 2) and I want to
remove all duplicate rows from the array. For example, if I start out with:
[[1,2],
[1,3],
[1,2],
[2,3]]
I want to end up with
[[1,2],
[1,3],
[2,3]].
(Order of the rows doesn't matter, although order of the two elements in
each row does.)
The problem is that I can't find any way of doing this that is efficient
for large data sets (in the data set I am using, N > 1,000,000).
The normal method of removing duplicates by putting the elements into a
dictionary and then reading off the keys doesn't work directly here, because
the keys (the rows of the Numeric array) aren't hashable.
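As a quick illustration of why the direct dictionary approach fails (shown here with numpy, which raises the same complaint; converting the row to a tuple is the usual workaround):

import numpy as np

x = np.array([[1, 2], [1, 3]])
d = {}
# d[x[0]] = True        # TypeError: a raw array row is not hashable
d[tuple(x[0])] = True   # a tuple of the row's elements is a valid key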
The best I have been able to do so far is:
def remove_duplicates(x):
    # Use each row's (a, b) pair as a hashable dictionary key;
    # duplicates collapse because equal keys overwrite each other.
    d = {}
    for (a, b) in x:
        d[(a, b)] = (a, b)
    # Rebuild a Numeric array from the surviving unique pairs.
    return array(d.values())
According to the profiler, the loop takes about 7 seconds and the call to
array() about 10 seconds with N = 1,700,000.
Is there a faster way to do this using Numeric?
-Alex Mont
--
http://mail.python.org/mailman/listinfo/python-list