finding most common elements between thousands of multiple arrays.

mclovin Sat, 04 Jul 2009 00:36:26 -0700

Currently I need to find the most common elements in thousands of
arrays within one large array (around 2 million instances with ~70k
unique elements).


So I set up a dictionary to handle the counting: while iterating, I
increment the count on the corresponding dictionary entry. I then
iterate through the dictionary and find the 25 most common
elements.

The elements are initially held in an array within an array, so I am
just trying to find the most common elements across all the arrays
contained in one large array.
My current code looks something like this:
d = {}
for arr in my_array:
    for i in arr:
        # elements are numpy integers and thus are not accepted as
        # dictionary keys
        d[int(i)] = d.get(int(i), 0) + 1

Then I filter things down, but with my algorithm that only takes about
1 sec, so I don't need to show it here since that isn't the problem.
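
For reference, the counting and filtering steps together can be
sketched with only the standard library. This is a minimal sketch, not
the filtering code mentioned above: `count_top` is a hypothetical
name, and it assumes each inner array can be iterated element by
element and that every element converts cleanly with int().

```python
from collections import Counter


def count_top(arrays, k=25):
    """Count every element across all inner arrays and return the k
    most common as (element, count) pairs, most frequent first.

    `count_top` is a hypothetical helper name used for illustration.
    """
    # int() mirrors the conversion in the loop above, so the keys are
    # plain Python integers.
    counts = Counter(int(i) for arr in arrays for i in arr)
    return counts.most_common(k)


if __name__ == "__main__":
    sample = [[1, 2, 2, 3], [2, 3, 3], [3]]
    print(count_top(sample, k=2))
```

This still visits every element once, so it changes how the code
reads more than how much work is done.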


But there has to be something better. I have to do this many, many
times, and it seems silly to iterate through 2 million things just to
get 25. The element IDs are integers and are currently held in
numpy arrays inside a larger array. This ID is what makes up the key
to the dictionary.
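
Since the IDs are integers already sitting in numpy arrays, one
possible direction is to let numpy do the counting. The following is
only a sketch under assumptions: `most_common_ids` is a hypothetical
name, the outer array is assumed to be non-empty, and the numpy
version is assumed to be recent enough to support
np.unique(..., return_counts=True) and argsort(kind="stable"). Whether
it actually beats the dictionary loop would have to be measured on the
real data.

```python
import numpy as np


def most_common_ids(arrays, k=25):
    """Return (ids, counts) for the k most frequent integer IDs.

    `most_common_ids` is a hypothetical helper name. `arrays` is
    assumed to be a non-empty sequence of integer numpy arrays.
    """
    # Flatten all inner arrays into one long 1-D array.
    flat = np.concatenate([np.asarray(a).ravel() for a in arrays])
    # Sorted unique IDs and how often each one occurs.
    ids, counts = np.unique(flat, return_counts=True)
    # Indices of the k largest counts, most frequent first; the stable
    # sort breaks ties in favour of the smaller ID.
    order = np.argsort(-counts, kind="stable")[:k]
    return ids[order], counts[order]


if __name__ == "__main__":
    sample = [np.array([1, 2, 2, 3]), np.array([2, 3, 3]), np.array([3])]
    top_ids, top_counts = most_common_ids(sample, k=2)
    print(top_ids, top_counts)
```

Every element is still visited once, but the loop runs inside numpy
rather than in Python bytecode.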

It takes about 5 seconds to accomplish this with my current
algorithm.

So does anyone know a better solution or algorithm? I think the trick
lies in matrix intersections, but I do not know.
-- 
http://mail.python.org/mailman/listinfo/python-list
