Currently I need to find the most common elements in thousands of arrays within one large array (arround 2 million instances with ~70k unique elements)
so I set up a dictionary to handle the counting so when I am iterating I up the count on the corrosponding dictionary element. I then iterate through the dictionary and find the 25 most common elements. the elements are initially held in a array within an array. so I am am just trying to find the most common elements between all the arrays contained in one large array. my current code looks something like this: d = {} for arr in my_array: -----for i in arr: #elements are numpy integers and thus are not accepted as dictionary keys -----------d[int(i)]=d.get(int(i),0)+1 then I filter things down. but with my algorithm that only takes about 1 sec so I dont need to show it here since that isnt the problem. But there has to be something better. I have to do this many many times and it seems silly to iterate through 2 million things just to get 25. The element IDs are integers and are currently being held in numpy arrays in a larger array. this ID is what makes up the key to the dictionary. It currently takes about 5 seconds to accomplish this with my current algorithm. So does anyone know the best solution or algorithm? I think the trick lies in matrix intersections but I do not know. -- http://mail.python.org/mailman/listinfo/python-list