On Sat, Jul 4, 2009 at 12:33 AM, mclovin<hanoo...@gmail.com> wrote:
> Currently I need to find the most common elements in thousands of
> arrays within one large array (around 2 million instances with ~70k
> unique elements).
>
> So I set up a dictionary to handle the counting: as I iterate, I
> increment the count on the corresponding dictionary element. I then
> iterate through the dictionary and find the 25 most common elements.
>
> The elements are initially held in an array within an array, so I am
> just trying to find the most common elements across all the arrays
> contained in one large array. My current code looks something like
> this:
>
> d = {}
> for arr in my_array:
> ----for i in arr:
> --------# elements are numpy integers and thus are not accepted as
> --------# dictionary keys
> --------d[int(i)] = d.get(int(i), 0) + 1
>
> Then I filter things down, but with my algorithm that only takes
> about 1 sec, so I don't need to show it here since that isn't the
> problem.
>
> But there has to be something better. I have to do this many, many
> times, and it seems silly to iterate through 2 million things just to
> get 25. The element IDs are integers and are currently being held in
> numpy arrays in a larger array. This ID is what makes up the key to
> the dictionary.
>
> It currently takes about 5 seconds to accomplish this with my
> current algorithm.
>
> So does anyone know the best solution or algorithm? I think the
> trick lies in matrix intersections, but I do not know.
Using a defaultdict
(http://docs.python.org/library/collections.html#collections.defaultdict)
would speed it up (but only slightly) and make the code clearer to read.
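For reference, here's a minimal sketch of what I mean (the top_25 name
and the toy data at the bottom are just for illustration, not from your
post; heapq.nlargest pulls out the 25 most common entries you asked for
without sorting the whole dict):

from collections import defaultdict
import heapq

import numpy as np

def top_25(my_array):
    # defaultdict(int) supplies a default count of 0, so each element
    # can be incremented without the d.get(key, 0) lookup
    counts = defaultdict(int)
    for arr in my_array:
        for i in arr:
            # convert the numpy integer to a plain Python int for the key
            counts[int(i)] += 1
    # nlargest(25, ...) avoids sorting all ~70k unique keys to keep 25
    return heapq.nlargest(25, counts.items(), key=lambda kv: kv[1])

# toy data standing in for the real arrays
my_array = [np.random.randint(0, 100, size=1000) for _ in range(5)]
for element, count in top_25(my_array):
    print("%d appears %d times" % (element, count))

Cheers,
Chris
--
http://blog.rebertia.com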