2009/7/4 Steven D'Aprano <st...@remove-this-cybersource.com.au>: > On Sat, 04 Jul 2009 13:42:06 +0000, Steven D'Aprano wrote: > >> On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote: >> >>> 2009/7/4 Andre Engels <andreeng...@gmail.com>: >>>> On Sat, Jul 4, 2009 at 9:33 AM, mclovin<hanoo...@gmail.com> wrote: >>>>> Currently I need to find the most common elements in thousands of >>>>> arrays within one large array (arround 2 million instances with ~70k >>>>> unique elements) >> ... >>>> There's no better algorithm for the general case. No method of >>>> checking the matrices using less than 2000000-x look-ups will ensure >>>> you that there's not a new value with x occurences lurking somewhere. >>> >>> Try flattening the arrays into a single large array & sorting it. Then >>> you can just iterate over the large array counting as you go; you only >>> ever have to insert into the dict once for each value and there's no >>> lookups in the dict. >> >> You're suggesting to do a whole bunch of work copying 2,000,000 pointers >> into a single array, then a whole bunch of more work sorting that second >> array (which is O(N*log N) on average), and then finally iterate over >> the second array. Sure, that last step will on average involve fewer >> than O(N) steps, > > Er what? > > Ignore that last comment -- I don't know what I was thinking. You still > have to iterate over all N elements, sorted or not. > >> but to get to that point you've already done more work >> than just iterating over the array-of-arrays in the first place. > > What it does buy you though, as you pointed out, is reducing the number > of explicit dict lookups and writes. However, dict lookups and writes are > very fast, fast enough that they're used throughout Python. A line like: > > count += 1 > > actually is a dict lookup and write.
I did some tests, just to be sure, and you're absolutely right: just creating the flattened list took several hundred (!) times as long as iterating through all the lists in place. Live and learn... Vil. -- http://mail.python.org/mailman/listinfo/python-list