Roy Smith wrote:
> Wow.
>
> I was going to suggest using the unix command-line sort utility via
> popen() or subprocess.  My arguments were that it's written in C, has
> 30 years of optimizing in it, etc, etc, etc.  It almost certainly has
> to be faster than anything you could do in Python.
>
> Then I tried the experiment.  I generated a file of 1 million random
> integers in the range 0 to 5000.  I wrote a little sorting program:
[...]
> Python took just about half the time.  Certainly knocked my socks off.
> Hard to believe, actually.
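(Roy's test program wasn't quoted above, but purely as a hypothetical
sketch -- not his actual code, and with a made-up filename -- a
comparison along those lines might look something like this:

    # Hypothetical reconstruction of the kind of test Roy describes.
    # Times Python's built-in sort against the external Unix sort
    # utility on the same file of integers, one per line.
    import subprocess
    import time

    def python_sort(filename):
        with open(filename) as f:
            numbers = [int(line) for line in f]
        numbers.sort()

    def unix_sort(filename):
        # sort -n does a numeric sort; the output is thrown away.
        subprocess.run(['sort', '-n', filename],
                       stdout=subprocess.DEVNULL, check=True)

    for func in (python_sort, unix_sort):
        start = time.perf_counter()
        func('million_random_ints.txt')  # made-up filename
        print(func.__name__, time.perf_counter() - start)

Anyway:)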
One million integers isn't very big.  If each integer fits in a
four-byte long, that's less than 4MB.  That's almost small enough to
fit in your CPU's cache, with room left over for the first few
chapters of "War And Peace" *wink*

So you're comparing Python's timsort, which is Awesome with a capital
AWE but only works on data that fits in memory, against something
which also works on files too big to fit into memory.  Try generating
a twenty gigabyte file of data and sorting that.  Actually, don't,
because just *reading it in* to Python will probably fail, and very
possibly lock your PC up for the duration.

Unix sort does an external R-way merge sort: if you have more data
than memory, it slices the data up into a bunch of smaller pieces,
each of which will fit in memory, sorts each one to a temporary file,
then merges the lot.  It does a lot more work on such big files,
because it *takes* a lot more work.  For something as small as one
million numbers, chances are Unix sort falls back on a heapsort or a
quicksort, which will be pretty fast, but it ain't no timsort.

So yes, Python's timsort is awesome, but so is Unix's sort, just in
different ways.

-- 
Steven
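P.S.  For the curious, the external merge idea is simple enough to
sketch in Python itself.  This is just an illustration of the
technique, not how GNU sort actually implements it -- it assumes one
integer per line and that the input ends with a newline:

    # A sketch of an external R-way merge sort: sort chunks that fit
    # in memory, spill each to a temporary file, then lazily merge
    # the sorted runs.
    import heapq
    import itertools
    import tempfile

    def external_sort(infile, outfile, chunk_size=1000000):
        runs = []
        with open(infile) as f:
            while True:
                # Read the next chunk that fits comfortably in memory.
                chunk = list(itertools.islice(f, chunk_size))
                if not chunk:
                    break
                chunk.sort(key=int)   # timsort on the in-memory chunk
                run = tempfile.TemporaryFile('w+')
                run.writelines(chunk)
                run.seek(0)
                runs.append(run)
        with open(outfile, 'w') as out:
            # heapq.merge does the R-way merge, streaming the sorted
            # runs without loading them all back into memory.
            out.writelines(heapq.merge(*runs, key=int))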