On Aug 1, 5:33 pm, aliman <aliman...@googlemail.com> wrote: > I understand that sorts are stable, so I could just repeat the whole > sort process once for each key in turn, but that would involve going > to and from disk once for each step in the sort, and I'm wondering if > there is a better way.
I would consider using memory mapping the file and sorting it inline. Sorting a binary file of bytes with NumPy is as easy as this: import numpy as np f = np.memmap(filename, mode='rwb', dtype=np.uint8) f.sort(kind='quicksort') del f (You can define dtype for any C data type or struct.) If the file is really big, use 64-bit Python. With memory mapping you don't have to worry about processing the file in chunks, because the operating systems will take care of those details. I am not sure how to achieve this (inline file sort) with standard library mmap and timsort, so I'll leave that out. Sturla -- http://mail.python.org/mailman/listinfo/python-list