aliman wrote: > Apologies I'm sure this has been asked many times, but I'm trying to > figure out the most efficient way to do a complex sort on very large > files. > > I've read the recipe at [1] and understand that the way to sort a > large file is to break it into chunks, sort each chunk and write > sorted chunks to disk, then use heapq.merge to combine the chunks as > you read them. > > What I'm having trouble figuring out is what to do when I want to sort > by one key ascending then another key descending (a "complex sort"). > > I understand that sorts are stable, so I could just repeat the whole > sort process once for each key in turn, but that would involve going > to and from disk once for each step in the sort, and I'm wondering if > there is a better way. > > I also thought you could apply the complex sort to each chunk before > writing it to disk, so each chunk was completely sorted, but then the > heapq.merge wouldn't work properly, because afaik you can only give it > one key.
You can make that key as complex as needed: >>> class Key(object): ... def __init__(self, obj): ... self.asc = obj[1] ... self.desc = obj[2] ... def __cmp__(self, other): ... return cmp(self.asc, other.asc) or -cmp(self.desc, other.desc) ... >>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key) ['aab', 'aaa', 'abc', 'bbb', 'aba'] See also http://docs.python.org/library/functools.html#functools.total_ordering -- http://mail.python.org/mailman/listinfo/python-list