On Jan 24, 9:26 pm, [EMAIL PROTECTED] wrote:
> > If you really have a 2GB file and only 2GB of RAM, I suggest that you
> > don't hold your breath.
>
> I am limited with resources. Unfortunately.
As long as you have at least as much spare disc space as you need to hold a copy of the file, this is not too hard. Split the file into chunks that are small enough to fit in memory, sort each chunk and write it to a file, and then interleave the chunks. Below is a cheap and cheesy outline of code to do this, from which you can start. For files which are hugely larger than your available memory you can do this recursively, but for files which are 10 to 100 times too big the single-pass code below will probably work just fine.

The complexity is technically O(n.(log(c) + n/c)), where n is the size of the input and c is the chunk size: once n/c (the number of chunks) exceeds log(c), the cost of merging the chunks will start to dominate, though a recursive version would be slowed by needing a lot more disc access.

#!/usr/bin/env python
from itertools import islice
from tempfile import TemporaryFile
import sys

# Tweak this number to fill your memory
lines_per_chunk = 100000

chunkfiles = []
mergechunks = []

# Phase 1: read the input in sorted chunks, each spooled to a temp file.
while True:
    chunk = list(islice(sys.stdin, lines_per_chunk))
    if not chunk:
        break
    chunk.sort()
    f = TemporaryFile()
    f.writelines(chunk)
    f.seek(0)
    # Prime the merge list with the chunk's first line. Using readline()
    # here (rather than chunk[0]) also advances the file past that line,
    # so it isn't emitted a second time during the merge.
    mergechunks.append((f.readline(), len(chunkfiles)))
    chunkfiles.append(f)

# Phase 2: repeatedly emit the smallest pending line, then refill from
# the chunk file it came from.
while mergechunks:
    # This min() scan is linear in the number of chunks
    (line, fileindex) = min(mergechunks)
    mergechunks.remove((line, fileindex))
    sys.stdout.write(line)
    nextline = chunkfiles[fileindex].readline()
    if nextline == "":
        chunkfiles[fileindex].close()
    else:
        mergechunks.append((nextline, fileindex))
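If you end up with enough chunks for that linear min() scan to matter, the merge loop can use a heap instead, so that picking the next line costs O(log(number of chunks)). Here is a minimal sketch of that variant; merge_chunks is a name I've made up for illustration, and it assumes the same chunkfiles list as above with every file rewound to the start:

#!/usr/bin/env python
import heapq
import sys

def merge_chunks(chunkfiles, out=sys.stdout):
    # Seed the heap with the first line of each chunk file; the tuples
    # compare on the line first, with the file index as a tie-breaker.
    heap = []
    for index, f in enumerate(chunkfiles):
        line = f.readline()
        if line:
            heapq.heappush(heap, (line, index))
    # Pop the smallest pending line, write it, and refill the heap from
    # whichever chunk file it came from.
    while heap:
        line, index = heapq.heappop(heap)
        out.write(line)
        nextline = chunkfiles[index].readline()
        if nextline:
            heapq.heappush(heap, (nextline, index))
        else:
            chunkfiles[index].close()

That brings the total cost to roughly O(n.log(c) + n.log(n/c)) = O(n.log(n)), so the merge never dominates no matter how many chunks there are. (On Python 2.6 and later, heapq.merge(*chunkfiles) does much the same job in one call.)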