On 8/6/2011 10:53 AM, sturlamolden wrote:
On Aug 1, 5:33 pm, aliman <aliman...@googlemail.com> wrote:

I've read the recipe at [1] and understand that the way to sort a
large file is to break it into chunks, sort each chunk, write the
sorted chunks to disk, and then use heapq.merge to combine the chunks
as you read them.
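
Untested sketch of that recipe for a line-oriented file (max_lines,
the temp-file handling, and the assumption that every line ends in a
newline are all just illustrative):

    import heapq
    import itertools
    import os
    import tempfile

    def external_sort(in_path, out_path, max_lines=1000000):
        # Pass 1: slice the input into runs of at most max_lines
        # lines, sort each run in memory, write it to a temp file.
        run_paths = []
        with open(in_path) as src:
            while True:
                chunk = list(itertools.islice(src, max_lines))
                if not chunk:
                    break
                chunk.sort()
                run = tempfile.NamedTemporaryFile('w', delete=False)
                run.writelines(chunk)
                run.close()
                run_paths.append(run.name)
        # Pass 2: stream-merge the sorted runs; heapq.merge holds
        # only one pending line per run in memory at a time.
        runs = [open(p) for p in run_paths]
        with open(out_path, 'w') as dst:
            dst.writelines(heapq.merge(*runs))
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)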

Or just memory-map the file (mmap.mmap) and sort it in place
(Python 3.2). With Python 2.7, use e.g. numpy.memmap instead. If the
file is large, use 64-bit Python. You don't have to process the file
in chunks; the operating system will take care of those details.
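
mmap.mmap itself has no sort method, so here is a sketch that goes
through numpy.memmap, which works on 2.7 and 3.x alike; it assumes
the file holds fixed-width records (the file name and the 16-byte
record width are made up):

    import numpy as np

    # Map the file as an array of fixed-width 16-byte records and
    # sort it in place; the OS pages data in and out on demand.
    records = np.memmap('data.bin', dtype='S16', mode='r+')
    records.sort()    # lexicographic, in place, on the mapped bytes
    records.flush()   # write dirty pages back to disk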

Sturla

   No, no, no.  If the file is too big to fit in memory, trying to
page it will just cause thrashing: an in-place sort makes random
accesses across the whole file, so pages are constantly evicted and
re-read from disk.

   The UNIX sort program is probably good enough; it already does an
external merge sort on disk.  There are better approaches if you have
many gigabytes to sort (see Syncsort, which is a commercial product),
but few people need them.
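
If you drive it from Python, something like this works with GNU sort
(the buffer size, temp directory, and file names are illustrative):

    import subprocess

    # GNU sort already does an external merge sort: it sorts runs
    # that fit in its memory buffer (-S), spills them to a temp
    # directory (-T), and merges the runs at the end.
    subprocess.check_call([
        'sort',
        '-S', '1G',          # in-memory buffer per run
        '-T', '/var/tmp',    # where sorted runs are spilled
        '-o', 'sorted.txt',  # output file
        'big_input.txt',
    ])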

                                John Nagle

