Dennis Lee Bieber wrote:
> On Fri, 25 Aug 2006 16:39:14 +0200, Claudio Grondi
> <[EMAIL PROTECTED]> declaimed the following in comp.lang.python:
>
>> The core of my problem was ... trying to use 'wb' or 'w+b' ... (stupid
>> me ...)
>
> Ouch... How many times did you have to restore that massive file
> from backup?

I was smart enough to try it first on a very small file, wondering what was happening. Neither the Python documentation nor a Google search for 'random file access in Python' was helpful, as there was no example and no hint available.
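
For the archives, here is a minimal sketch of the difference between the modes, written for a current Python 3. The helper name overwrite_at and the scratch file are made up for illustration only. The point is that 'wb' and 'w+b' truncate an existing file on open, while 'r+b' opens it for reading and writing and leaves the content in place:

```python
import os
import tempfile

def overwrite_at(path, offset, data):
    """Overwrite len(data) bytes at offset without truncating the file."""
    # 'r+b': read/write, the file must already exist, nothing is truncated
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

# Build a small scratch file to experiment on (hypothetical test data).
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:      # 'wb' is fine here: a fresh file is wanted
    f.write(b"0123456789")

# Patch three bytes in the middle; the rest of the file is untouched.
overwrite_at(demo_path, 3, b"XYZ")
with open(demo_path, "rb") as f:
    patched = f.read()

# Merely reopening with 'wb' throws the old content away.
with open(demo_path, "wb") as f:
    pass
size_after_wb = os.path.getsize(demo_path)

os.remove(demo_path)
```

Trying this on a throwaway file first, as above, is much cheaper than restoring a large file from backup.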

The only hint about random file access in Python that I found with Google was the table of contents of the "Python Cookbook" from O'Reilly:
  http://www.oreilly.com/catalog/pythoncook2/toc.html
and hints about random read access.

I was stupid enough to forget about 'r+' (I used it many times in C/C++ a decade ago, but not yet in Python), thinking just too much the Pythonic way:
===============================================================
if I want to write, I don't open for reading (plus or not plus)
===============================================================

Actually my file was 'only' 42 GByte, but I wanted to construct the question so that it was impossible to suggest the use of an intermediate file.

In the meantime I have chosen a totally new approach, because random writing to hard disk seems to actually move the disk head on every seek. Apparently no cache sorts the pieces a bit before they are written to the disk, so if there are many of them there is no other option than to put them together in memory first before writing them to the file. This makes straightforward, intuitive programming a bit complicated, because to work on large files it is necessary to work in chunks and to waste some processing results when they don't fill the gaps.

I suppose I am still not on the right path, so by the way:
Is there a ready-to-use (free, ideally Open Source) tool able to sort the lines (each line approx. 20 bytes long) of a XXX GByte large text file (i.e. in place), taking full advantage of available memory to speed up the process as much as possible?

Claudio Grondi
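
P.S. A rough sketch of the "collect in memory first, then write" idea described above, for a current Python 3. The function name flush_sorted, the pending list, and the 16-byte demo file are assumptions made up for illustration, not tested on a file of the size in question. Pieces are collected as (offset, data) pairs and written in ascending offset order, so the file is traversed front to back once per batch instead of seeking back and forth:

```python
import os
import tempfile

def flush_sorted(path, pending):
    """Write buffered (offset, data) pairs to an existing file in offset order."""
    with open(path, "r+b") as f:      # 'r+b' keeps the existing content
        # sorted() is stable, so two pieces with the same offset are written
        # in the order they were queued (the later one wins).
        for offset, data in sorted(pending, key=lambda piece: piece[0]):
            f.seek(offset)
            f.write(data)
    pending.clear()                   # the batch is on disk, start a new one

# Small demo on a scratch file filled with dots.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"." * 16)

# Pieces arrive in arbitrary order ...
pending = [(12, b"dd"), (0, b"aa"), (6, b"cc"), (3, b"bb")]
# ... and are written in one ascending pass.
flush_sorted(demo_path, pending)

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

How many pieces to buffer before flushing depends on the memory available; the sketch leaves that decision to the caller.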