On Jan 25, 9:23 am, Asim <[EMAIL PROTECTED]> wrote:
> On Jan 24, 4:26 pm, [EMAIL PROTECTED] wrote:
>
> > Thanks to all who replied. It's very appreciated.
> >
> > Yes, I had to doublecheck line counts and the number of lines is ~16
> > million (instead of stated 1.6B).
> >
> > Also:
> >
> > > What is a "Unicode text file"? How is it encoded: utf8, utf16, utf16le,
> > > utf16be, ??? If you don't know, do this:
> >
> > The file is UTF-8
> >
> > > Do the first two characters always belong to the ASCII subset?
> >
> > Yes, first two always belong to ASCII subset
> >
> > > What are you going to do with it after it's sorted?
> >
> > I need to isolate all lines that start with two characters (zz to be
> > particular)
> >
> > > Here's a start: http://docs.python.org/lib/typesseq-mutable.html
> > > Google "GnuWin32" and see if their sort does what you want.
> >
> > Will do, thanks for the tip.
> >
> > > If you really have a 2GB file and only 2GB of RAM, I suggest that you
> > > don't hold your breath.
> >
> > I am limited with resources. Unfortunately.
>
> Since the OP has stated that they are running Windows XP, and more
> than one poster has suggested installing more RAM in the box, I
> thought people should know that WinXP has certain limitations on the
> amount of memory that may be used:
>
> http://msdn2.microsoft.com/en-us/library/aa366778.aspx
>
> Firstly, the maximum amount of physical memory that may be installed
> is 4GB. Secondly, without the "4 gigabyte tuning" and
> "IMAGE_FILE_LARGE_ADDRESS_AWARE" patches, the maximum amount of
> virtual memory (physical memory + swapfile size) that may be assigned
> to user processes is 2GB.
>
> Hence, even if you made a 100GB swap file with 4GB RAM installed, by
> default only a maximum of 2GB would ever be assigned to a user
> process. With the two flags enabled, the maximum becomes 3GB.
>
> If the OP finds performance to be limited and thinks more RAM would
> help, trying a later version of Windows would be a start, but better
> would be to try Linux or Mac OSX out.
>
> Cheers,
> Asim
>
> > Cheers,
> > Ira

Sorry, just to clarify my response. Any 32-bit OS can assign at most
4GB of virtual address space to a single process. The argument is that
a 32-bit process uses 32-bit addresses, so it can address at most 2^32
bytes (assuming the architecture uses byte-addressed memory). Another
link that's easier to grok:

http://www.codinghorror.com/blog/archives/000811.html

However, a 32-bit OS may support more than 4GB of physical memory (using
"Physical Address Extension", or PAE) and split it more intelligently
between processes than Windows XP or Vista does:

http://www.ibm.com/developerworks/linux/library/l-memmod/

So using more than 4GB of memory for your sort application could be
achieved by splitting your task across more than one process on an
appropriate OS.

AFAIK, such memory limitations depend on the particular Linux distro
you're using, and I'm not sure about Mac OSX limitations. This applies
doubly for 64-bit architectures and OSes.

Please correct me, with references, if my conclusions are wrong.

Cheers,
Asim
--
http://mail.python.org/mailman/listinfo/python-list
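
P.S. As a rough check of the 2^32 arithmetic above, here is a minimal
Python sketch. The helper name addressable_bytes is made up for
illustration, and it assumes byte-addressed memory as stated above.
struct.calcsize("P") is the standard-library way to ask how many bytes
a native pointer takes in the running interpreter.

```python
import struct

GIB = 2 ** 30  # bytes in one gigabyte (binary, i.e. GiB)


def addressable_bytes(pointer_bits):
    """Bytes reachable with pointers of this width (hypothetical helper).

    Assumes byte-addressed memory: one address per byte.
    """
    return 2 ** pointer_bits


# Pointer width of the running interpreter; calcsize("P") returns bytes.
pointer_bits = struct.calcsize("P") * 8

print("32-bit address space: %d GiB" % (addressable_bytes(32) // GIB))
print("this interpreter uses %d-bit pointers" % pointer_bits)
```

The first line prints 4 GiB, which is the per-process ceiling discussed
above, before the OS reserves its own share of that address space. The
second line depends on whether you run a 32-bit or a 64-bit Python
build.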