On Mon, Aug 1, 2011 at 8:22 PM, Tony Zhang <warriorla...@gmail.com> wrote:
> Thanks!
>
> Actually, I used .readline() to parse the file line by line, because I
> need to find the start position to begin extracting data into a list,
> and the end point to stop extracting, then repeat until the end of the
> file.
> My file is formatted like this:
>
> blabla...useless....
> useless...
>
> /sign/
> data block (e.g. 10 cols x 1000 rows)
> ...
> blank line
> /sign/
> data block (e.g. 10 cols x 1000 rows)
> ...
> blank line
> ...
> ...
> EOF
>
> Let's call this file 'myfile'. Here is my Python snippet:
>
> f = open('myfile', 'r')
> blocknum = 0  # number of the data block
> data = []
> while True:
>     # find where extraction begins
>     while not f.readline().startswith('/sign/'):
>         pass
>     # create a nested list to store the data block
>     data.append([])
>     blocknum += 1
>     line = f.readline()
>     while line.strip():
>         # a blank line marks the end of one block
>         data[blocknum - 1].append(["%2.6E" % float(x)
>                                    for x in line.split()])
>         line = f.readline()
>     print "Read Block %d" % blocknum
>     if not f.readline(): break
>
> The result was that reading a 500 MB file consumed almost 2 GB of RAM,
> and I cannot figure out why. Can somebody help?
If you could store the floats themselves, rather than their string
representations, that would be more space-efficient. You could then also
use the `array` module, which is more space-efficient than lists
(http://docs.python.org/library/array.html). NumPy would also be worth
investigating, since multidimensional arrays are involved.

The next obvious question would then be: do you /really/ need /all/ of
the data in memory at once?

Also, just so you're aware:
http://docs.python.org/library/sys.html#sys.getsizeof

Cheers,
Chris
--
http://rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list
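[Editor's note: combining the reply's suggestions, here is one possible
sketch of a block reader that stores floats in `array('d')` rows and
yields one block at a time, so the whole 500 MB file never sits in
memory at once. It assumes the `/sign/`-delimited, blank-line-terminated
format described in the original post; `iter_blocks` is an illustrative
name, not code from the thread.]

```python
from array import array

def iter_blocks(path, marker='/sign/'):
    """Yield one data block at a time as a list of array('d') rows.

    Only a single block is ever held in memory, and each value is
    stored as an 8-byte C double instead of a Python string.
    """
    with open(path) as f:
        block = None
        for line in f:
            if line.startswith(marker):
                block = []  # a marker line starts a new block
            elif block is not None:
                if line.strip():
                    # store the floats directly rather than their
                    # formatted-string representations
                    block.append(array('d', (float(x) for x in line.split())))
                else:
                    yield block  # a blank line ends the block
                    block = None
        if block:  # in case the file ends without a trailing blank line
            yield block
```

A caller can then process (or format) each block and let it be garbage
collected before the next one is read:

```python
for blocknum, block in enumerate(iter_blocks('myfile'), 1):
    print("Read Block %d (%d rows)" % (blocknum, len(block)))
```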