oyekomova wrote:
> Thanks for your note. I have 1Gig of RAM. Also, Matlab has no problem
> in reading the file into memory. I am just running Istvan's code that
> was posted earlier.

You have a CSV file of about 520 MiB, which is read into memory. Then
you have a list of lists of floats, created by a list comprehension,
which is larger than 274 MiB. On top of that you try to allocate a NumPy
array slightly larger than 274 MiB. At that point your process already
exceeds 1 GiB, and you are probably running other processes too. That is
why you run out of memory.

So you have three options:

1. Buy more RAM.

2. Write a low-level CSV reader in C.

3. Read the data in chunks. That would mean something like this:

import time, csv, random
import numpy

def make_data(rows=6E6, cols=6):
    # Write a test file of comma-separated random floats.
    fp = open('data.txt', 'wt')
    counter = range(cols)
    for row in xrange( int(rows) ):
        vals = map(str, [ random.random() for x in counter ] )
        fp.write( '%s\n' % ','.join( vals ) )
    fp.close()

def read_test():
    start = time.clock()
    arrlist = None   # linked list of chunks: [newest array, older chunks]
    r = 0            # total number of rows read so far
    CHUNK_SIZE_HINT = 4096 * 4 # seems to be good
    fid = file('data.txt')
    while 1:
        # Read roughly CHUNK_SIZE_HINT bytes worth of complete lines.
        chunk = fid.readlines(CHUNK_SIZE_HINT)
        if not chunk: break
        reader = csv.reader(chunk)
        data = [ map(float, row) for row in reader ]
        arrlist = [ numpy.array(data,dtype=float), arrlist ]
        r += arrlist[0].shape[0]
        # Drop the temporaries before reading the next chunk.
        del data
        del reader
        del chunk
    fid.close()
    print 'Created list of chunks, elapsed time so far: ', time.clock() - start
    print 'Joining list...'
    data = numpy.empty((r,arrlist[0].shape[1]),dtype=float)
    r1 = r
    # The newest chunk is at the head of the list, so fill the
    # array from the bottom up, freeing each chunk once it is copied.
    while arrlist:
        r0 = r1 - arrlist[0].shape[0]
        data[r0:r1,:] = arrlist[0]
        r1 = r0
        del arrlist[0]
        arrlist = arrlist[0]
    print 'Elapsed time:', time.clock() - start

make_data()
read_test()

This can process a CSV file of 6 million rows in about 150 seconds on my
laptop. A CSV file of 1 million rows takes about 25 seconds.

Just reading the 6 million row CSV file (using fid.readlines()) takes
about 40 seconds on my laptop. Python lists are not particularly
efficient. You can probably reduce the time to ~60 seconds by writing a
new CSV reader for NumPy arrays in a C extension.
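
The memory figures at the top of this reply can be checked with a short calculation. The following is only a rough sketch, written in Python 3 syntax (unlike the listing above), and it assumes the 6 million by 6 shape that make_data() writes by default. The names array_mib and list_of_lists_mib are made up for this illustration, and the per-object sizes it reports depend on the Python build it runs on, so the list estimate should be read as an order of magnitude rather than an exact number.

```python
import struct
import sys

MIB = 1024 ** 2
ROWS, COLS = 6_000_000, 6   # assumed: the default shape written by make_data()


def array_mib(rows, cols, itemsize=8):
    """MiB used by a dense rows x cols array of 8-byte floats."""
    return rows * cols * itemsize / MIB


def list_of_lists_mib(rows, cols):
    """Rough lower bound for a list of lists of separate float objects.

    Counts one float object plus one pointer per value and ignores the
    per-list headers, so the real figure is somewhat higher.
    """
    per_float = sys.getsizeof(1.0)      # size of one float object on this build
    pointer = struct.calcsize("P")      # size of one list slot (a pointer)
    return rows * cols * (per_float + pointer) / MIB


if __name__ == "__main__":
    print("NumPy array:    %.1f MiB" % array_mib(ROWS, COLS))
    print("list of lists: >%.1f MiB" % list_of_lists_mib(ROWS, COLS))
```

The second figure is larger than the first because every value in a list carries an object header and a pointer in addition to its 8 bytes of data, which is why the list of lists costs more than the final array.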