Vincent Davis wrote:
I have some (~50) text files that have about 250,000 rows each. I am
reading them in using the following, which gets me what I want, but it
is not fast. Is there something I am missing that would help? This is
mostly a question to help me learn more about Python. It takes about 4
min right now.
import csv
from itertools import takewhile, dropwhile

def read_data_file(filename):
    reader = csv.reader(open(filename, "U"), delimiter='\t')
    read = list(reader)
    data_rows = takewhile(lambda trow: '[MASKS]' not in trow,
                          [x for x in read])
'takewhile' accepts an iterable, so "[x for x in read]" can be
simplified to "read".
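A quick illustration of that point, using made-up rows rather than
Vincent's actual data:

```python
from itertools import takewhile

# Hypothetical rows standing in for the parsed file.
read = [['1', '2'], ['3', '4'], ['[MASKS]'], ['5', '6']]

# takewhile accepts any iterable, so the list can be passed directly;
# the "[x for x in read]" wrapper just copies it for no benefit.
rows = list(takewhile(lambda trow: '[MASKS]' not in trow, read))
```

Here `rows` holds everything before the first row containing '[MASKS]'.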
    data = [x for x in data_rows][1:]
Note that 'takewhile' returns an iterator, which can't be sliced
directly, so this can be simplified to "data = list(data_rows)[1:]".
    mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
                          list(dropwhile(lambda drow: '[MASKS]' not in drow, read)))
    mask = [row for row in mask_rows if row][3:]
No need to convert the result of 'dropwhile' to list.
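To illustrate: 'dropwhile' yields rows lazily, and 'takewhile' can
consume that iterator directly. A small sketch with an invented section
layout (not Vincent's real file format):

```python
from itertools import dropwhile, takewhile

# Hypothetical layout: data rows, then a [MASKS] section, then [OUTLIERS].
read = [['d1'], ['[MASKS]'], ['m1'], ['m2'], ['[OUTLIERS]'], ['o1']]

# dropwhile skips rows until the first [MASKS] row; takewhile then
# reads from there until [OUTLIERS] -- no intermediate list() needed.
mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
                      dropwhile(lambda drow: '[MASKS]' not in drow, read))
mask = [row for row in mask_rows if row]
```

The result includes the '[MASKS]' marker row itself, which is why the
original code slices with [3:] afterwards.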
    outlier_rows = dropwhile(lambda drows: '[OUTLIERS]' not in drows, read)
    outlier = [row for row in outlier_rows if row][3:]
The problem, as I see it, is that you're scanning the rows more than
once.
Is this any better?
def read_data_file(filename):
    reader = csv.reader(open(filename, "U"), delimiter='\t')
    data = []
    for row in reader:
        if '[MASKS]' in row:
            break
        data.append(row)
    data = data[1:]
    mask = []
    if '[MASKS]' in row:
        mask.append(row)
    for row in reader:
        if '[OUTLIERS]' in row:
            break
        if row:
            mask.append(row)
    mask = mask[3:]
    outlier = []
    if '[OUTLIERS]' in row:
        outlier.append(row)
    outlier.extend(row for row in reader if row)
    outlier = outlier[3:]
--
http://mail.python.org/mailman/listinfo/python-list