Vincent Davis wrote:
I have some (~50) text files that have about 250,000 rows each. I am reading them in using the following, which gets me what I want, but it is not fast. Is there something I am missing that would help? This is mostly a question to help me learn more about Python. It takes about 4 min right now.

import csv
from itertools import takewhile, dropwhile

def read_data_file(filename):
    reader = csv.reader(open(filename, "U"), delimiter='\t')
    read = list(reader)
    data_rows = takewhile(lambda trow: '[MASKS]' not in trow, [x for x in read])

'takewhile' accepts an iterable, so "[x for x in read]" can be
simplified to "read".
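
In fact, takewhile can consume the csv reader itself, so if you only
walk the sections in order you don't need the intermediate list at all.
A sketch (keep in mind that takewhile swallows the '[MASKS]' row it
stops on, so the marker is gone from the iterator afterwards):

    import csv
    from itertools import takewhile

    reader = csv.reader(open(filename, "U"), delimiter='\t')
    # Rows strictly before the first row containing '[MASKS]'.
    data_rows = takewhile(lambda row: '[MASKS]' not in row, reader)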

    data = [x for x in data_rows][1:]

Similarly, "[x for x in data_rows]" is just "list(data_rows)". Note that
a takewhile object can't be sliced directly, so:

    data = list(data_rows)[1:]
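
If you'd rather not build the throwaway list at all, itertools.islice
skips the first row (presumably a header) directly on the iterator:

    from itertools import islice

    # Materialize everything after the first row.
    data = list(islice(data_rows, 1, None))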

    mask_rows = takewhile(lambda trow: '[OUTLIERS]' not in trow,
                          list(dropwhile(lambda drow: '[MASKS]' not in drow, read)))
    mask = [row for row in mask_rows if row][3:]
No need to convert the result of 'dropwhile' to list.
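
dropwhile already returns an iterator, and takewhile accepts one, so
the two chain directly:

    mask_rows = takewhile(lambda row: '[OUTLIERS]' not in row,
                          dropwhile(lambda row: '[MASKS]' not in row, read))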

    outlier_rows = dropwhile(lambda drows: '[OUTLIERS]' not in drows, read)
    outlier = [row for row in outlier_rows if row][3:]

The problem, as I see it, is that you're scanning the rows more than
once.

Is this any better?

import csv

def read_data_file(filename):
    reader = csv.reader(open(filename, "U"), delimiter='\t')
    row = []    # so the marker tests below are safe on an empty file
    data = []
    for row in reader:
        if '[MASKS]' in row:
            break
        data.append(row)
    data = data[1:]
    mask = []
    if '[MASKS]' in row:
        mask.append(row)
        for row in reader:
            if '[OUTLIERS]' in row:
                break
            if row:
                mask.append(row)
        mask = mask[3:]
    outlier = []
    if '[OUTLIERS]' in row:
        outlier.append(row)
        # extend from the reader, not from outlier itself
        outlier.extend(r for r in reader if r)
        outlier = outlier[3:]
    return data, mask, outlier
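
Untested, so treat it as a sketch. Calling it would look like this
('sample.txt' is just a stand-in filename):

    # One pass over the file yields the same three lists as before.
    data, mask, outlier = read_data_file('sample.txt')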
