Re: Using csv.DictReader with \r\n in the middle of fields

Neil Cerutti Wed, 13 Oct 2010 08:08:04 -0700

On 2010-10-13, pstatham <[email protected]> wrote:
> Hopefully this will interest some, I have a csv file (can be
> downloaded from http://www.paulstathamphotography.co.uk/45.txt) which
> has five fields separated by ~ delimiters. To read this I've been
> using a csv.DictReader which works in 99% of the cases. Occasionally
> however the description field has errant \r\n characters in the middle
> of the record. This causes the reader to assume it's a new record and
> try to read it.


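Here's an alternative idea. Working with the csv module for this job
is too difficult for me. ;) Instead, read the whole file into one
string and let a regular expression peel off one five-field record at
a time. The sixth group captures whatever is left for the next match.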

import re

record_re = (
    r"(?P<PROGTITLE>.*?)~(?P<SUBTITLE>.*?)~(?P<EPISODE>.*?)~"
    r"(?P<DESCRIPTION>.*?)~(?P<DATE>.*?)\n(.*)"
)

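def parse_file(fname):
    # Read the whole file so a record can span several physical lines.
    with open(fname) as f:
        data = f.read()
    # re.S lets "." match the stray line breaks inside DESCRIPTION.
    m = re.match(record_re, data, flags=re.S)
    while m:
        yield m.groupdict()
        # Group 6 is the unparsed remainder; match the next record in it.
        m = re.match(record_re, m.group(6), flags=re.S)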

for record in parse_file('45.txt'):
    print(record)
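
The same idea can be tried on an in-memory string without the file.
This is only a sketch with some assumptions of mine: the sample data
is made up (it is not taken from 45.txt), iter_records is a
hypothetical name, re.finditer is swapped in for the repeated
re.match on the remainder, and the pattern ends in (?:\r?\n|\Z) so
that a last record with no trailing newline is still picked up.

```python
import re

# Five ~-separated fields, as in the pattern above. Assumption: each record
# ends at a line break (\n or \r\n) or at the end of the data.
RECORD_RE = re.compile(
    r"(?P<PROGTITLE>.*?)~(?P<SUBTITLE>.*?)~(?P<EPISODE>.*?)~"
    r"(?P<DESCRIPTION>.*?)~(?P<DATE>.*?)(?:\r?\n|\Z)",
    re.S,  # let "." match the stray line breaks inside a field
)

# Made-up sample: the first description contains an errant \r\n, and the
# second record has no trailing newline.
SAMPLE = (
    "Show A~Sub A~1~First line\r\nsecond line~2010-10-13\r\n"
    "Show B~Sub B~2~Plain description~2010-10-14"
)


def iter_records(data):
    """Yield one dict per record found in data (hypothetical helper)."""
    for m in RECORD_RE.finditer(data):
        yield m.groupdict()


for record in iter_records(SAMPLE):
    print(record)
```

When reading a real file in Python 3, open(fname, newline="") keeps
the \r\n pairs as they are, so the data looks like the sample above.
With a plain open(fname) they arrive as \n, which the pattern also
accepts.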

-- 
Neil Cerutti
-- 
http://mail.python.org/mailman/listinfo/python-list
