On 2010-10-13, pstatham <pstat...@sefas.com> wrote:
> Hopefully this will interest some, I have a csv file (can be
> downloaded from http://www.paulstathamphotography.co.uk/45.txt) which
> has five fields separated by ~ delimiters. To read this I've been
> using a csv.DictReader which works in 99% of the cases. Occasionally
> however the description field has errant \r\n characters in the middle
> of the record. This causes the reader to assume it's a new record and
> try to read it.

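To make the failure concrete, here is a small sketch of what the reader probably does with such a record. The sample line is made up (it is not taken from 45.txt), and the field names and DictReader arguments are assumptions, since the original call isn't shown.

```python
import csv
import io

# Assumed field names; the real file's header or fieldnames may differ.
FIELDS = ["PROGTITLE", "SUBTITLE", "EPISODE", "DESCRIPTION", "DATE"]

# Made-up sample: ONE logical record whose description contains a stray \r\n.
sample = "Title~Sub~1~A description that\r\nbreaks in the middle~2010-10-13\r\n"

reader = csv.DictReader(
    io.StringIO(sample, newline=""),  # newline="" keeps the \r\n untranslated
    fieldnames=FIELDS,
    delimiter="~",
    quoting=csv.QUOTE_NONE,
)

rows = list(reader)
for row in rows:
    # The reader ends a record at the stray \r\n, so the single logical
    # record comes back as two short rows padded with None.
    print(row)
```

The tail of the description ends up in PROGTITLE of a second, bogus row, and the date lands in SUBTITLE.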
Here's an alternative idea. Working with the csv module for this job is
too difficult for me. ;)

import re

# Five ~-separated fields, then a newline, then everything that is left.
# With re.S, "." also matches line breaks, so DESCRIPTION can span lines.
record_re = "(?P<PROGTITLE>.*?)~(?P<SUBTITLE>.*?)~(?P<EPISODE>.*?)~(?P<DESCRIPTION>.*?)~(?P<DATE>.*?)\n(.*)"

def parse_file(fname):
    with open(fname) as f:
        data = f.read()
    m = re.match(record_re, data, flags=re.M | re.S)
    while m:
        yield m.groupdict()
        # Group 6 is the unparsed remainder of the file.
        m = re.match(record_re, m.group(6), flags=re.M | re.S)

for record in parse_file('45.txt'):
    print(record)

-- 
Neil Cerutti
-- 
http://mail.python.org/mailman/listinfo/python-list
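P.S. A possible variation on the same regex idea, as a sketch only. It uses re.finditer so the remainder of the file isn't re-matched for every record, and it also accepts a last record with no trailing newline. The names RECORD_RE and iter_records are made up for illustration, and it assumes that no field ever contains a literal ~ and that only the description can contain line breaks.

```python
import re

# Same five named fields as the pattern above. DESCRIPTION is [^~]* so it
# may contain \r or \n; every other field must stay on a single line.
RECORD_RE = re.compile(
    r"(?P<PROGTITLE>[^~\r\n]*)~"
    r"(?P<SUBTITLE>[^~\r\n]*)~"
    r"(?P<EPISODE>[^~\r\n]*)~"
    r"(?P<DESCRIPTION>[^~]*)~"
    r"(?P<DATE>[^~\r\n]*)(?:\r?\n|\Z)"  # record ends at a newline or end of text
)

def iter_records(text):
    """Yield one dict per record found in text (hypothetical helper)."""
    for m in RECORD_RE.finditer(text):
        yield m.groupdict()

# Made-up data: the first record has a stray \r\n inside its description,
# the second has no trailing newline.
text = "A~B~1~first line\r\nsecond line~2010-10-13\r\nC~D~2~plain~2010-10-14"
records = list(iter_records(text))
for record in records:
    print(record)
```

To run it on the real file, something like iter_records(open('45.txt', newline='').read()) should do; newline='' leaves the stray \r\n in the description as it is, so stripping or replacing it there is a separate step.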