Hi, I've come across a bug in the csv module where csv.reader() raises an exception if a field in the input contains '\r'. The example code and output below show a test case where csv.reader() cannot read back rows written by csv.writer().
I believe this is a known bug and may have been fixed for Python 2.5. However, I'm after suggestions for workarounds for Python 2.4.2.

This is part of a project where I'm storing large tables from mainframe systems as CSVs for subsequent data cleansing and post-processing. Some tables have 300 columns and tens of millions of rows. The mainframe data fields are poorly documented, so I don't know at the time of writing the CSV whether a '\r' is part of a binary field and so must be retained, or is a random byte in an uninitialised field and so can safely be deleted. Therefore I'd prefer to make as few changes as possible that might screw up the data.

Any suggestions for how to proceed are most welcome!

Thanks in advance,
Stephen Simmons

#======================================================
# Bug in Python 2.4.2's csv module
# Stephen Simmons, mail at stevesimmons.com, 24 Jan 2006
import csv

s = [ ['a'], ['\r'], ['b'] ]
name = 'c://temp//test2.csv'

print 'Writing CSV file containing %s' % repr(s)
f = file(name, 'wb')
csv.writer(f).writerows(s)
f.close()

print 'CSV file is %s' % repr(file(name, 'rb').read())

print 'Now reading back as CSV...'
for r in csv.reader(file(name, 'rb')):
    print 'Read row containing %s' % repr(r)

# Output is
"""In [29]: run csv_error.py
Writing CSV file containing [['a'], ['\r'], ['b']]
Contents of the CSV file are 'a\r\n"\r"\r\nb\r\n'
Now reading back as CSV...
Read row containing ['a']
---------------------------------------------------------------------------
_csv.Error                                Traceback (most recent call last)

c:\temp\csv_error.py
     14 print 'CSV file is %s' % repr(file(name, 'rb').read())
     15
     16 print 'Now reading back as CSV...'
---> 17 for r in csv.reader(file(name, 'rb')):
     18     print 'Read row containing %s' % repr(r)

Error: newline inside string
WARNING: Failure executing file: <csv_error.py>
"""
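One possible workaround, given only as a rough sketch and not as a confirmed fix: since the aim is to keep every '\r' recoverable, each field could be escaped reversibly before it goes to csv.writer() and unescaped after it comes back from csv.reader(), so that no raw '\r' ever appears inside a field in the file. The helper names below (escape_cr, unescape_cr, escape_row, unescape_row) are hypothetical and not part of the csv module, and the choice of backslash as the escape character is an assumption. It relies on the default dialect, which has no escapechar and so treats a backslash as an ordinary character.

```python
# Hypothetical helpers, not part of the csv module.
# Assumption: backslash is used as the escape character.

def escape_cr(field):
    # Double existing backslashes first so the encoding stays reversible,
    # then replace each carriage return with the two characters '\' and 'r'.
    return field.replace('\\', '\\\\').replace('\r', '\\r')

def unescape_cr(field):
    # Scan left to right; a backslash always introduces a two-character escape.
    out = []
    i = 0
    n = len(field)
    while i < n:
        ch = field[i]
        if ch == '\\' and i + 1 < n:
            nxt = field[i + 1]
            if nxt == 'r':
                out.append('\r')   # escaped carriage return
            else:
                out.append(nxt)    # escaped backslash (or any other char)
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)

def escape_row(row):
    # Apply to each row before passing it to csv.writer(...).writerow().
    return [escape_cr(f) for f in row]

def unescape_row(row):
    # Apply to each row yielded by csv.reader(...).
    return [unescape_cr(f) for f in row]
```

With these, the test case would write escape_row(r) for each r in s and call unescape_row() on each row read back. The trade-off is that the file on disk no longer holds the raw bytes, so any other program reading the CSV would need to apply the same unescaping.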