New submission from Keith Erskine:

If a csv file has a quote character at the beginning of a field but no closing 
quote, the csv module will keep reading the file until the very end in an 
attempt to close out the field.  It's true this situation occurs only when the 
quoting in a csv file is incorrect, but it would be extremely helpful if the 
csv reader could be told to stop reading each row of fields when it encounters 
a newline character, even if it is within a quoted field at the time.  At the 
moment, with large files, the csv reader will typically error out in this 
situation once the runaway field exceeds the field size limit (see 
csv.field_size_limit()).  Furthermore, this is not an easy situation to trap 
with custom code.
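
To illustrate where that error comes from, here is a minimal sketch. It 
assumes an in-memory sample instead of a real file, and a limit of 16 
characters that is chosen purely for illustration. The reader raises 
csv.Error as soon as the unclosed field grows past the limit:

```python
import csv
import io

# Illustrative sample: the second line opens a quote that is never closed.
data = 'a,b,c\nd,"e,f\n' + 'g,h,i\n' * 10

# field_size_limit() sets a module-wide limit and returns the previous one.
old_limit = csv.field_size_limit(16)
rows = []
error = None
try:
    for row in csv.reader(io.StringIO(data)):
        rows.append(row)
except csv.Error as exc:
    # Raised once the runaway quoted field exceeds the limit.
    error = exc
finally:
    csv.field_size_limit(old_limit)  # restore the global setting

print(rows)
print(error)
```

Lowering the limit only makes the failure arrive sooner.  It does not recover 
the rows that follow the bad quote.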

Here's an example of what I'm talking about.  For a csv file with the 
following content:

    a,b,c
    d,"e,f
    g,h,i

This code:

    import csv
    with open('file.txt') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)

prints:

    ['a', 'b', 'c']
    ['d', 'e,f\ng,h,i\n']

Note that the rest of the file after the opening quote, including delimiters 
and newlines, has been added to the second field on the second line. This is 
correct csv behavior but is very unhelpful to me in this situation.
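
For comparison, the existing "strict" dialect flag at least turns the silent 
swallowing into an error.  A minimal sketch, using io.StringIO as a stand-in 
for file.txt:

```python
import csv
import io

data = 'a,b,c\nd,"e,f\ng,h,i\n'

# Default (strict=False): the open quote absorbs the rest of the input.
lenient_rows = list(csv.reader(io.StringIO(data)))

# strict=True: reaching end of input inside a quoted field raises csv.Error.
strict_rows = []
strict_error = None
try:
    for row in csv.reader(io.StringIO(data), strict=True):
        strict_rows.append(row)
except csv.Error as exc:
    strict_error = exc

print(lenient_rows)
print(strict_rows, strict_error)
```

Even with strict=True the reader still consumes everything after the bad 
quote before it fails, so the rows after the bad line are lost either way.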

On the grounds that most csv files do not have multiline values within them, 
perhaps a new dialect attribute called "multiline" could be added to the csv 
module, that defaults to True for backwards compatibility.  It would indicate 
whether the csv file has any field values within it that span more than one 
line.  If multiline is False, then the "parse_process_char" function in "_csv" 
would always close out a row of fields when it encounters a newline character.  
It might be best if this multiline attribute were taken into account only when 
"strict" is False.
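
To make the intended behavior concrete, here is a rough pure-Python 
approximation of what multiline=False might do.  It is only a sketch: the 
helper name read_single_line_rows is hypothetical, and it works by handing 
each physical line to its own csv.reader, which is not how a change in "_csv" 
would be implemented:

```python
import csv
import io


def read_single_line_rows(lines, **fmtparams):
    """Yield one row per physical line (hypothetical helper).

    Each line goes to a fresh csv.reader, so an unclosed quote can never
    absorb the lines that follow it.
    """
    for line in lines:
        # Drop the line terminator so it does not end up inside a field
        # that is still "open" when the line ends.
        yield from csv.reader([line.rstrip('\r\n')], **fmtparams)


data = 'a,b,c\nd,"e,f\ng,h,i\n'
rows = list(read_single_line_rows(io.StringIO(data)))
for row in rows:
    print(row)
```

With the sample file above, the row with the bad quote comes back as 
['d', 'e,f'] and the following line is parsed as its own row.  Passing 
strict=True through to the helper makes that line raise csv.Error instead.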

Right now, I do get badly-formatted files like this, and I cannot ask the 
source for a new file.  I have to manually correct the file using a mixture of 
custom scripts and vi before the csv module will read it. It would be very 
helpful if csv would handle this directly.

----------
messages: 291453
nosy: keef604
priority: normal
severity: normal
status: open
title: csv reader chokes on bad quoting in large files
type: enhancement
versions: Python 3.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30034>
_______________________________________