[issue30034] csv reader chokes on bad quoting in large files

Keith Erskine Mon, 10 Apr 2017 18:12:55 -0700

Keith Erskine added the comment:

As you say, David, however much we would like the world to stick to a given CSV 
standard, the reality is that people don't, which is all the more reason for 
making the csv reader flexible and forgiving.


The csv module can and should be used for more than just 
"comma-separated-values" files.  I use it for all sorts of different delimited 
files, and it works very well.  Pandas uses it, as I'm sure do many other 
packages.  It's such a good module, it would be a pity to restrict its scope to 
just Excel-related scenarios.  Parsing delimited files is undoubtedly complex, 
and painfully slow if done with pure Python, so the more that can be done in C 
the better.
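
As a small illustration of the point above (the sample data and the "|" delimiter are made up for this sketch, not taken from any real file), the stock reader handles a pipe-delimited file with nothing more than the documented "delimiter" parameter:

```python
import csv
import io

# Hypothetical pipe-delimited data; note the comma inside a field,
# which is harmless because "," is not the delimiter here.
data = "id|name|note\n1|Ada|likes, commas\n2|Bob|plain\n"

rows = list(csv.reader(io.StringIO(data), delimiter="|"))

for row in rows:
    print(row)
```

The same call works for tab-delimited data by passing delimiter="\t".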

I'm no C programmer, but my guesstimate is that the coding changes I'm 
proposing are relatively modest.  In the IN_QUOTED_FIELD section 
(https://github.com/python/cpython/blob/master/Modules/_csv.c#L690), it would 
mean checking for newline characters when the proposed "multiline" attribute 
is False (and probably only when "strict" is False too).  Of course there is 
more to this change than just that, but I'm guessing not that much more.
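
To make the difference concrete, here is a rough pure-Python sketch. The first part shows how the reader behaves today when a quote is never closed.  The second part only approximates what "multiline" = False might do; that attribute does not exist in the csv module, and the helper name read_single_line_records is invented for this example.  It hands each line to its own reader, which is far slower than doing the check in C, so treat it as an illustration of the intent rather than a proposed implementation.

```python
import csv
import io

# Hypothetical input: the second line has an opening quote that is never closed.
data = 'a,b\n1,"oops\n2,x\n3,y\n'

# Current behaviour (strict=False by default): the quoted field keeps
# absorbing text, so the remaining lines end up inside one field.
greedy_rows = list(csv.reader(io.StringIO(data)))


def read_single_line_records(lines, **fmtparams):
    """Approximate a reader that never lets a quoted field span lines.

    Assumption: this emulates the *proposed* "multiline" = False idea by
    giving every line a fresh reader, so an unclosed quote cannot reach
    the next line.
    """
    for line in lines:
        yield from csv.reader([line.rstrip("\r\n")], **fmtparams)


tolerant_rows = list(read_single_line_records(io.StringIO(data)))

print(len(greedy_rows), len(tolerant_rows))
```

With this input the default reader returns far fewer rows than there are lines, while the per-line sketch returns one row per line.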

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30034>
_______________________________________