[issue36172] csv module internal consistency

2020-08-28 Thread Josh Rosenberg


Change by Josh Rosenberg:


--
resolution:  -> not a bug
stage:  -> resolved
status: pending -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36172] csv module internal consistency

2019-03-05 Thread Josh Rosenberg


Change by Josh Rosenberg:


--
status: open -> pending




[issue36172] csv module internal consistency

2019-03-04 Thread Shane


Shane added the comment:

Thank you both for having a look.  I just find these sorts of gotchas
rather annoying (a nonsensical mental burden of having to memorize behavior
that does not match most other features, for "hysterical raisins").

I think making the documentation more visible would be far better than nothing. 
 Currently, help(csv) does not even mention the newline parameter as an 
available option in any context, nor does help(csv.writer).

I think ideally, the user should be able to rely on a given module's help 
documentation for most things without having to leave the interpreter to 
consult the manual.  Thoughts?
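For anyone who wants to check this on their own interpreter, here is a minimal sketch that renders the same text help() would show and searches it for the word "newline". The result depends on the Python version, so no expected output is given.

```python
import csv
import pydoc

# Render the help() text as plain strings instead of paging it.
module_help = pydoc.render_doc(csv, renderer=pydoc.plaintext)
writer_help = pydoc.render_doc(csv.writer, renderer=pydoc.plaintext)

for label, text in (("help(csv)", module_help),
                    ("help(csv.writer)", writer_help)):
    # Report whether the rendered help mentions the newline parameter at all.
    print(label, "mentions newline:", "newline" in text)
```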

--




[issue36172] csv module internal consistency

2019-03-04 Thread Josh Rosenberg


Josh Rosenberg added the comment:

Unless someone disagrees soon, I'm going to close this as documented 
behavior/not a bug. AFAICT, the only "fixes" available for this are:

1. Change the default dialect from 'excel' to something else. Problem: Breaks 
correct code dependent on the excel dialect, but code could explicitly opt back 
in.

2. Change the 'excel' dialect. Problem: Breaks correct code dependent on the 
excel dialect, with no obvious way to opt back in.

3. Per #10954, check the file object to ensure it's not translating newlines 
and raise an exception otherwise. Problem: AFAICT, there is no documented API 
to check this. The result of calling open, with or without passing newline='', 
looks identical initially and never changes in write mode; even in read mode, 
it only exposes the newlines observed (through the .newlines attribute), not 
whether or not they were translated. Adding such an API to open's result 
wouldn't change all other file-like objects, so the change would need to 
propagate to all other built-in and third-party file APIs. And for some 
file-like objects, it wouldn't make sense to have this API at all 
(io.StringIO, being purely in memory, doesn't need to do translation of any 
kind).

4. (Extreme solution) Add io APIs (or add arguments to APIs) for 
reading/writing without newline translation (that is, whether or not newline is 
passed to open, you can read/write without translation), e.g. read(size) 
becomes read(size, translate_newlines=None) where None indicates default 
behavior, or we add read_untranslated(size) as an independent API. Problem: 
Like #3, this requires us to create new, mandatory APIs in the io module that 
would then need to propagate to all other built-in and third-party file APIs.

Point is, the simple solutions (1/2) break correct code, and the complex 
solutions (3/4) involve major changes to the io module (and all other file-like 
object producers) and/or the csv module.
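For reference, the dialect mechanism that options 1 and 2 would touch can be sketched as follows. The ExcelLF class and the "excel-lf" name are made up for illustration and are not part of the csv module; io.StringIO is used so that no file-level newline translation gets in the way.

```python
import csv
import io

data = [[1, 2, 3], [4, 5, 6]]

class ExcelLF(csv.excel):
    """Hypothetical dialect: excel settings, but rows end in a bare LF."""
    lineterminator = "\n"

# "excel-lf" is an illustrative name, not a dialect shipped with csv.
csv.register_dialect("excel-lf", ExcelLF)

# Default dialect ('excel'): every row is terminated with CR LF.
default_buf = io.StringIO()
csv.writer(default_buf).writerows(data)

# Explicitly selected dialect: every row is terminated with LF only.
lf_buf = io.StringIO()
csv.writer(lf_buf, dialect="excel-lf").writerows(data)

print(repr(default_buf.getvalue()))
print(repr(lf_buf.getvalue()))
```

Code that depends on the current behavior could pin it the same way, by passing dialect='excel' explicitly.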

Even then, nothing shy of #4 would make broken code just work; the other 
options only make it fail loudly. Both #3 and #4 would require cascading 
changes to every file-like object (both built-in and third-party) to make them 
work. For the file-like objects that aren't updated, we're stuck choosing 
between issuing a warning that most folks won't see and then ignoring the 
problem, or raising true exceptions for file-like objects without the necessary 
API (making them unusable until the third-party package is updated).
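To illustrate the obstacle behind #3, here is a small sketch comparing two files opened for writing, one with the default newline handling and one with newline=''. It only looks at the documented .newlines attribute and the object type; the file names are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "default.txt")    # placeholder file name
    path_b = os.path.join(tmp, "untranslated.txt")  # placeholder file name

    with open(path_a, "w") as default_f, \
            open(path_b, "w", newline="") as raw_f:
        # Both are the same kind of text wrapper...
        same_type = type(default_f) is type(raw_f)
        # ...and in write mode .newlines has nothing to report for either.
        observed = (default_f.newlines, raw_f.newlines)

print(same_type, observed)
```

So, at least through these attributes, csv.writer has nothing to inspect that would tell the two apart.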

If a fix is needed, I think my suggestion would be to do one or both of:

1. Emphasize the newline='' warning in the 
csv.reader/writer/DictReader/DictWriter docs (right now it's just one more 
unemphasized line in a fairly long wall of text for each)

2. Put a large, top-of-module warning about this at the top of the csv module 
docs, so people reading the basic module description are exposed to the warning 
before they even reach the API.

Might help a few folks who are skimming without reading for detail.

--
nosy: +josh.r




[issue36172] csv module internal consistency

2019-03-03 Thread Martin Panter

Martin Panter added the comment:

The documentation says you should “open the files with newline=''.” IMO this 
is an unfortunate quirk of the CSV 
module. Everything else that I know of in the Python built-in library either 
works with binary files, which typically do no newline translation in Python 3, 
or is fine with newline translation enabled in text mode. See also Issue 10954 
about making the behaviour stricter.
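A minimal sketch of the documented usage, for comparison with the original report: both the writer's and the reader's file are opened with newline=''. A temporary directory is used here instead of 'test.csv' in the working directory.

```python
import csv
import os
import tempfile

data = [[1, 2, 3], [4, 5, 6]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")

    # newline='' stops the text layer from translating the writer's '\r\n'.
    with open(path, "w", newline="") as fout:
        csv.writer(fout).writerows(data)

    # newline='' on read hands the line endings to csv.reader untouched.
    with open(path, "r", newline="") as fin:
        rows = list(csv.reader(fin))

    # Look at the raw bytes to see what actually landed on disk.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()

print(rows)
print(raw_bytes)
```

With this, the rows read back have the same structure as the rows written (as strings), and the file contains plain '\r\n' terminators on every platform.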

--
nosy: +martin.panter




[issue36172] csv module internal consistency

2019-03-03 Thread Shane


New submission from Shane:

It occurred to me there is a slight mismatch in the behavioral consistency of 
the csv module (at least on Windows, Python 3.X).  Specifically, csv.writer() 
and csv.reader() treat the line terminator slightly differently.  To boil it 
down to a concise example:

#==
import csv

data = [[1, 2, 3], [4, 5, 6]]

with open('test.csv', 'w') as fout:
    csv.writer(fout).writerows(data)

with open('test.csv', 'r') as fin:
    data2 = list(csv.reader(fin))

print(data, data2, sep='\n')

>>> 
[[1, 2, 3], [4, 5, 6]]
[['1', '2', '3'], [], ['4', '5', '6'], []]
#==

So because csv.writer() uses lineterminator = '\r\n', data and data2 have a 
different structure (data2 has empty rows).  To me this seems undesirable, so I 
always go out of my way to use lineterminator = '\n'.  
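The mechanism can be reproduced on any platform with a small sketch. It assumes that passing newline='\r\n' to open() is a fair stand-in for default text mode on Windows, since both translate each '\n' written into '\r\n'.

```python
import csv
import os
import tempfile

data = [[1, 2, 3], [4, 5, 6]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.csv")

    # Assumption: newline='\r\n' mimics Windows text mode on write,
    # turning every '\n' into '\r\n' on its way to disk.
    with open(path, "w", newline="\r\n") as fout:
        csv.writer(fout).writerows(data)

    # The writer's '\r\n' has become '\r\r\n' in the file.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()

    # Default (universal newlines) read mode turns '\r\r\n' into two
    # line breaks, which csv.reader reports as an extra empty row.
    with open(path, "r") as fin:
        rows_back = list(csv.reader(fin))

print(raw_bytes)
print(rows_back)
```

So the extra rows come from the file layer translating the writer's terminator a second time, not from csv.reader and csv.writer disagreeing about the format.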

#==
import csv

data = [[1, 2, 3], [4, 5, 6]]

with open('test.csv', 'w') as fout:
    csv.writer(fout, lineterminator='\n').writerows(data)

with open('test.csv', 'r') as fin:
    data2 = list(csv.reader(fin))

print(data, data2, sep='\n')

>>>
[[1, 2, 3], [4, 5, 6]]
[['1', '2', '3'], ['4', '5', '6']]
#==


Then the input and output have the same structure.  I assume there was a reason 
lineterminator = '\r\n' was chosen as default, but for me there is no benefit 
wrt csv files.  It seems like we would be better off with the more consistent, 
"reversible" behavior.

Alternatively, the default behavior of csv.reader() could be changed.  But in 
either case, I feel like their default behaviors should be in alignment.

Thoughts?  Thanks for reading.

--
messages: 337042
nosy: Shane Smith
priority: normal
severity: normal
status: open
title: csv module internal consistency
type: behavior
