Terry J. Reedy added the comment:

I was not aware of the remainder of the undocumented behavior. Thanks for the 
code that makes it clear.

linebreak (or linebreaks)=True would mean that splitting occurs on some 
(approximation?*) of the Unicode mandatory line breaks, as opposed to just the 
ASCII 'universal newlines' sequences, as defined in our glossary. Possible 
alternative: restrict=False (restrict to universal newlines?)

*I did not read the annex in enough detail to know either way.

The following pair of experiments, which I should have run before, show that 
there has been no real change of behavior from 2.x to 3.x.
# 2.7.8
>>> u'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x852028\u20282029\u2029end'.splitlines()
[u'a', u'b', u'c', u'd', u'da', u'1c', u'1d', u'1e', u'85', u'2028', u'2029', 
u'end']
# 3.4.1
>>> b'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'.splitlines()
[b'a', b'b\x0bc\x0cd', b'da', b'1c\x1c1d\x1d1e\x1e85\x85end']
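
For a side-by-side view on a single interpreter, here is a minimal sketch that 
runs the same characters through both methods on Python 3. It assumes a current 
3.x; the variable names are illustrative only, and \u2028 and \u2029 are left out 
because they cannot be latin-1 encoded into the bytes form.

```python
# Same test characters as above, restricted to code points below 256 so that
# the identical data can be expressed as both str and bytes.
text = 'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'
data = text.encode('latin-1')

# str.splitlines splits on the wider Unicode set of line boundaries.
str_pieces = text.splitlines()

# bytes.splitlines splits only on \n, \r and \r\n (universal newlines).
bytes_pieces = data.splitlines()

print(str_pieces)
print(bytes_pieces)
```

The str result has one piece per separator, while the bytes result keeps 
\x0b, \x0c, \x1c, \x1d, \x1e and \x85 inside the pieces.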

Given this, I am a bit dubious about adding a new parameter in 3.5 to make the 
unicode method act like the bytes method. Part of my support for that was 
thinking that it would help porting code. But that is not true. In both 2 and 
3, one can already latin-1 encode, split, and latin-1 decode the pieces.
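
A rough sketch of that round trip, written for Python 3 (on 2.x the sample 
would need a u'' literal). The helper name is hypothetical, not an existing or 
proposed API, and the approach only works when every character is below 
U+0100, since latin-1 cannot encode anything else.

```python
def splitlines_universal_only(text, keepends=False):
    """Hypothetical helper: split text on \\n, \\r and \\r\\n only.

    Assumes every character in text is below U+0100; otherwise the
    latin-1 encode step raises UnicodeEncodeError.
    """
    raw = text.encode('latin-1')
    # bytes.splitlines recognizes only the universal newline sequences.
    return [piece.decode('latin-1') for piece in raw.splitlines(keepends)]


sample = 'a\nb\x0bc\x0cd\rda\r\n1c\x1c1d\x85end'
pieces = splitlines_universal_only(sample)
pieces_with_ends = splitlines_universal_only(sample, True)
print(pieces)
```

Latin-1 maps each code point 0-255 to the byte of the same value, so the 
decode step restores the original characters exactly.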

The doc correction clearly needed is that the 3.4+ universal newlines glossary 
entry needs to be updated from 'str.splitlines' to 'bytes.splitlines'. I will 
try to do this.

A second doc problem is that the docstrings (given by help(x.splitlines)) are 
exactly the same for bytes.splitlines and unicode.splitlines, in both 2.x and 
3.x, even though the behavior is different even for ASCII. I think each should 
list what it splits on. Ditto for the docs, if they do not already. This should 
have a patch posted for review.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue22232>
_______________________________________