Matthew Boehm <boehm.matt...@gmail.com> added the comment:

I've attached a patch for Python 2.7 that adds a small note to 
library/stdtypes.html#str.splitlines explaining which sequences are treated as 
line breaks:

"""
Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", 
u"\x0c", u"\x85", u"\u2028", and u"\u2029"
"""
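
A quick way to see the difference described in the note is the sketch below. It is only an illustration, not part of the patch. It assumes b"..." literals stand in for plain (byte) strings and u"..." literals for Unicode strings, which holds on Python 2.7 and also on Python 3.3+; the variable names are made up for the example.

```python
# Byte strings (Python 2 `str`, Python 3 `bytes`): only \r, \n and \r\n
# are line boundaries, so the form feed (\x0c) stays inside the last piece.
byte_text = b"one\rtwo\nthree\r\nfour\x0cfive"
byte_lines = byte_text.splitlines()

# Unicode strings: the extra characters from the note also end a line.
uni_text = u"a\x0bb\x0cc\x85d\u2028e\u2029f"
uni_lines = uni_text.splitlines()

print(byte_lines)  # four pieces; the last one still contains \x0c
print(uni_lines)   # six one-character pieces
```

The form feed is the case that prompted this issue: the same character splits a Unicode string but not a byte string.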

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first 
considered line breaks in Python 2.7, probably related to this note from 
"What's New in 2.7": "The Unicode database provided by the unicodedata module 
is now used internally to determine which characters are numeric, whitespace, 
or represent line breaks." It might be worth putting a "changed in 2.7" note 
somewhere in the docs.
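
To check which characters a given interpreter actually treats as line breaks (for example, to confirm a version difference before writing a "changed in" note), something like the following could be used. This is a rough sketch, not taken from the patch: find_line_breaks is a hypothetical helper, it only scans the Basic Multilingual Plane, and it assumes Python 3, where chr() returns a Unicode character (Python 2 would need unichr() instead).

```python
def find_line_breaks(limit=0x10000):
    """Return the characters below `limit` that splitlines() splits on.

    Hypothetical helper for illustration; probes the running interpreter
    rather than relying on any documented list.
    """
    found = []
    for code_point in range(limit):
        ch = chr(code_point)
        # A line-break character between two letters yields two pieces.
        if len((u"a" + ch + u"b").splitlines()) == 2:
            found.append(ch)
    return found


line_breaks = find_line_breaks()
# Print the code points in hex so control characters are readable.
print([hex(ord(ch)) for ch in line_breaks])
```

The result may contain more characters than the note lists, depending on the interpreter version, which is one more reason to check the list empirically before documenting it.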

Please let me know of any thoughts you have and I'll be glad to make any 
desired changes and submit a new patch.

----------
keywords: +patch
title: open() and codecs.open() treat form-feed differently -> linebreak 
sequences should be better documented
Added file: http://bugs.python.org/file23069/linebreakdoc.py27.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12855>
_______________________________________