New submission from Neil Hodgson <nyamaton...@users.sourceforge.net>:
Unicode includes two line-ending characters, Line Separator U+2028 and Paragraph Separator U+2029. The readlines method of the file object returned by the built-in open does not treat these characters as line ends, although the object returned by codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, with the second line ended by a Paragraph Separator. The program then reads this file back in as a text file. Only two lines are seen when reading the file back in. The desired behaviour is for the file to be read in as three lines.

----------
components: IO
files: lineends.py
messages: 91397
nosy: nyamatongwe
severity: normal
status: open
title: readlines should understand Line Separator and Paragraph Separator characters
versions: Python 3.1
Added file: http://bugs.python.org/file14671/lineends.py

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6664>
_______________________________________
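The attached lineends.py is not reproduced in this message. The following is only a minimal sketch of the behaviour the report describes, not the original attachment. The helper name read_both_ways, the sample text, and the use of a temporary file are assumptions made for illustration.

```python
import codecs
import os
import tempfile
import warnings

# Hypothetical sample text: three lines, the second ended by
# Paragraph Separator U+2029 instead of "\n".
TEXT = "line one\nline two\u2029line three\n"


def read_both_ways(text=TEXT):
    """Write text as UTF-8, then read it back with open() and codecs.open()."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write raw bytes so no newline translation happens on output.
        with os.fdopen(fd, "wb") as f:
            f.write(text.encode("utf-8"))

        # Built-in open in text mode: only \n, \r and \r\n end a line.
        with open(path, encoding="utf-8") as f:
            builtin_lines = f.readlines()

        # codecs.open: the stream reader splits with str.splitlines,
        # which also treats U+2028 and U+2029 as line ends.
        # Newer Python versions may warn that codecs.open is deprecated.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, encoding="utf-8") as f:
                codecs_lines = f.readlines()
    finally:
        os.remove(path)
    return builtin_lines, codecs_lines


if __name__ == "__main__":
    builtin_lines, codecs_lines = read_both_ways()
    print("open():       ", len(builtin_lines), "lines")
    print("codecs.open():", len(codecs_lines), "lines")
```

Under these assumptions the built-in open yields two lines and codecs.open yields three, which is the discrepancy reported above.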