[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

New submission from Matthew Boehm boehm.matt...@gmail.com:

A file opened with codecs.open() splits on a form feed character (\x0c) while a 
file opened with open() does not.

>>> with open("formfeed.txt", "w") as f:
...   f.write("line \fone\nline two\n")
...
>>> with open("formfeed.txt", "r") as f:
...   s = f.read()
...
>>> s
'line \x0cone\nline two\n'
>>> print s
line
one
line two

>>> import codecs
>>> with open("formfeed.txt", "rb") as f:
...   lines = f.readlines()
...
>>> lines
['line \x0cone\n', 'line two\n']
>>> with codecs.open("formfeed.txt", "r", encoding="ascii") as f:
...   lines2 = f.readlines()
...
>>> lines2
[u'line \x0c', u'one\n', u'line two\n']


Note that lines contains two items while lines2 has 3.
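
For readers on Python 3, the same discrepancy can be reproduced with a short script. This is only a sketch under stated assumptions: the helper name read_lines_both_ways is hypothetical, a temporary file stands in for formfeed.txt, and codecs.getreader() is used to obtain the same kind of StreamReader that codecs.open() wraps.

```python
import codecs
import os
import tempfile


def read_lines_both_ways(data):
    """Hypothetical helper: write data to a temporary file, then return
    (lines read from a binary file, lines read via a codecs StreamReader)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # A binary file object ends a line only at b"\n".
        with open(path, "rb") as f:
            byte_lines = f.readlines()
        # StreamReader.readlines() decodes everything and then calls
        # str.splitlines(), which also treats U+000C as a line boundary.
        with open(path, "rb") as f:
            text_lines = codecs.getreader("ascii")(f).readlines()
    finally:
        os.remove(path)
    return byte_lines, text_lines


byte_lines, text_lines = read_lines_both_ways(b"line \x0cone\nline two\n")
print(byte_lines)  # [b'line \x0cone\n', b'line two\n']
print(text_lines)  # ['line \x0c', 'one\n', 'line two\n']
```

The binary read yields two lines and the codecs reader yields three, matching the Python 2.7 session above.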

Issue 7643 has a good discussion on newlines in Python, but I did not see this 
discrepancy mentioned.

--
components: Interpreter Core
messages: 143182
nosy: Matthew.Boehm
priority: normal
severity: normal
status: open
title: open() and codecs.open() treat form-feed differently
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue12855
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

U+000C (Form feed) is considered a line boundary in a Unicode string (unicode 
type), but not in a byte string (str type).

Example:

>>> u'line \x0cone\nline two\n'.splitlines(True)
[u'line \x0c', u'one\n', u'line two\n']
>>> 'line \x0cone\nline two\n'.splitlines(True)
['line \x0cone\n', 'line two\n']
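
The same comparison can be written for Python 3, where str plays the role of the Python 2 unicode type and bytes plays the role of the Python 2 str type. A minimal sketch (the variable names are illustrative only):

```python
sample = "line \x0cone\nline two\n"

# str.splitlines() follows the Unicode notion of a line boundary,
# so the form feed (U+000C) ends a line.
text_parts = sample.splitlines(True)

# bytes.splitlines() splits only on b"\n", b"\r" and b"\r\n".
byte_parts = sample.encode("ascii").splitlines(True)

print(text_parts)  # ['line \x0c', 'one\n', 'line two\n']
print(byte_parts)  # [b'line \x0cone\n', b'line two\n']
```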

--
nosy: +haypo



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

Thanks for explaining the reasoning.

Perhaps I should add this to the Python wiki 
(http://wiki.python.org/moin/Unicode)?

It would be nice if it fit in the docs somewhere, but I'm not sure where.

I'm curious how (or if) 2to3 would handle this as well, but I'm closing this 
issue as it's now clear to me why these two are expected to act differently.

--



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Changes by Matthew Boehm boehm.matt...@gmail.com:


--
resolution:  -> wont fix
status: open -> closed



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> It would be nice if it fit in the docs somewhere,
> but I'm not sure where.

See:
http://docs.python.org/library/codecs.html#codecs.StreamReader.readline

Can you suggest a patch for the documentation? Source code of this document:
http://hg.python.org/cpython/file/bb7b14dd5ded/Doc/library/codecs.rst

--



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I'll suggest a patch for the documentation when I get to my home computer in an 
hour or two.

--
assignee:  -> docs@python
components: +Documentation -Interpreter Core
nosy: +docs@python
resolution: wont fix -> 
status: closed -> open



[issue12855] open() and codecs.open() treat form-feed differently

2011-08-29 Thread Matthew Boehm

Matthew Boehm boehm.matt...@gmail.com added the comment:

I'm taking a look at the docs now.

I'm considering adding a table/list of characters Python treats as newlines, 
but it seems like this might fit better as a note in 
http://docs.python.org/library/stdtypes.html#str.splitlines or somewhere else 
in stdtypes. I'll start working on it now, but please let me know what you 
think about this.
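
One way to draft such a list is to probe splitlines() directly rather than rely on memory. The sketch below is written for Python 3 and is an illustration only, not the proposed documentation patch; the names str_boundaries and bytes_boundaries are made up for the example.

```python
import sys

# Code points that str.splitlines() treats as a line boundary on their own:
# a two-character string "a" + "b" with the candidate in between splits
# into exactly two parts only if the candidate ends a line.
str_boundaries = [
    cp for cp in range(sys.maxunicode + 1)
    if len(("a" + chr(cp) + "b").splitlines()) == 2
]

# The same probe for bytes.splitlines(), over all 256 byte values.
bytes_boundaries = [
    b for b in range(256)
    if len((b"a" + bytes([b]) + b"b").splitlines()) == 2
]

print(["U+%04X" % cp for cp in str_boundaries])
print(["0x%02X" % b for b in bytes_boundaries])
```

On CPython 3 the first list should include U+000C along with U+000A, U+000B, U+000D, U+001C, U+001D, U+001E, U+0085, U+2028 and U+2029, while the second contains only 0x0A and 0x0D. The pair "\r\n" is also treated as a single boundary but is not found by a single-character probe.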

This is my first attempt at a patch, so I greatly appreciate your help so far.

--
