New submission from Ryan McGuire:
Opening a UTF-8 encoded file with unix newlines ("\n") on Win32:
    codecs.open("whatever.txt", "r", "utf-8").read()
replaces the newlines ("\n") with CR+LF ("\r\n").
The docs specifically say:
"Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of '\n' is done on
reading and writing."
And yet, opening the file with an explicit binary mode resolves the
situation:
    codecs.open("whatever.txt", "rb", "utf-8").read()
This reads the file with the original newlines unmodified.
The implementation of codecs.open and the documentation are out of sync.
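
A minimal script for comparing the two modes side by side may help anyone
trying to reproduce this. It is only a sketch: the helper name
newline_report, the sample data and the temporary file are illustrative
assumptions, not part of the original report. It writes a file with bare
"\n" newlines in binary mode, then reads it back through codecs.open with
mode "r" and mode "rb".

```python
import codecs
import os
import tempfile


def newline_report(data=b"first\nsecond\n"):
    """Write `data` verbatim, then read it back with modes "r" and "rb".

    Hypothetical helper for illustration; returns (text_mode, binary_mode).
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Write in binary mode so the file really contains bare "\n".
        with os.fdopen(fd, "wb") as f:
            f.write(data)

        # Mode "r": the case the report says gets "\r\n" on Win32.
        with codecs.open(path, "r", "utf-8") as f:
            text_mode = f.read()

        # Mode "rb": the case the report says keeps the newlines as-is.
        with codecs.open(path, "rb", "utf-8") as f:
            binary_mode = f.read()
    finally:
        os.remove(path)
    return text_mode, binary_mode


if __name__ == "__main__":
    text_mode, binary_mode = newline_report()
    print(repr(text_mode))
    print(repr(binary_mode))
    print("identical: %s" % (text_mode == binary_mode))
```

If the documentation is accurate for a given platform and Python version,
the two reads should be identical. Any difference between them would show
the newline conversion described above.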
--
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 91995
nosy: EnigmaCurry, georg.brandl
severity: normal
status: open
title: codecs.open on Win32 does not force binary mode
type: behavior
versions: Python 2.6, Python 3.1
___
Python tracker
<http://bugs.python.org/issue6788>
___