STINNER Victor added the comment:

> The BOM (byte order mark) appears in the standard input stream. When using
> cmd.exe, the BOM is not present. This behavior occurs in CP1252 as well as
> CP65001.
How do you change the console encoding? Using the chcp command? I'm surprised that you get a UTF-8 BOM when code page 1252 is used. Can you please check that sys.stdin.encoding is "cp1252"?

I tested PowerShell with Python 3.5 on Windows 7 with OEM code page 850 and ANSI code page 1252:

- by default, the stdin encoding is cp850 (OEM code page) and os.device_encoding(0) returns "cp850". sys.stdin.readline() does not contain a BOM.
- when stdin is a pipe (ex: echo "abc"|python ...), the stdin encoding becomes cp1252 (ANSI code page) because os.device_encoding(0) returns None; cp1252 is the result of locale.getpreferredencoding(False) (ANSI code page). sys.stdin.readline() does not contain a BOM.

If I change the console encoding using the command "chcp 65001":

- by default, the stdin encoding = os.device_encoding(0) = "cp65001". sys.stdin.readline() does not contain a BOM.
- when stdin is a pipe, the stdin encoding = locale.getpreferredencoding(False) = "cp1252" and sys.stdin.readline() *contains* the UTF-8 BOM.

Note: the UTF-8 BOM is only written once, before the first character.

So the UTF-8 BOM is only written when all of these conditions hold:

- Python is running in PowerShell (the UTF-8 BOM is not written in cmd.exe, even with chcp 65001)
- sys.stdin is a pipe
- the console encoding was set manually to cp65001

--

It looks like PowerShell decodes the output of the producer program (echo, type, ...) and then encodes it again for the consumer program (ex: python). The encoding used toward the consumer can be changed by setting the $OutputEncoding variable.

Example to encode to UTF-8 without the BOM:

    $OutputEncoding = New-Object System.Text.UTF8Encoding($False)

Example to encode to UTF-8 with the BOM:

    $OutputEncoding = [System.Text.Encoding]::UTF8

Using [System.Text.Encoding]::UTF8, sys.stdin.readline() starts with a BOM even if the console encoding is cp850.
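The checks above can also be made from Python itself. The following is a minimal sketch, not a fix for the issue: describe_stdin() and strip_utf8_bom() are hypothetical helpers (not part of the standard library). Stripping assumes the BOM bytes were decoded with the same encoding as the rest of stdin, so a UTF-8 BOM shows up as "\ufeff" under UTF-8 but as the three characters "ï»¿" under cp1252.

```python
import codecs
import locale
import os
import sys


def describe_stdin():
    """Collect the stdin-related encodings discussed above (hypothetical helper)."""
    return {
        # Encoding Python chose for the sys.stdin text wrapper (None if stdin is missing)
        "sys.stdin.encoding": getattr(sys.stdin, "encoding", None),
        # Console code page for fd 0, or None when stdin is not a terminal (e.g. a pipe)
        "os.device_encoding(0)": os.device_encoding(0),
        # ANSI code page on Windows; the fallback used when stdin is a pipe
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


def strip_utf8_bom(line, encoding):
    """Remove leading UTF-8 BOMs from a line decoded with `encoding` (hypothetical helper).

    The BOM bytes b'\\xef\\xbb\\xbf' decode to '\\ufeff' with UTF-8 but to
    three separate characters with a code page such as cp1252, so the BOM
    is decoded with the same encoding before comparing.
    """
    try:
        bom_text = codecs.BOM_UTF8.decode(encoding)
    except UnicodeDecodeError:
        # The encoding cannot represent these bytes, so no BOM can be in the text
        return line
    if not bom_text:
        # e.g. "utf-8-sig" already drops the BOM while decoding
        return line
    # Loop so that the double-BOM case is handled as well
    while line.startswith(bom_text):
        line = line[len(bom_text):]
    return line


if __name__ == "__main__":
    for name, value in describe_stdin().items():
        print("%s = %r" % (name, value))
```

In a PowerShell pipe, the first line could be cleaned with strip_utf8_bom(sys.stdin.readline(), sys.stdin.encoding). If the piped bytes really are UTF-8 (for example with $OutputEncoding set to [System.Text.Encoding]::UTF8), wrapping the raw stream with io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8-sig") is an alternative, but that codec only drops one leading BOM.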
If you set the console encoding to 65001 (chcp 65001) and $OutputEncoding to [System.Text.Encoding]::UTF8, you get... two UTF-8 BOMs... yeah!

I tried different producer programs: [MS-DOS] echo "abc", [PowerShell] write-output "abc", [MS-DOS] type document.txt, [PowerShell] Get-Content document.txt, python -c "print('abc')". Using a different program doesn't seem to change anything: the UTF-8 BOM is added somewhere by PowerShell between the producer and the consumer programs.

To show the console input and output encodings in PowerShell, type "[console]::InputEncoding" and "[console]::OutputEncoding".

See also: http://stackoverflow.com/questions/22349139/utf8-output-from-powershell

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21927>
_______________________________________