New submission from STINNER Victor <victor.stin...@haypocalc.com>:

The mbcs encoding replaces non-encodable characters (losing information) and 
doesn't support the surrogateescape error handler. It ignores the error handler 
argument (see #850997), while tarfile now uses the surrogateescape error handler 
by default (#8390). This encoding is just horrible for Unicode support :-)

Since the native Windows API uses Unicode (UTF-16), I think it would be better 
to use utf-8 as the default encoding on Windows. utf-8 can encode and decode 
the full Unicode character set and supports all error handlers (especially 
surrogateescape).
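
To illustrate the surrogateescape point, here is a minimal sketch (not taken 
from the patch) of a round trip through utf-8. The byte string is an invented 
example of a file name that is not valid UTF-8:

```python
# Hypothetical member name stored as Latin-1 bytes; 0xE9 is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# surrogateescape maps the undecodable byte to a lone surrogate (U+DCE9)
# instead of raising an error or replacing it.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")

print(ascii(name))      # shows the escaped surrogate in the decoded name
print(restored == raw)  # True: nothing was lost
```

With an encoding that silently replaces characters, as described above for 
mbcs, the original bytes could not be recovered this way.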

The attached patch sets the default encoding to utf-8 on Windows, and removes 
the "ENCODING is None" test because sys.getfilesystemencoding() can no longer 
return None (in 3.2 only; it's a recent change: #8610).
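
As a rough sketch of the idea (this is not the content of the attached patch, 
and the helper name default_tar_encoding is made up for illustration), the 
default could be chosen along these lines:

```python
import os
import sys


def default_tar_encoding(os_name=os.name):
    """Hypothetical helper: choose the default encoding for member names."""
    if os_name == "nt":
        # Proposed behaviour: avoid mbcs on Windows and use utf-8 instead.
        return "utf-8"
    # Elsewhere, keep following the file system encoding. No None check is
    # needed, assuming sys.getfilesystemencoding() always returns a string
    # (the 3.2 change mentioned above).
    return sys.getfilesystemencoding()


ENCODING = default_tar_encoding()
print(ENCODING)
```

The os_name parameter exists only so that the Windows branch can be exercised 
on any platform.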

----------
components: Library (Lib), Unicode, Windows
files: tarfile_windows_utf8.patch
keywords: patch
messages: 106276
nosy: haypo, lars.gustaebel
priority: normal
severity: normal
status: open
title: tarfile/Windows: Don't use mbcs as the default encoding
versions: Python 3.2
Added file: http://bugs.python.org/file17435/tarfile_windows_utf8.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8784>
_______________________________________