New submission from STINNER Victor <victor.stin...@haypocalc.com>:

tarfile is unable to open a TAR archive in PAX format that contains invalid 
filenames (filenames not encoded in UTF-8, and therefore undecodable). The 
attached file is an example (it contains the file b'z/\xff', which is not 
decodable as UTF-8).

The PAX specification has an "invalid" option with 4 values: bypass (default), 
rename, UTF-8, write.
http://www.opengroup.org/onlinepubs/009695399/utilities/pax.html

As was done for other formats in issue #8390, PAX can use Python's 
surrogateescape error handler to store undecodable bytes as Unicode surrogates.
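
To illustrate what the surrogateescape handler does with the filename from the attached archive, here is a minimal sketch using only the built-in bytes/str codecs. It does not touch tarfile itself; the variable names are just for illustration.

```python
# The filename stored in the attached archive; \xff is not valid UTF-8.
raw = b"z/\xff"

# A strict UTF-8 decode fails on the undecodable byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN, so the byte 0xff becomes U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")
```

The round trip is lossless, which is why the handler is suitable for carrying undecodable filenames through str-based APIs.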

I think that PAX should be strict by default, but have an option to enable 
surrogateescape mode.
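
A rough sketch of the "strict by default, surrogateescape on request" behaviour proposed above. The helper decode_pax_path and its errors parameter are hypothetical names used only for illustration; they are not part of tarfile and this is not a proposed patch.

```python
def decode_pax_path(raw: bytes, errors: str = "strict") -> str:
    """Hypothetical helper: decode a PAX path value as UTF-8.

    errors="strict" (the default) raises on undecodable bytes;
    errors="surrogateescape" keeps them as lone surrogates.
    """
    return raw.decode("utf-8", errors)


# Default behaviour: the invalid filename is rejected.
try:
    decode_pax_path(b"z/\xff")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Opt-in behaviour: the caller explicitly asks for surrogateescape.
lenient = decode_pax_path(b"z/\xff", errors="surrogateescape")
```

With this shape, existing callers keep getting an error for malformed archives, and only callers that opt in have to deal with surrogate characters in filenames.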

----------
components: Library (Lib)
files: z-pax.tar
messages: 105094
nosy: haypo, lars.gustaebel, loewis
priority: normal
severity: normal
status: open
title: tarfile doesn't support undecodable filename in PAX format
versions: Python 3.2
Added file: http://bugs.python.org/file17230/z-pax.tar

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8633>
_______________________________________