Peter Santoro added the comment:
It seems clear to me that the logic in shutil._unpack_zipfile that silently
skips paths that start with '/' (indicates absolute path) or that contain
references to the parent directory ('..') was added to prevent malicious zip
files from making potential malicious/unwanted modifications to the filesystem
(perhaps at a time when zipfile did not itself contain such logic). This
conservative approach works, but it can have unexpected results. For example,
if all entries in a zip file contain these invalid characters, then
shutil._unpack_zipfile appears to do nothing (i.e. the zip file is not
unpacked). This is good (except for the silent part), if the zip file is truly
malicious. However, I recently had to deal with thousands of zip files created
by well known software vendors where hundreds of the zip files were created
incorrectly and contained these invalid characters. These files were not
malicious, but they were created improperly. Note that shutil._unpack_zipfile
silently fai
led to unzip these files, but by using ZipFile.extractall I could unzip them.
It appears that most unzipping software today either either ignores (sometimes
silently) potentially malicious zip entries (e.g. Windows 7 Explorer displays
an invalid zip file error) or it attempts to filter out/replace known bad
characters so that the zip entries can be extracted (e.g. WinZip, gnu unzip).
I created this issue because the Python library uses both approaches, which may
need rethinking.
The newer logic in ZipFile._extract_member, which is used by
ZipFile.extractall, takes a different approach. Instead of silently ignoring
potentially malicious zip entries, it attempts to filter out or replace known
invalid characters before extracting the zip entries.
>From the Python zipfile docs:
---
If a member filename is an absolute path, a drive/UNC sharepoint and leading
(back)slashes will be stripped, e.g.: ///foo/bar becomes foo/bar on Unix, and
C:\foo\bar becomes foo\bar on Windows. And all ".." components in a member
filename will be removed, e.g.: ../../foo../../ba..r becomes foo../ba..r. On
Windows illegal characters (:, <, >, |, ", ?, and *) replaced by underscore (_).
---
As ZipFile._extract_member filters out/replaces more invalid characters than
shutil._unpack_zipfile handles, one could argue that the (apparent older)
approach used by shutil._unpack_zipfile is less safe.
The approaches used by shutil._unpack_zipfile and ZipFile.extractall to deal
with potentially malicious zip file entries are different. This issue could be
closed if not deemed important by the Python core developers or it could be
handled by documentation and/or coding changes.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue20907>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com