Stephen J. Turnbull added the comment:

Suggested NEWS/whatsnew entry:

Add a new *memberNameEncoding* argument to the ZipFile constructor, allowing
:mod:`zipfile` to read filenames in non-conforming encodings from the
zipfile as Unicode.  This implementation assumes all member names have the same 
encoding.

Motivation:

There are applications in Japan that create zipfiles with directories 
containing filenames encoded in Shift JIS.  There may be such software in other 
countries as well.  As this is a violation of the Zip format definition, this 
library implements only an option to read such files.
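
To illustrate the problem, here is a small sketch (my own, not part of the patch). When the UTF-8 bit is not set, zipfile decodes member names as cp437, so a Shift JIS name comes out as mojibake. The sample name and the helper recover_name() are hypothetical. The round trip relies on Python's cp437 codec mapping every byte value.

```python
def recover_name(misdecoded: str, encoding: str = "shift_jis") -> str:
    """Undo a cp437 mis-decoding and re-decode the raw bytes properly.

    Hypothetical helper for illustration only; not part of zipfile.
    """
    return misdecoded.encode("cp437").decode(encoding)


# A made-up member name, encoded the way the offending applications do it.
original = "日本語.txt"
raw = original.encode("shift_jis")

# What a reader sees if the bytes are interpreted as cp437.
seen = raw.decode("cp437")

# The name is garbled, but no information is lost, so it can be recovered.
garbled = seen != original
recovered = recover_name(seen)
```

This workaround has to be applied by every caller for every member name, which is what the new argument is meant to avoid.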

Done:

(1) Add a memberNameEncoding argument to the main() function, which may be set 
from the command line with "--membernameencoding={codec}".  This command line 
option may be used with -e or -l, but not -c or -t.  There is no point to it in 
the latter two, since the member names are not printed.
(2) Add a memberNameEncoding argument to the ZipFile constructor.  This is the 
only way to set it, so this is global to the ZipFile.
(3) Add this attribute to repr.
(4) Add a check that the mode is `read` in main() and in the ZipFile 
constructor, and if not, invoke USAGE and exit (in main()) or raise 
RuntimeError (in the constructor).
(5) When retrieving member names in constructing ZipInfo instances, check if 
memberNameEncoding is set, and if so use it, unless the UTF-8 bit is set. In 
that case, obey the UTF-8 bit, as the specified encoding is surely user error.
(6) Add a CODEC_USAGE message.
(7) Update the docs (docstrings, library reference, NEWS).
(8) Add tests:
    (a) List a zipfile's SJIS-encoded directory.
    (b) List a UTF-8-encoded directory and an ISO-8859-1-encoded directory as
        Shift-JIS.
    (c) Check that USAGE is invoked on attempts to write a zipfile in main().
    (d) Check that an appropriate error is raised on attempts to write in other
        functions.
    Many other tests are run as well.
    ALL TESTS PASS.
(9) Docs build without error.
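
The rules in (4) and (5) can be sketched roughly as follows. This is not the code from the attached patch, only a minimal standalone illustration. The function names decode_member_name() and check_mode() are hypothetical. I also assume here that the mode check applies only when an encoding was actually given. The flag value 0x800 (general purpose bit 11) and the cp437 fallback match what zipfile already does.

```python
from typing import Optional

UTF8_FLAG = 0x800  # general purpose bit 11: member name is UTF-8


def decode_member_name(raw: bytes, flag_bits: int,
                       member_name_encoding: Optional[str] = None) -> str:
    """Decode a raw member name following the precedence in item (5).

    Hypothetical helper, not the patch's implementation.
    """
    if flag_bits & UTF8_FLAG:
        # The UTF-8 bit wins; a conflicting user encoding is ignored.
        return raw.decode("utf-8")
    if member_name_encoding is not None:
        return raw.decode(member_name_encoding)
    # Default behaviour when no encoding is specified.
    return raw.decode("cp437")


def check_mode(mode: str, member_name_encoding: Optional[str]) -> None:
    """Reject non-read modes when an encoding is given, as in item (4).

    Assumption: the check is skipped when no encoding was requested.
    """
    if member_name_encoding is not None and mode != "r":
        raise RuntimeError("memberNameEncoding is only supported in read mode")
```

For example, decode_member_name(raw, 0, "shift_jis") decodes raw as Shift JIS, while the same call with flag_bits=0x800 ignores the codec and decodes as UTF-8.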

To do (?):

(10) NEWS/whatsnew
(11) Check relevant code paths are all covered by tests.
(12) Review docs for clarity and organization.

Not done:

I don't think these are appropriate/needed at this time, but I list them in 
case somebody thinks otherwise.

(13) Add a subtype of RuntimeError (see 8d)?
(14) Issue warning if both membernameencoding and utf-8 bit are set (see 5)?
(15) Support InfoZip encoding extension mentioned in APPNOTE.TXT - .ZIP File 
Format Specification, v6.3.4.
(16) Support per-member encodings (I think the zipfile standard permits them, 
but I'm not sure).

----------
keywords: +needs review
status: pending -> open
Added file: http://bugs.python.org/file44564/encoded-member-names

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28080>
_______________________________________