Ivan Sorokin <ivan.sorokin.t...@gmail.com> added the comment:

Grand unified algorithm to read filenames from zip files correctly:

1. Does the zip entry have a «Unicode Path Extra Field» (0x7075)? Use it for the
file name.
2. Is the Unicode flag (0x800) set in the «Flags» field of the zip entry? Assume
the «Filename» field is in UTF-8.
3. Does the «HostOS» field of the zip entry have a value of 0 (FAT) or 11
(NTFS)? Assume the «Filename» field is in the OEM code page corresponding to the
system locale.
4. Otherwise, assume the «Filename» field is in UTF-8.
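A minimal Python sketch of the four steps above, working on the raw header fields of a single entry. The function names, constants and the `oem_encoding` parameter are hypothetical; picking the OEM code page that matches the system locale (for example cp866 for a Russian locale) is left to the caller. The sketch also checks the name CRC stored in the 0x7075 field, which PKWARE's APPNOTE recommends but which is not one of the four steps.

```python
import struct
import zlib

UNICODE_PATH_EXTRA_ID = 0x7075  # Info-ZIP "Unicode Path Extra Field"
UTF8_FLAG = 0x800               # general purpose bit 11 (language encoding flag)
# Step 3 host OS values: 0 = FAT, 11 = NTFS in Info-ZIP/7-Zip numbering
# (PKWARE's APPNOTE assigns 10 to NTFS instead).
OEM_HOSTS = {0, 11}


def unicode_path_from_extra(extra: bytes, raw_name: bytes):
    """Return the name from a 0x7075 extra field, or None if absent/unusable."""
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra block: 2-byte header ID, 2-byte data size, then data.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_EXTRA_ID or len(data) < 5:
            continue
        # 1-byte version, 4-byte CRC-32 of the raw header name, UTF-8 name.
        version, name_crc = struct.unpack_from("<BI", data)
        # Extra check (not one of the four steps): ignore the field if the
        # header name no longer matches the CRC recorded with it.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8", errors="replace")
    return None


def decode_zip_filename(raw_name: bytes, flags: int, host_os: int,
                        extra: bytes = b"", oem_encoding: str = "cp437") -> str:
    """Decode an entry's raw «Filename» bytes using steps 1-4."""
    # Step 1: a Unicode Path Extra Field takes precedence.
    name = unicode_path_from_extra(extra, raw_name)
    if name is not None:
        return name
    # Step 2: bit 11 set -> the name is UTF-8.
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8", errors="replace")
    # Step 3: created on FAT or NTFS -> OEM code page of the locale.
    if host_os in OEM_HOSTS:
        return raw_name.decode(oem_encoding, errors="replace")
    # Step 4: fall back to UTF-8.
    return raw_name.decode("utf-8", errors="replace")


# A Russian name written on FAT by an archiver that did not set bit 11.
example = decode_zip_filename("файл.txt".encode("cp866"), 0, 0,
                              oem_encoding="cp866")  # -> "файл.txt"
```

With the standard `zipfile` module, `ZipInfo.flag_bits`, `ZipInfo.create_system` and `ZipInfo.extra` supply the other arguments. The raw name bytes are not exposed directly, but assuming zipfile's default cp437 decoding (no `metadata_encoding` argument), `info.orig_filename.encode("utf-8" if info.flag_bits & 0x800 else "cp437")` should recover them. On Windows, the Win32 `GetOEMCP()` function returns the current OEM code page number.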

p7zip with the oemcp patch (https://github.com/unxed/oemcp/) uses exactly this
method and correctly processes every zip file in my test set (several archives
created by different packers on Windows, macOS and Linux, and by online
services). Any zip unpacker that wants to handle non-Latin filenames as
gracefully as possible should use the same algorithm.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41928>
_______________________________________