New submission from Ivan Sorokin <ivan.sorokin.t...@gmail.com>:

ZipFile decodes file names incorrectly in .zip archives whose file names are 
encoded in an OEM code page other than cp437.

ZipFile assumes that the OEM code page is always "cp437". In fact, many popular 
.zip packers (for example, the built-in Windows "zip folders" tool) write file 
names using the OEM code page that corresponds to the system locale.

To read such files correctly, we should detect the correct OEM code page from 
the system locale instead of always using cp437.

Here is a locale-to-OEM-code-page conversion table, generated from the Wine 
source code:
https://github.com/unxed/oemcp/blob/master/oemcp.txt
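
A minimal sketch of the kind of lookup such a table would enable. The mapping 
below is a small hand-written subset for illustration only; it is not copied 
from oemcp.txt, and the names OEM_CODECS and oem_codec_for_locale are 
hypothetical, not part of zipfile.

```python
import locale

# Illustrative subset only (hand-written, not taken from oemcp.txt).
# Values are Python codec names for the Windows OEM code pages.
OEM_CODECS = {
    "en_US": "cp437",
    "de_DE": "cp850",
    "ru_RU": "cp866",
}


def oem_codec_for_locale(locale_name=None, default="cp437"):
    """Return the OEM codec name for a POSIX-style locale name.

    Falls back to `default` (cp437, the current ZipFile assumption)
    when the locale is unknown or cannot be determined.
    """
    if locale_name is None:
        # getlocale() may return (None, None); treat that as unknown.
        locale_name = locale.getlocale()[0] or ""
    # Drop any ".encoding" or "@modifier" suffix, e.g. "ru_RU.UTF-8".
    key = locale_name.split(".")[0].split("@")[0]
    return OEM_CODECS.get(key, default)
```

On Windows, locale.getlocale() may report names in a different form (for 
example "Russian_Russia"), so a real implementation would need to normalize 
those as well.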

A sample archive is attached. The file inside should be extracted as "Новый 
текстовый документ.txt" when the ru_RU system locale is set.
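
As a possible workaround until this is addressed, a name that ZipFile has 
already decoded as cp437 can be re-decoded by hand. This is a sketch, not a 
proposed patch: recode_name is a hypothetical helper, and cp866 is assumed to 
be the OEM code page for ru_RU.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recode_name(info, oem_codec):
    """Re-decode a ZipInfo file name that was read as cp437."""
    if info.flag_bits & UTF8_FLAG:
        # The name was stored as UTF-8, so it is already correct.
        return info.filename
    # cp437 maps every byte value, so encoding recovers the raw bytes.
    return info.filename.encode("cp437").decode(oem_codec)


# Simulate what ZipFile reports for a cp866-encoded name.
expected = "Новый текстовый документ.txt"
garbled = expected.encode("cp866").decode("cp437")
info = zipfile.ZipInfo(garbled)
print(recode_name(info, "cp866") == expected)
```

With the attached archive, the same helper could be applied to each entry 
returned by ZipFile.infolist().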

----------
components: Library (Lib)
files: windows_cyrillic.zip
messages: 377932
nosy: ivan.sorokin.tech
priority: normal
severity: normal
status: open
title: Detect OEM code page for zip archives in ZipFile based on system locale
versions: Python 3.10
Added file: https://bugs.python.org/file49492/windows_cyrillic.zip

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41929>
_______________________________________