https://bugs.kde.org/show_bug.cgi?id=378904
Alexander Trufanov <trufano...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |trufano...@gmail.com

--- Comment #3 from Alexander Trufanov <trufano...@gmail.com> ---
As I found out, this is a very old problem with roots in the ZIP specification. A ZIP archive can contain non-UTF filenames, UTF-8 filenames, or non-UTF filenames with an additional field that contains the UTF-8 filename (since 2007). The same applies to ZIP archive comments.

The problem is that, by design, the non-UTF charset is IBM 437, which does not support non-Western languages. In practice, Windows encodes filenames with one of its DOS charsets (CP*); for Russian, for example, it will be CP866 (IBM 866). And there is no field in ZIP to specify exactly which charset was used. Even worse, many Windows archivers by default don't use UTF-8 but this DOS encoding. As I understand it, the ZIP authors don't want to fix this and suggest that everyone on non-English systems switch to UTF-8.

Developers proposed several different patches, libraries and tools to work around the problem a decade ago. Maintainers of some Linux distributions also patch the zip/unzip tools to work around it. For example, here is the discussion about the unzip patch for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/580961/
It ended with a patch that was accepted into the Ubuntu main branch, but that took years. I think this is a mirror of that patch:
https://github.com/zip-i18n/unzip/blob/master/debian/patches/20-unzip60-alt-iconv-utf8

As far as I can see from the code, they request the locale from the system and try to match it to a DOS charset using a hardcoded table. Additionally, they provide command-line arguments that let the user specify the filename encoding. I would say their predefined encoding list is rather small and oriented toward Russian speakers. Or perhaps that's the wrong patch. Anyway, no GUI archiver has implemented anything like that yet.
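To make the mis-decoding concrete, here is a minimal Python sketch. It is only an illustration, not code from Ark or from the unzip patch. It relies on the fact that Python's zipfile module decodes a name as CP437 when the UTF-8 flag (general purpose bit 11) is not set. The helper name display_name, the constant UTF8_FLAG and the choice of CP866 are my assumptions for the example.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name and comment are UTF-8


def display_name(info: zipfile.ZipInfo, legacy_encoding: str) -> str:
    """Return the entry name, re-decoding legacy names with legacy_encoding.

    Hypothetical helper for illustration only.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded it as UTF-8
    # zipfile decoded the raw bytes as CP437; undo that and decode again.
    raw = info.filename.encode("cp437")
    return raw.decode(legacy_encoding)


# Simulate an entry written on a Russian-locale Windows system.
# Assumption: the archiver stored the name in CP866 without the UTF-8 flag.
stored = "Привет.txt".encode("cp866")
entry = zipfile.ZipInfo(stored.decode("cp437"))  # what a CP437 reader sees

print(display_name(entry, "cp866"))
```

The round trip through CP437 is lossless because CP437 assigns a character to every byte value, so the original bytes can always be recovered. What cannot be recovered from the archive itself is which legacy encoding to pass in.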
I don't believe much in automatic encoding detection, at least not unless one bets on all non-UTF encodings coming from Windows being CPxxx and not Windows-12xx. Even for Russian alone there are 4-5 charsets, and some of them are very hard to distinguish without a dictionary or text statistics. So detection may work as a heuristic, but it is not a 100% reliable method.

But I think Ark can do something like what Ubuntu's unzip does:
1. A small prebuilt table that maps the current locale to the encoding expected in Windows-created ZIPs (like here: https://github.com/zip-i18n/unzip/blob/master/debian/patches/20-unzip60-alt-iconv-utf8#L36), on the assumption that the Linux and Windows users speak the same language.
2. Ark can copy the cool menu from Kate (Tools/Encodings), which lets the user switch in the GUI to any of the encodings available on the system, and use this choice to display filenames and archive comments in the GUI as well as for I/O operations while extracting files. This will allow the user to find the proper charset and get the files extracted.
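The two points above could be combined roughly as follows. This is a sketch in Python, not Ark code and not the table from the unzip patch. The table contents, the names LOCALE_TO_OEM and guess_zip_encoding, and the CP437 fallback are all my assumptions for illustration.

```python
import codecs
from typing import Optional

# Hypothetical, deliberately tiny table: locale language -> DOS (OEM) code page
# that a Windows archiver in the same language would probably have used.
LOCALE_TO_OEM = {
    "ru": "cp866",  # Russian
    "uk": "cp866",  # Ukrainian
    "pl": "cp852",  # Polish
    "cs": "cp852",  # Czech
    "el": "cp737",  # Greek
    "tr": "cp857",  # Turkish
}


def guess_zip_encoding(locale_name: Optional[str],
                       user_choice: Optional[str] = None) -> str:
    """Pick an encoding for ZIP names that lack the UTF-8 flag.

    An explicit user choice (e.g. from an Encodings menu) always wins;
    otherwise fall back to the locale table, then to CP437.
    """
    if user_choice:
        codecs.lookup(user_choice)  # raises LookupError for unknown names
        return user_choice
    language = (locale_name or "").split("_")[0].lower()
    return LOCALE_TO_OEM.get(language, "cp437")


print(guess_zip_encoding("ru_RU.UTF-8"))
print(guess_zip_encoding("ru_RU.UTF-8", user_choice="cp1251"))
```

In a real application the locale name would come from the system (in Python, for example, from locale.getlocale()[0]), and the user's choice would come from the GUI menu. The table only supplies a default, so a wrong guess can still be corrected by hand.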