https://bugs.kde.org/show_bug.cgi?id=378904

--- Comment #5 from Ragnar Thomsen <rthoms...@gmail.com> ---
I tried using KEncodingProber with the libzip plugin to open the attached
Japanese zip archive. It appears to detect the encoding correctly for all the
files (see attached screenshot), so this seems like a promising approach.
I also tried the uchardet library, but it detected ASCII encoding for all the
files.

One concern is the overhead of probing the encoding of each archive entry.
Opening the Linux kernel source in zip format took 106 seconds with probing vs.
5 seconds without, so the overhead is significant.
I think we either need to be smart and only probe when needed (I can't see how,
though), or add a menu item in the GUI to reload the archive with probing of
filename encodings. If we could assume that all archive entries have the same
encoding, we could probe only the first entry, but I don't think this assumption
holds in real life: in the attached archive, for example, the first entry is
detected as UTF-8 because it doesn't contain Japanese characters.
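
One possible way to probe only when needed, sketched here in Python rather
than in the plugin's own code: skip any entry whose raw filename bytes are pure
ASCII or already decode as valid UTF-8, and hand only the remaining names to
the prober. This is a hypothetical heuristic, not something tested in Ark; the
function name needs_probing is made up for illustration, and a legacy-encoded
name that happens to be valid UTF-8 would be missed.

```python
def needs_probing(raw_name: bytes) -> bool:
    """Return True if a raw filename looks like it needs encoding detection.

    Assumption: names that are pure ASCII or valid UTF-8 can be shown as-is,
    so only the remaining names are worth the cost of probing.
    """
    if raw_name.isascii():
        # Pure ASCII reads the same in UTF-8 and in most legacy encodings.
        return False
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so it is probably a legacy encoding such as Shift_JIS.
        return True
    return False


# Illustrative entries: an ASCII name, a UTF-8 name and a Shift_JIS name.
entries = [
    b"README.txt",
    "日本語.txt".encode("utf-8"),
    "日本語.txt".encode("shift_jis"),
]
to_probe = [name for name in entries if needs_probing(name)]
```

In an archive like the kernel source, where the names are plain ASCII, such a
check would skip the prober for every entry, though the actual saving would
have to be measured.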
