---------- Forwarded Message ---------
From: Eugene Roshal <ros...@rarlab.com>
Subject: Re: Fwd: Bug#948108: unrar corrupts filenames given as arguments
Date: Jan 4 2020, at 8:35 am
To: Martin Meredith <mar...@sourceguru.net>

Hello,

RAR expects source parameters in the local encoding, but converts them to wchar_t with the CharToWide function and uses wchar_t almost everywhere internally.

RAR has a feature that allows archiving and extracting names that do not belong to the current locale, such as extended ASCII instead of UTF-8. When the RAR CharToWide function notices names that cannot be correctly converted by mbsrtowcs, it calls CharToWideMap to perform a per-byte conversion and sets a special flag (the 0xFFFE noncharacter) to tell WideToChar to apply the per-byte decoding of WideToCharMap to such a name. While this is intended for names read from and saved to an archive, here it is applied to a command line parameter, so the 0xFFFE flag and the per-byte conversion become visible on the screen and produce this mangled name.

Since the source "x\x92.rar" is 7 bytes long, RAR allocates a 7 byte output buffer for the converted wchar_t string. The CharToWideMap output is longer than that because it includes the special flag, so RAR successfully truncates the output to the buffer size.

While such a source parameter conversion is useless, it is harmless as well. Truncation is a good sign, indicating that RAR cares about buffer size and prevents buffer overflow. The mangled name in the output is the result of garbage in the input instead of the expected local encoding. So there is no reason to worry, in my opinion.

> Obviously, unrar should not mangle filenames, as filenames are
> octet-strings, not locale-encoded.

Normally RAR expects locale-encoded names here.

Eugene
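
The fallback described in the message can be modeled roughly as follows. This is a simplified Python sketch, not RAR's C++ code: the function names only mirror CharToWide and WideToChar, the strict UTF-8 decode stands in for mbsrtowcs, and the 0xE000 offset used for high bytes is an assumption made for illustration. Placing the flag just before the first unconvertible byte and reserving one buffer slot for a terminator are also assumptions. The message says only that the 7-element buffer is too short for the mapped output.

```python
MAPPED_MARK = "\ufffe"   # the 0xFFFE noncharacter flag named in the message
MAP_AREA_START = 0xE000  # assumption: private-use offset for high bytes


def char_to_wide(raw: bytes, capacity: int, encoding: str = "utf-8") -> str:
    """Convert bytes to text, falling back to a per-byte mapping."""
    try:
        wide = raw.decode(encoding)  # stands in for mbsrtowcs
    except UnicodeDecodeError:
        out = []
        marked = False
        for b in raw:
            if b < 0x80:
                out.append(chr(b))  # ASCII bytes map to themselves
            else:
                if not marked:
                    out.append(MAPPED_MARK)  # flag the name once
                    marked = True
                out.append(chr(MAP_AREA_START + b))
        wide = "".join(out)
    # Keep capacity - 1 characters, leaving one slot for the terminator.
    return wide[: max(capacity - 1, 0)]


def wide_to_char(wide: str, encoding: str = "utf-8") -> bytes:
    """Reverse the mapping when the flag is present."""
    if MAPPED_MARK not in wide:
        return wide.encode(encoding)
    out = bytearray()
    for ch in wide:
        if ch == MAPPED_MARK:
            continue  # the flag itself carries no byte
        code = ord(ch)
        if MAP_AREA_START + 0x80 <= code <= MAP_AREA_START + 0xFF:
            out.append(code - MAP_AREA_START)  # mapped high byte
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


if __name__ == "__main__":
    raw = b"x\x92.rar"
    # Buffer sized from the source length plus a terminator slot.
    print(ascii(char_to_wide(raw, capacity=len(raw) + 1)))
    # A larger buffer lets the name survive the round trip.
    print(wide_to_char(char_to_wide(raw, capacity=16)))
```

With a buffer sized from the source, the inserted flag pushes the final character out, so the model yields "x", the flag, the mapped byte, and ".ra". With a larger buffer the original bytes come back unchanged, which is the round trip the flag exists to support for names stored in archives.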