On Wed, Apr 28, 2021 at 08:48:04AM +1000, Cameron Simpson wrote:
On 27Apr2021 10:17, Kevin J. McCarthy <ke...@8t8.us> wrote:
First remark: I think we should make clear that this only makes sense when you're encoding filenames as UTF-8, where every byte of a multibyte sequence has the high bit set. This isn't necessarily the case with other encodings.
I was under the impression other encodings at least shared the standard printable 7-bit ASCII characters, and that other characters (whether single or multi-byte) had the high bit set to distinguish them.
However, charsets are definitely an area I'm weak on.
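For anyone who wants to check the high-bit point for themselves, here is a minimal sketch. The helper name low_bytes_in_multibyte is made up for illustration and is not anything in Mutt. It lists the bytes below 0x80 that appear inside the multibyte encoding of a character. For UTF-8 the list should always be empty; Shift_JIS is one example of an encoding where a trail byte can fall in the ASCII range.

```python
def low_bytes_in_multibyte(ch, encoding):
    """Return the bytes < 0x80 found inside the multibyte encoding of ch.

    Hypothetical helper for illustration only.
    """
    encoded = ch.encode(encoding)
    if len(encoded) == 1:
        # Single-byte encodings of a character are not of interest here.
        return []
    return [b for b in encoded if b < 0x80]


if __name__ == "__main__":
    katakana_so = "\u30bd"  # KATAKANA LETTER SO
    # UTF-8: every byte of the sequence has the high bit set.
    print(low_bytes_in_multibyte(katakana_so, "utf-8"))      # []
    # Shift_JIS: the second byte is 0x5C, the ASCII backslash.
    print(low_bytes_in_multibyte(katakana_so, "shift_jis"))  # [92]
```

So a rule of "leave high-bit bytes alone, filter the rest" is safe for UTF-8 names but can cut a character in half in an encoding like Shift_JIS.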
Second remark: As one who has long been less than enthused by sanitising filenames, what exactly are we trying to accomplish when we sanitise a filename?
- avoiding trickiness like whitespace and quote characters, which cause a little pain for users of the files in scripting settings?
- avoiding $ and ` et al, which cause hazards for the very careless script author? (but only if injected blindly)
- avoiding other shell punctuation like redirections? same issue
- avoiding escape paths such as absolute paths (/etc/passwd, oh root-run mutt user?) or ../blah to get out of the scratch area?
Without qualifying these objectives, "sanitisation" means little (or too much, depending where you stand).
If I had to guess, I would say all of the above. But unfortunately the code and its sanitization are 20+ years old, so I think you'd have to check with Michael or Thomas to get the exact details of why. :-) They did leave a big bold-faced warning in the $mailcap_sanitize manual documentation not to turn it off, though.
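To make the objectives above concrete, here is a rough sketch of a sanitiser that covers all of them at once. It is not Mutt's code: the function name sanitize_filename, the SAFE_ASCII allowlist and the keep_high_bit switch are all assumptions made for illustration. It drops any directory part, replaces every ASCII byte outside the allowlist with "_", and optionally passes high-bit bytes through.

```python
import posixpath

# Hypothetical allowlist of ASCII bytes; not taken from Mutt.
SAFE_ASCII = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789._-"
)


def sanitize_filename(raw, keep_high_bit=True):
    """Return a sanitised copy of the byte string raw (illustrative only)."""
    # Drop any directory part so "/etc/passwd" or "../blah" cannot escape.
    base = posixpath.basename(raw)
    out = bytearray()
    for b in base:
        if b in SAFE_ASCII or (keep_high_bit and b >= 0x80):
            out.append(b)
        else:
            # Whitespace, quotes, $, `, redirections and so on all land here.
            out.append(ord("_"))
    # Refuse names that are empty or made only of dots ("." and "..").
    if not out.strip(b"."):
        return b"_"
    return bytes(out)


if __name__ == "__main__":
    print(sanitize_filename(b"/etc/passwd"))       # b'passwd'
    print(sanitize_filename(b"a b$(x).txt"))       # b'a_b__x_.txt'
    print(sanitize_filename(b"\xe3\x82\xbd.txt"))  # UTF-8 name survives intact
    print(sanitize_filename(b"\x83\x5c.txt"))      # Shift_JIS name loses its 0x5C trail byte
```

The last call shows why the UTF-8 caveat matters: the 0x5C trail byte is indistinguishable from a backslash, so a byte-wise filter mangles the character.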
On the one hand these are temp files, but Mutt already tries to preserve the filename to make for a nicer user interaction. It seems if we can preserve unicode filenames better we ought to do that too.
"Unicode filenames" isn't a meaningful term in UNIX, as the API is C strings - byte sequences with NUL terminators. I suspect you mean "UTF-8 encoded names", which is the common modern default.
Sorry for my sloppiness. I should have said to "preserve high-bit bytes in filenames." The filename parameter arrives in RFC2231 encoding, which includes a charset parameter. Mutt will decode and convert this value to the machine charset (or $charset if that's specified).
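As a rough illustration of that decode-and-convert step, the sketch below handles a single, non-continued RFC 2231 extended value of the form charset'language'percent-encoded-bytes. The function names decode_ext_value and to_local_bytes are invented for this example, and this is not how Mutt implements it; continuations (filename*0*, filename*1*) are deliberately left out.

```python
from urllib.parse import unquote_to_bytes


def decode_ext_value(value):
    """Decode one RFC 2231 extended value into a str (illustrative only)."""
    # The value looks like: charset'language'%XX-encoded-bytes
    charset, _language, encoded = value.split("'", 2)
    raw = unquote_to_bytes(encoded)
    # Assumption: fall back to US-ASCII when no charset is given.
    return raw.decode(charset or "us-ascii")


def to_local_bytes(name, local_charset):
    """Convert the decoded name to bytes in the local charset."""
    # Characters the local charset cannot represent become "?".
    return name.encode(local_charset, errors="replace")


if __name__ == "__main__":
    name = decode_ext_value("iso-8859-1''na%EFve.txt")
    print(to_local_bytes(name, "utf-8"))      # b'na\xc3\xafve.txt'
    name = decode_ext_value("utf-8''%E3%82%BD.txt")
    print(to_local_bytes(name, "shift_jis"))  # b'\x83\\.txt'
```

The second call ends up with a 0x5C byte in the local-charset name, which is the case where a sanitiser that only trusts high-bit bytes would damage the filename.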
-- Kevin J. McCarthy GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA