On Wed, Apr 28, 2021 at 08:48:04AM +1000, Cameron Simpson wrote:
On 27Apr2021 10:17, Kevin J. McCarthy <ke...@8t8.us> wrote:
First remark:

I think we should make clear that this only makes sense when you're
encoding filenames as UTF-8, where all multibyte sequences have a high
bit set. This isn't necessarily the case with other encodings.

I was under the impression other encodings at least shared the standard printable 7-bit ASCII characters, and that other characters (whether single or multi-byte) had the high bit set to distinguish them.

However, charsets are definitely an area I'm weak on.
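
To illustrate the encoding question above, here is a minimal Python sketch (not Mutt code; the helper name `low_bytes_in_multibyte` is made up for this example). It collects every 7-bit byte that appears inside a multibyte character for a given encoding. UTF-8 never produces any, while Shift_JIS allows trail bytes in the ASCII range, such as 0x5C (backslash).

```python
def low_bytes_in_multibyte(text, encoding):
    """Return the 7-bit bytes found inside multibyte characters of `text`."""
    hits = bytearray()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:
            # Keep only bytes without the high bit set.
            hits.extend(b for b in encoded if b < 0x80)
    return bytes(hits)


# Two Japanese characters whose Shift_JIS trail byte is 0x5C.
sample = "\u30bd\u8868"

utf8_hits = low_bytes_in_multibyte(sample, "utf-8")
sjis_hits = low_bytes_in_multibyte(sample, "shift_jis")

print("utf-8:", utf8_hits.hex())
print("shift_jis:", sjis_hits.hex())
```

So a byte-wise filter that rewrites backslashes but passes high-bit bytes through would leave UTF-8 names intact, yet could split a Shift_JIS character in half.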

Second remark:

As one who has long been less than enthused by sanitising filenames,
what exactly are we trying to accomplish when we sanitise a filename?

- avoid trickiness like whitespace and quote characters, which cause a
 little pain for users of the files in scripting settings?

- avoiding $ and ` et al, which cause hazards for the very careless
 script author? (but only if injected blindly)

- avoiding other shell punctuation like redirections? same issue

- avoiding escape paths such as absolute paths (/etc/passwd, oh root-run
 mutt user?) or ../blah to get out of the scratch area?

Without qualifying these objectives, "sanitisation" means little (or too
much, depending where you stand).

If I had to guess, I would say all of the above.
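
For concreteness, a sanitiser that covers all four goals while passing high-bit bytes through might look like the sketch below. This is an illustration in Python, not Mutt's actual C implementation; the function name, the `SAFE_ASCII` whitelist and the `keep_high_bit` switch are all assumptions made for the example.

```python
# Hypothetical whitelist of ASCII bytes considered harmless in a filename.
SAFE_ASCII = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789._-+=@"
)


def sanitize_filename(name: bytes, keep_high_bit: bool = True) -> bytes:
    """Sketch of a byte-wise filename sanitiser (assumed policy, not Mutt's)."""
    # Goal 4: drop any directory part, so absolute paths lose their prefix.
    base = name.rsplit(b"/", 1)[-1]

    out = bytearray()
    for b in base:
        if b in SAFE_ASCII or (keep_high_bit and b >= 0x80):
            out.append(b)
        else:
            # Goals 1-3: whitespace, quotes, $, `, redirections and other
            # punctuation all become underscores.
            out.append(ord("_"))

    # Strip leading dots so "." and ".." cannot survive as the result.
    cleaned = bytes(out).lstrip(b".")
    return cleaned or b"_"


print(sanitize_filename(b"/etc/passwd"))
print(sanitize_filename(b"../blah"))
print(sanitize_filename(b"a b$`c>.txt"))
print(sanitize_filename("r\u00e9sum\u00e9.txt".encode("utf-8")))
```

Passing high-bit bytes through is only safe when the name is in an encoding like UTF-8, where every byte of a multibyte sequence has the high bit set.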

But unfortunately, the code and its sanitization are 20+ years old, so I think you'd have to check with Michael or Thomas to get the exact details of why. :-) They did leave a big bold-faced warning in the $mailcap_sanitize manual documentation not to turn it off though.

On the one hand these are temp files, but Mutt already tries to
preserve the filename to make for a nicer user interaction.  It seems
if we can preserve unicode filenames better we ought to do that too.

"Unicode filenames" isn't a meaningful term in UNIX, as the API is C
strings - byte sequences with NUL terminators. I suspect you mean "UTF-8
encoded names", which is the common modern default.

Sorry for my sloppiness. I should have said "preserve high-bit bytes in filenames." The filename parameter arrives in RFC 2231 encoding, which includes a charset parameter. Mutt will decode and convert this value to the machine charset (or $charset if that's specified).
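
As a rough illustration of that decode-and-convert step, here is a Python sketch using only the standard library. It is not how Mutt does it internally; `decode_filename_param` and its `target_charset` argument are hypothetical names, and the us-ascii fallback for a missing charset is an assumption.

```python
from email.utils import decode_rfc2231
from urllib.parse import unquote_to_bytes


def decode_filename_param(raw: str, target_charset: str = "utf-8") -> bytes:
    """Decode an RFC 2231 extended value and re-encode it in target_charset."""
    # Extended values look like: charset'language'percent-encoded-bytes
    charset, _language, encoded = decode_rfc2231(raw)
    if not charset:
        # Assumption for this sketch: fall back to us-ascii when no
        # charset is given.
        charset = "us-ascii"

    raw_bytes = unquote_to_bytes(encoded)
    text = raw_bytes.decode(charset, errors="replace")

    # Convert to the target charset; characters it cannot represent
    # become "?" rather than raising.
    return text.encode(target_charset, errors="replace")


value = "UTF-8''r%C3%A9sum%C3%A9.txt"
print(decode_filename_param(value, "utf-8"))
print(decode_filename_param(value, "iso-8859-1"))
```

The resulting bytes are what a sanitiser would then see, so whether their high-bit bytes are safe to keep depends on the target charset.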

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA
