Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-05-02 Thread John Hawkinson
Sorry, I wasn't paying attention earlier in the thread.

> It sounds like there may be (or have been) a distinction between native app
> behavior vs unix/shell on MacOS.  But I'm not in a position to test that.

That's correct. The general guidance is it's a good idea to avoid : in MacOS 
filenames, because of this, but there's no hard prohibition nor, I think, a 
security issue. Generally speaking GUI apps that use the system GUI routines 
will exchange a : in a filename for a /, and vice-versa. For instance, if I 
create a file called "a:a"

bash$ touch a:a

then the Finder (MacOS's file browser, equivalent to the Windows Explorer, 
etc.) displays the file as "a/a".

This presents a lot of confusion and is good reason for users to avoid either 
character in the contexts in which it is permitted, but not, I think, an issue 
mutt needs to worry about overmuch.

> Cameron (or anyone else on MacOS), can you confirm ':' is allowed for files
> on your system, especially those created from a shell or launched by Mutt
> via mailcap?

Confirmed.

And this is not a recent change. It has been ever thus since MacOS X, the 
Unix-based MacOS was introduced.

--
jh...@alum.mit.edu
John Hawkinson




Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-05-02 Thread Kevin J. McCarthy

On Sun, May 02, 2021 at 02:05:27AM +0200, Vincent Lefevre wrote:

Old MacOS versions seem to forbid ":" (colon):

 
https://apple.stackexchange.com/questions/173529/when-did-the-colon-character-become-an-allowed-character-in-the-filesystem

But see also

 https://stackoverflow.com/a/31976060/3782797


Hmmm... if the first link is correct I don't think this is something to 
be concerned about.  Wikipedia says "Mac OS X 10.0 was released on March 
24, 2001".


It sounds like there may be (or have been) a distinction between native 
app behavior vs unix/shell on MacOS.  But I'm not in a position to test 
that.


Cameron (or anyone else on MacOS), can you confirm ':' is allowed for 
files on your system, especially those created from a shell or launched 
by Mutt via mailcap?



There may be security issues with some programs. The following one
is fixed, but there may still be similar ones around for programs
written in Perl.

 https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920269


Wow!  Well that's just terrifying.  I'm pretty sure I've taken advantage 
of that shortcut in the past and had no idea it would re-evaluate the 
filename.  A very good argument to keep the sanitizer in place.  Thank 
you again.


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-05-01 Thread Vincent Lefevre
On 2021-04-29 18:24:30 -0700, Kevin J. McCarthy wrote:
> Thanks Vincent.  I'm reluctant to make changes to the sanitizer without
> reports of issues.

Old MacOS versions seem to forbid ":" (colon):

  
https://apple.stackexchange.com/questions/173529/when-did-the-colon-character-become-an-allowed-character-in-the-filesystem

But see also

  https://stackoverflow.com/a/31976060/3782797

and the latest comments:

  On MacOS, the only forbidden printable ASCII character is :. Using
  the Windows superset of forbidden characters is sensible because it
  covers Linux and MacOS too. – AlainD Feb 10 at 12:27

  I just confirmed @AlainD's comment. The only character I wasn't
  allowed to name my file is the colon character. However the reason
  for investigating was because I received a file from a windows user
  with a colon in it's name. – Dark Star1 Mar 3 at 11:41

> The Mutt code wraps the expandos in single quotes, and
> launches it via /bin/sh.  So, I'm not really clear about the purpose of the
> sanitizer, unless it was to help protect against misuse by the mailcap
> program invoked.

There may be security issues with some programs. The following one
is fixed, but there may still be similar ones around for programs
written in Perl.

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=920269

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-29 Thread Kevin J. McCarthy

On Thu, Apr 29, 2021 at 09:25:39PM +0200, Vincent Lefevre wrote:

Concerning the ASCII characters, note that "{" and "}" may have a
special meaning for the shell.



And ":" may have a special interpretation in some applications
(e.g. scp, but maybe applications that support URL arguments).

I also wonder whether "%" should be forbidden, unless you know that
it comes from percent-encoding.


Thanks Vincent.  I'm reluctant to make changes to the sanitizer without 
reports of issues.  The Mutt code wraps the expandos in single quotes, 
and launches it via /bin/sh.  So, I'm not really clear about the purpose 
of the sanitizer, unless it was to help protect against misuse by the 
mailcap program invoked.


But I'm also not ready to remove the sanitizer either, because there 
isn't any historical justification that I can find, and I'm absolutely 
*not* a security person myself... :-/



Note also that "-" should not be used as the first character of
a filename, otherwise it could be confused with an option.


Fortunately, the filename is appended to the tmpdir, so a leading hyphen 
should be okay.


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-29 Thread Vincent Lefevre
On 2021-04-27 10:17:29 -0700, Kevin J. McCarthy wrote:
> Ticket 351 on gitlab (https://gitlab.com/muttmua/mutt/-/issues/351) noted
> that an attachment 中文名称.txt, when launched via a mailcap viewer, created
> a tempfile ".txt".
> 
> This is because of the sanitize_filename() functions, which have an
> allow-list of
> "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+@{}._-:%/"
> (with the '/' disabled for filenames).

Concerning the ASCII characters, note that "{" and "}" may have a
special meaning for the shell. In zsh:

zira% echo {a..d}
a b c d

And ":" may have a special interpretation in some applications
(e.g. scp, but maybe applications that support URL arguments).

I also wonder whether "%" should be forbidden, unless you know that
it comes from percent-encoding.

Note also that "-" should not be used as the first character of
a filename, otherwise it could be confused with an option.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Kevin J. McCarthy

On Tue, Apr 27, 2021 at 06:16:59PM -0700, Kevin J. McCarthy wrote:
Answering the question first, Mutt will decode the RFC2231 parameters 
and convert them to $charset.  It looks like the 2231 decoder does run 
values through mutt_filter_unprintable().


I'm looking closer at the 2231 decoder and it actually looks like it
doesn't always do this.  I think that was a mistake, and I'll add a fix.

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Kevin J. McCarthy

On Wed, Apr 28, 2021 at 01:44:03AM +0200, Eike Rathke wrote:

On Tuesday, 2021-04-27 10:17:29 -0700, Kevin J. McCarthy wrote:


What if we added an allow_8bit parameter to the function, that also passed
through bytes with the 8th bit set?  I'd keep this set off in all other
invocations except the mailcap invocations.


Allowing *all* 8-bit may be ill advised. I'd disallow at least resulting
U+0080 to U+009F Unicode control characters (C1 control codes). Also
exclude the non-characters U+FFFE and U+. But, what text encoding
are we actually talking about?


Answering the question first, Mutt will decode the RFC2231 
parameters and convert them to $charset.  It looks like the 2231 decoder 
does run values through mutt_filter_unprintable().


mutt_filter_unprintable() calls mbrtowc() to check for a valid sequence 
and then iswprint() to ensure the character is in the "print" class.


For the first part, I *think* that addresses your concern.  But in any 
case, would those characters cause issues for a tempfile filename?


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Kevin J. McCarthy

On Wed, Apr 28, 2021 at 08:48:04AM +1000, Cameron Simpson wrote:

On 27Apr2021 10:17, Kevin J. McCarthy  wrote:
First remark:

I think we should make clear that this only makes sense when you're
encoding filenames as UTF-8, where all multibyte sequences have a high
bit set. This isn't necessarily the case with other encodings.


I was under the impression other encodings at least shared the standard 
printable ascii 7-bit characters, and that other characters (whether 
single or multi-byte) had the high bit set to distinguish them.


However, charsets are definitely an area I'm weak on.


Second remark:

As one who has long been less than enthused by sanitising filenames,
what exactly are we trying to accomplish when we sanitise a filename?

- avoid trickiness like whitespace and quote characters, which cause a
 little pain for users of the files in scripting settings?

- avoiding $ and ` et al, which cause hazards for the very careless
 script author? (but inly if injected blindly)

- avoiding other shell punctuation like redirections? same issue

- avoiding escape paths such as absolute paths (/etc/passwd, oh root-run
 mutt user?) or ../blah to get out of the scratch area?

Without qualifying these objectives, "sanitisation" means little (or too
much, depending where you stand).


If I had to guess, I would say all of the above.

But, unfortunately the code and sanitization is 20+ years old, so I 
think you'd have to check with Michael or Thomas to get the exact 
details of why. :-) They did leave a big bold faced warning in the 
$mailcap_sanitize manual documentation not to turn it off though.



On the one hand these are temp files, but Mutt already tries to
preserve the filename to make for a nicer user interaction.  It seems
if we can preserve unicode filenames better we ought to do that too.


"Unicode filenames" isn't a meaningful term in UNIX, as the API is C
strings - byte sequences with NUL terminators. I suspect you mean "UTF-8
encoded names", which is the common modern default.


Sorry for my sloppiness.  I should have said to "preserve high-bit bytes 
in filenames."  The filename parameter arrives in RFC2231 encoding, 
which includes a charset parameter.  Mutt will decode and convert this 
value to the machine charset (or $charset if that's specified).


--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Eike Rathke
Hi Kevin,

On Tuesday, 2021-04-27 10:17:29 -0700, Kevin J. McCarthy wrote:

> What if we added an allow_8bit parameter to the function, that also passed
> through bytes with the 8th bit set?  I'd keep this set off in all other
> invocations except the mailcap invocations.

Allowing *all* 8-bit may be ill advised. I'd disallow at least resulting
U+0080 to U+009F Unicode control characters (C1 control codes). Also
exclude the non-characters U+FFFE and U+. But, what text encoding
are we actually talking about? 8bit suggests UTF-8 encoding or
ISO-8859-... code pages (not assuming EBCDIC ;P). I don't see that
mutt_buffer_sanitize_filename() or its calling contexts have any notion
about text encoding. Assuming UTF-8 (as path names are supposed to be),
check for valid sequences?

  Eike

-- 
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918  630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/


signature.asc
Description: PGP signature


Re: Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Cameron Simpson
On 27Apr2021 10:17, Kevin J. McCarthy  wrote:
>Ticket 351 on gitlab (https://gitlab.com/muttmua/mutt/-/issues/351) 
>noted that an attachment 中文名称.txt, when launched via a mailcap 
>viewer, created a tempfile ".txt".

Ouch.

>This is because of the sanitize_filename() functions, which have an 
>allow-list of 
>"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+@{}._-:%/" 
>(with the '/' disabled for filenames).
>
>I'd be reluctant to change sanitization for the %{} or %t 
>expandos, but this does seem to be a bit strict for the filename. 
>Oswald notes in the ticket that 8-bit characters are harmless at the 
>system level (Oswald, feel free to reply/clarify - I'm not trying to 
>put words in your mouth).

First remark:

I think we should make clear that this only makes sense when you're 
encoding filenames as UTF-8, where all multibyte sequences have a high 
bit set. This isn't necessarily the case with other encodings.

Second remark:

As one who has long been less than enthused by sanitising filenames, 
what exactly are we trying to accomplish when we sanitise a filename?

- avoid trickiness like whitespace and quote characters, which cause a 
  little pain for users of the files in scripting settings?

- avoiding $ and ` et al, which cause hazards for the very careless 
  script author? (but inly if injected blindly)

- avoiding other shell punctuation like redirections? same issue

- avoiding escape paths such as absolute paths (/etc/passwd, oh root-run 
  mutt user?) or ../blah to get out of the scratch area?

Without qualifying these objectives, "sanitisation" means little (or too 
much, depending where you stand).

>On the one hand these are temp files, but Mutt already tries to 
>preserve the filename to make for a nicer user interaction.  It seems 
>if we can preserve unicode filenames better we ought to do that too.

"Unicode filenames" isn't a meaningful term in UNIX, as the API is C 
strings - byte sequences with NUL terminators. I suspect you mean "UTF-8 
encoded names", which is the common modern default.

>What if we added an allow_8bit parameter to the function, that also 
>passed through bytes with the 8th bit set?  I'd keep this set off in 
>all other invocations except the mailcap invocations.

Of course, the trickiness is that header things like filenames are, 
IIRC, "bytes". Without a charset, do we inherently know anything about 
them _as characters_?

I'm +1 for allow_8bit if we make it clear in the docs (and implemented 
it correctly in the code) that this refers to the in-filesystem byte 
encoding of the filename. _Not_ hypothetical "Unicode". One person's 
"Unicode" is another's Shift-JIS :-)

https://en.wikipedia.org/wiki/Mojibake

One has but to shift one's shell locale to see this play out.

Cheers,
Cameron Simpson 


Loosening mailcap filename sanitization to allow unicode filenames

2021-04-27 Thread Kevin J. McCarthy
Ticket 351 on gitlab (https://gitlab.com/muttmua/mutt/-/issues/351) 
noted that an attachment 中文名称.txt, when launched via a mailcap 
viewer, created a tempfile ".txt".


This is because of the sanitize_filename() functions, which have an 
allow-list of 
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+@{}._-:%/" 
(with the '/' disabled for filenames).


I'd be reluctant to change sanitization for the %{} or %t 
expandos, but this does seem to be a bit strict for the filename. 
Oswald notes in the ticket that 8-bit characters are harmless at the 
system level (Oswald, feel free to reply/clarify - I'm not trying to put 
words in your mouth).


On the one hand these are temp files, but Mutt already tries to preserve 
the filename to make for a nicer user interaction.  It seems if we can 
preserve unicode filenames better we ought to do that too.


What if we added an allow_8bit parameter to the function, that also 
passed through bytes with the 8th bit set?  I'd keep this set off in all 
other invocations except the mailcap invocations.


Comments?

--
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C  5308 ADEF 7684 8031 6BDA


signature.asc
Description: PGP signature