[Python-ideas] Re: Sanitize filename (path part) 2nd try

Wes Turner Mon, 11 May 2020 12:55:43 -0700

What does sanitizepart do with newlines \n \r \r\n in filenames? Are these
control characters?
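
For reference, a minimal sketch of one way to answer that question. It assumes "control character" means Unicode general category Cc, which may not be the definition the proposal intends; `has_control` is a hypothetical helper, not part of the proposal.

```python
import unicodedata


def has_control(name: str) -> bool:
    """Return True if any character in name is in Unicode category Cc.

    Assumption: "control character" is taken to mean category Cc
    (C0 controls, DEL, and C1 controls). The proposal may define it differently.
    """
    return any(unicodedata.category(ch) == "Cc" for ch in name)


# Under that definition, \n and \r (and so \r\n) count as control characters.
for sample in ("a\nb", "a\rb", "a\r\nb", "plain.txt"):
    print(repr(sample), has_control(sample))
```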


What does sanitizepart do with a leading slash?

assert os.path.join("a", "/b") == "/b"

A new safejoin() or joinsafe() or join(safe=True) could call
sanitizepart() such that:

assert joinsafe("a\n", "/b") == "a\\n/b"
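
A rough sketch of what that could look like, just to make the assertion above concrete. Everything here is an assumption: `sanitizepart` below is a hypothetical stand-in, not the proposed function, and the choices to escape control characters with backslash escapes, strip leading slashes, and reject embedded slashes are mine. It uses posixpath so the result is the same on every platform.

```python
import posixpath
import unicodedata


def sanitizepart(part: str) -> str:
    """Hypothetical stand-in for the proposed os.path.sanitizepart()."""
    # Drop leading separators so a part cannot restart the path at the root.
    part = part.lstrip("/")
    if part in ("", ".", ".."):
        raise ValueError(f"not usable as a path component: {part!r}")
    if "/" in part:
        raise ValueError(f"embedded path separator: {part!r}")
    # Render control characters (category Cc) as visible backslash escapes.
    return "".join(
        ch.encode("unicode_escape").decode("ascii")
        if unicodedata.category(ch) == "Cc"
        else ch
        for ch in part
    )


def joinsafe(*parts: str) -> str:
    """Join parts after sanitizing each one (hypothetical helper)."""
    return posixpath.join(*(sanitizepart(p) for p in parts))


# Plain join lets an absolute second part discard the first one.
assert posixpath.join("a", "/b") == "/b"
# The sanitizing join keeps both parts and escapes the newline.
assert joinsafe("a\n", "/b") == "a\\n/b"
```

Note that this sketch does not escape a literal backslash, so "a\n" and the two-character string "a\\n" map to the same result, which is the same ambiguity raised below about the escape character.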

On Mon, May 11, 2020, 1:11 PM Andrew Barnert via Python-ideas <
python-ideas@python.org> wrote:

> On May 11, 2020, at 00:40, Steve Jorgensen <ste...@stevej.name> wrote:
> >
> > Proposal:
> >
> > Add a new function (possibly `os.path.sanitizepart`) to sanitize a value
> > for use as a single component of a path. In the default case, the value
> > must also not be a reference to the current or parent directory ("." or
> > "..") and must not contain control characters.
>
> “Also” in addition to what? Are there other requirements enforced besides
> these two that aren’t specified anywhere?
>
> If not: the result can contain the path separator, illegal characters that
> aren’t control characters, nonprinting characters that aren’t control
> characters, and characters whose bytes (in the filesystem’s encoding) are
> ASCII control characters?
>
> And it can be a reserved name, or even something like C:; as long as it’s
> not the Unix . or ..?
>
> What’s the use case where you need to sanitize these things but nothing
> else? As I said on the previous proposal, I have had a variety of times
> where I needed to sanitize filenames, but I don’t think this would have
> been what I wanted for _any_ of them, much less for most.
>
> Are there existing tools, libraries, recommendations, etc. that this is
> based on, or is it just an educated guess at what’s important? For
> something that’s meant to go into the stdlib with a name that strongly
> implies “if you use this, you’re safe from stupid or malicious filenames”,
> it would be misleading, and possibly dangerous, if it didn’t actually make
> you safe because it didn’t catch common mistakes/exploits that everyone
> else considers important to catch. And without any cites to what
> everyone else considers important, why should anyone trust that this
> proposal isn’t missing, or getting wrong, anything critical?
>
> Why isn’t this also available in pathlib? Is it the kind of thing you
> don’t envision high-level pathlib-style code ever needing to do, only
> low-level os-style code?
>
> > When `replace` is supplied, it is used as a replacement for any invalid
> > characters or for the first character of an invalid name. When `prefix` is
> > not also supplied, this is also used as the replacement for the first
> > character of the name if it is invalid, not simply due to containing
> > invalid characters.
>
> What’s the use case for separate prefix and replace? Or just for prefix in
> the first place?
>
> > When `escape` is supplied (typically "%") it is used as the escape
> > character in the same way that "%" is used in URL encoding.
>
> Why allow other escape strings? Has anyone ever wanted URL-encoding but
> with some other string in place of %, in this or any other context?
>
> The escape character is not itself escaped?
>
> More generally, what’s the use case for %-encoding filenames like this?
> Are people expecting it to interact transparently with URLs, so if I save a
> file “spam\0eggs” in a Python script and then try to browse to
> “file:///spam\0eggs” in a browser, the browser will convert the \0 character
> to %00 the same way my Python script did and therefore find the file? If
> so, doesn’t it need to escape all the same characters that URLs do, not a
> different set? If not, isn’t using something similar to URL-encoding but
> not identical just going to confuse people rather than help them?
>
> What happens if you supply a string longer than one character as escape?
> Or replace or prefix, for that matter?
>
> Overall, it seems like there is a problem to be solved, but I don’t see
> any reason to be confident that this is the solution for anyone, and if
> it’s not the solution for _most_ people, adding it to the stdlib will just
> mean people don’t search for and find the right one, all the while
> misleading themselves into thinking they’re safe when they’re not, which
> will make the overall problem worse, not better.
>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QDNKK3HSRWRESVF7UBOTEDL6FDUIZW43/
Code of Conduct: http://python.org/psf/codeofconduct/
