On May 11, 2020, at 00:40, Steve Jorgensen <ste...@stevej.name> wrote:
> 
> Proposal:
> 
> Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for 
> use as a single component of a path. In the default case, the value must also 
> not be a reference to the current or parent directory ("." or "..") and must 
> not contain control characters.

“Also” in addition to what? Are there other requirements enforced besides these 
two that aren’t specified anywhere?

If not: the result can contain the path separator, illegal characters that 
aren’t control characters, nonprinting characters that aren’t control 
characters, and characters whose bytes (in the filesystem’s encoding) are ASCII 
control characters?

And it can be a reserved name, or even something like C:, as long as it’s not 
the Unix . or ..?
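
To make the question concrete, here is a minimal sketch of a check that enforces only the two rules quoted above. The name `passes_stated_rules` is hypothetical, and treating "control character" as Unicode category Cc is an assumption, since the proposal does not define it.

```python
import unicodedata
from pathlib import PureWindowsPath


def passes_stated_rules(name: str) -> bool:
    """Hypothetical check: only the two rules quoted above, nothing else."""
    if name in (".", ".."):
        return False
    # Assumption: "control character" means Unicode category Cc (C0 and C1).
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


# A path separator, a Windows separator, a Windows reserved name, a drive
# specifier, a zero-width space (category Cf, not Cc), and a character that
# Windows forbids in file names.
candidates = ["a/b", "a\\b", "NUL", "C:", "spam\u200beggs", "what?.txt"]
accepted = [name for name in candidates if passes_stated_rules(name)]

# pathlib parses "C:" as a drive rather than as a file name.
drive = PureWindowsPath("C:").drive
```

Under those assumptions every candidate is accepted, while "..", "." and "spam\x00eggs" are rejected.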

What’s the use case where you need to sanitize these things but nothing else? 
As I said on the previous proposal, I have had a variety of times where I 
needed to sanitize filenames, but I don’t think this would have been what I 
wanted for _any_ of them, much less for most.

Are there existing tools, libraries, recommendations, etc. that this is based 
on, or is it just an educated guess at what’s important? For something that’s 
meant to go into the stdlib with a name that strongly implies “if you use 
this, you’re safe from stupid or malicious filenames”, it would be misleading, 
and possibly dangerous, if it didn’t actually make you safe because it didn’t 
catch common mistakes/exploits that everyone else considers important to catch. 
And without any cites to what everyone else considers important, why 
should anyone trust that this proposal isn’t missing, or getting wrong, 
anything critical?

Why isn’t this also available in pathlib? Is it the kind of thing you don’t 
envision high-level pathlib-style code ever needing to do, only low-level 
os-style code?

> When `replace` is supplied, it is used as a replacement for any invalid 
> characters or for the first character of an invalid name. When `prefix` is 
> not also supplied, this is also used as the replacement for the first 
> character of the name if it is invalid, not simply due to containing invalid 
> characters.

What’s the use case for separate prefix and replace? Or just for prefix in the 
first place?

> When `escape` is supplied (typically "%") it is used as the escape character 
> in the same way that "%" is used in URL encoding.

Why allow other escape strings? Has anyone ever wanted URL-encoding but with 
some other string in place of %, in this or any other context?

The escape character is not itself escaped?
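
Why this matters, as a rough sketch: if the escape character is left alone, two different names can map to the same result, so the escaping cannot be reversed. `naive_escape` is a hypothetical stand-in for the proposed behavior, assuming it only rewrites ASCII control characters. `urllib.parse.quote` is the real stdlib URL encoder, shown for comparison.

```python
from urllib.parse import quote, unquote


def naive_escape(name: str, escape: str = "%") -> str:
    """Hypothetical: escape ASCII control characters, leave `escape` itself alone."""
    return "".join(
        f"{escape}{ord(ch):02X}" if ord(ch) < 0x20 or ord(ch) == 0x7F else ch
        for ch in name
    )


# A NUL character and a literal "%00" produce the same escaped name.
collision = naive_escape("spam\x00eggs") == naive_escape("spam%00eggs")

# URL encoding escapes "%" too, so the two inputs stay distinct and reversible.
url_a = quote("spam\x00eggs", safe="")
url_b = quote("spam%00eggs", safe="")
round_trip = (unquote(url_a), unquote(url_b))
```

Here `url_a` is "spam%00eggs" and `url_b` is "spam%2500eggs", and `unquote` recovers both originals.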

More generally, what’s the use case for %-encoding filenames like this? Are 
people expecting it to interact transparently with URLs, so if I save a file 
“spam\0eggs” in a Python script and then try to browse to “file:///spam\0eggs” 
in a browser, the browser will convert the \0 character to %00 the same way my 
Python script did and therefore find the file? If so, doesn’t it need to escape 
all the same characters that URLs do, not a different set? If not, isn’t using 
something similar to URL-encoding but not identical just going to confuse 
people rather than help them?
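
For comparison, a short sketch of how much more real URL encoding rewrites than control characters alone. It uses the stdlib `urllib.parse.quote`. The sample file name is made up for illustration.

```python
from urllib.parse import quote

# Made-up name with no control characters at all.
name = "spam eggs #1?.txt"

# safe="" asks quote() to escape "/" as well, as a single path component needs.
url_form = quote(name, safe="")

# Characters that URL encoding rewrote even though none is a control character.
rewritten = sorted({ch for ch in name if quote(ch, safe="") != ch})
```

The space, "#" and "?" are all percent-encoded here, so a scheme that escapes only control characters would produce names that differ from their URL form.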

What happens if you supply a string longer than one character as escape? Or 
replace or prefix, for that matter?

Overall, it seems like there is a problem to be solved, but I don’t see any 
reason to be confident that this is the solution for anyone, and if it’s not 
the solution for _most_ people, adding it to the stdlib will just mean people 
don’t search for and find the right one, all the while misleading themselves 
into thinking they’re safe when they’re not, which will make the overall 
problem worse, not better.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/
Code of Conduct: http://python.org/psf/codeofconduct/
