On May 11, 2020, at 00:40, Steve Jorgensen <ste...@stevej.name> wrote:
>
> Proposal:
>
> Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for
> use as a single component of a path. In the default case, the value must also
> not be a reference to the current or parent directory ("." or "..") and must
> not contain control characters.
“Also” in addition to what? Are there other requirements enforced besides these two that aren’t specified anywhere?

If not: the result can contain the path separator, illegal characters that aren’t control characters, nonprinting characters that aren’t control characters, and characters whose bytes (in the filesystem’s encoding) are ASCII control characters? And it can be a reserved name, or even something like C:, as long as it’s not the Unix . or ..? What’s the use case where you need to sanitize these things but nothing else?

As I said on the previous proposal, I have had a variety of times where I needed to sanitize filenames, but I don’t think this would have been what I wanted for _any_ of them, much less for most.

Are there existing tools, libraries, recommendations, etc. that this is based on, or is it just an educated guess at what’s important? This is meant to go into the stdlib with a name that strongly implies “if you use this, you’re safe from stupid or malicious filenames”. It would be misleading, and possibly dangerous, if it didn’t actually make you safe because it didn’t catch common mistakes/exploits that everyone else considers important to catch. And without any cites to what everyone else considers important, why should anyone trust that this proposal isn’t missing, or getting wrong, anything critical?

Why isn’t this also available in pathlib? Is it the kind of thing you don’t envision high-level pathlib-style code ever needing to do, only low-level os-style code?

> When `replace` is supplied, it is used as a replacement for any invalid
> characters or for the first character of an invalid name. When `prefix` is
> not also supplied, this is also used as the replacement for the first
> character of the name if it is invalid, not simply due to containing invalid
> characters.

What’s the use case for separate prefix and replace? Or just for prefix in the first place?
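To make the first point concrete, here is a minimal sketch of the most literal reading of the default rules and `replace` that I can come up with. The function name comes from the proposal, but the body is my guess. I'm assuming "control character" means Unicode general category Cc. I'm also assuming `replace` substitutes for each such character and for the first character of "." or "..". I've left `prefix` and `escape` out entirely, since their semantics are exactly what's unclear.

```python
import unicodedata
from typing import Optional


def is_control(ch: str) -> bool:
    # Assumption: "control character" means Unicode general category Cc.
    # The proposal doesn't say which definition it intends.
    return unicodedata.category(ch) == "Cc"


def sanitizepart(name: str, replace: Optional[str] = None) -> str:
    """Hypothetical literal reading of the proposed os.path.sanitizepart.

    Only the two stated default rules are enforced: the name must not be
    "." or "..", and it must not contain control characters.
    """
    if replace is None:
        # No replacement given: reject anything that breaks the two rules.
        if name in (".", "..") or any(is_control(ch) for ch in name):
            raise ValueError(f"invalid path component: {name!r}")
        return name
    # Replace every control character with `replace`.
    cleaned = "".join(replace if is_control(ch) else ch for ch in name)
    if cleaned in (".", ".."):
        # Invalid name: substitute its first character.
        cleaned = replace + cleaned[1:]
    return cleaned


# None of these break the two stated rules, so all come back unchanged:
for part in ["a/b", "C:", "CON", "..\\secret", "x\u200by", "nul.txt"]:
    print(repr(part), "->", repr(sanitizepart(part)))

print(repr(sanitizepart("spam\0eggs", replace="_")))  # 'spam_eggs'
print(repr(sanitizepart("..", replace="_")))  # '_.'
```

Each name in that first list passes untouched. Yet each one is a path separator, a drive, a Windows reserved device name, a Windows parent-directory traversal, or an invisible zero-width space.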
> When `escape` is supplied (typically "%") it is used as the escape character
> in the same way that "%" is used in URL encoding.

Why allow other escape strings? Has anyone ever wanted URL-encoding but with some other string in place of %, in this or any other context?

The escape character is not itself escaped?

More generally, what’s the use case for %-encoding filenames like this? Are people expecting it to interact transparently with URLs? Say I save a file “spam\0eggs” in a Python script and then try to browse to “file:///spam\0eggs” in a browser. Will the browser convert the \0 character to %00 the same way my Python script did, and therefore find the file? If so, doesn’t it need to escape all the same characters that URLs do, not a different set? If not, isn’t using something similar to URL-encoding but not identical just going to confuse people rather than help them?

What happens if you supply a string longer than one character as escape? Or replace or prefix, for that matter?

Overall, it seems like there is a problem to be solved, but I don’t see any reason to be confident that this is the solution for anyone. And if it’s not the solution for _most_ people, adding it to the stdlib will just mean people don’t search for and find the right one. All the while they'll be misleading themselves into thinking they’re safe when they’re not, which will make the overall problem worse, not better.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/
Code of Conduct: http://python.org/psf/codeofconduct/