What does sanitizepart do with newlines (\n, \r, \r\n) in filenames? Are
these treated as control characters?

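In Unicode terms, at least, both "\n" and "\r" fall in general category Cc (control), so a check that rejects control characters would catch them. A minimal sketch, assuming "control character" means category Cc; has_control_chars is a hypothetical helper for illustration, not part of the proposal:

```python
import unicodedata


def has_control_chars(name: str) -> bool:
    """Return True if any character is in Unicode category Cc (control).

    Hypothetical helper; the proposal does not define "control character",
    so category Cc is an assumption made here.
    """
    return any(unicodedata.category(ch) == "Cc" for ch in name)


# Print the Unicode category of each newline-related character.
for ch in ("\n", "\r"):
    print(repr(ch), unicodedata.category(ch))

# A filename with an embedded CRLF is flagged; an ordinary one is not.
print(has_control_chars("a\r\nb"), has_control_chars("report.txt"))
```

Whether sanitizepart would reject, replace, or escape such characters is still the open question.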
What does sanitizepart do with a leading slash?

    assert os.path.join("a", "/b") == "/b"

A new safejoin() or joinsafe() or join(safe=True) could call
sanitizepart() such that:

    assert joinsafe("a\n", "/b") == "a\\n/b"

On Mon, May 11, 2020, 1:11 PM Andrew Barnert via Python-ideas
<python-ideas@python.org> wrote:

> On May 11, 2020, at 00:40, Steve Jorgensen <ste...@stevej.name> wrote:
> >
> > Proposal:
> >
> > Add a new function (possibly `os.path.sanitizepart`) to sanitize a value
> > for use as a single component of a path. In the default case, the value
> > must also not be a reference to the current or parent directory ("." or
> > "..") and must not contain control characters.
>
> “Also” in addition to what? Are there other requirements enforced besides
> these two that aren’t specified anywhere?
>
> If not: the result can contain the path separator, illegal characters that
> aren’t control characters, nonprinting characters that aren’t control
> characters, and characters whose bytes (in the filesystem’s encoding) are
> ASCII control characters?
>
> And it can be a reserved name, or even something like C:; as long as it’s
> not the Unix . or ..?
>
> What’s the use case where you need to sanitize these things but nothing
> else? As I said on the previous proposal, I have had a variety of times
> where I needed to sanitize filenames, but I don’t think this would have
> been what I wanted for _any_ of them, much less for most.
>
> Are there existing tools, libraries, recommendations, etc. that this is
> based on, or is it just an educated guess at what’s important? For
> something that’s meant to go into the stdlib with a name that strongly
> implies “if you use this, you’re safe from stupid or malicious filenames”,
> it would be misleading, and possibly dangerous, if it didn’t actually make
> you safe because it didn’t catch common mistakes/exploits that everyone
> else considers important to catch.
> And without any cites to what everyone else considers important, why
> should anyone trust that this proposal isn’t missing, or getting wrong,
> anything critical?
>
> Why isn’t this also available in pathlib? Is it the kind of thing you
> don’t envision high-level pathlib-style code ever needing to do, only
> low-level os-style code?
>
> > When `replace` is supplied, it is used as a replacement for any invalid
> > characters or for the first character of an invalid name. When `prefix` is
> > not also supplied, this is also used as the replacement for the first
> > character of the name if it is invalid, not simply due to containing
> > invalid characters.
>
> What’s the use case for separate prefix and replace? Or just for prefix in
> the first place?
>
> > When `escape` is supplied (typically "%") it is used as the escape
> > character in the same way that "%" is used in URL encoding.
>
> Why allow other escape strings? Has anyone ever wanted URL-encoding but
> with some other string in place of %, in this or any other context?
>
> The escape character is not itself escaped?
>
> More generally, what’s the use case for %-encoding filenames like this?
> Are people expecting it to interact transparently with URLs, so if I save a
> file “spam\0eggs” in a Python script and then try to browse to
> “file:///spam\0eggs” in a browser, the browser will convert the \0 character
> to %00 the same way my Python script did and therefore find the file? If
> so, doesn’t it need to escape all the same characters that URLs do, not a
> different set? If not, isn’t using something similar to URL-encoding but
> not identical just going to confuse people rather than help them?
>
> What happens if you supply a string longer than one character as escape?
> Or replace or prefix, for that matter?
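For comparison with the escaping questions above, this is how the standard library's own URL quoting behaves. It is only a reference point using urllib.parse, not a claim about what the proposed `escape` parameter would do:

```python
from urllib.parse import quote, unquote

# "%" is not in quote()'s safe set, so the escape character is itself
# escaped (as %25), which keeps the encoding reversible.
print(quote("100%"))

# A NUL character is percent-encoded as %00.
print(quote("spam\0eggs"))

# "/" is left alone by default (safe="/"); passing safe="" escapes it too,
# which matters when the value must stay a single path component.
print(quote("a/b"), quote("a/b", safe=""))

# Because "%" is escaped, quoting and unquoting round-trips the original.
name = "50% off\n.txt"
assert unquote(quote(name, safe="")) == name
```

A scheme that leaves the escape character unescaped would lose that round-trip property.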
>
> Overall, it seems like there is a problem to be solved, but I don’t see
> any reason to be confident that this is the solution for anyone, and if
> it’s not the solution for _most_ people, adding it to the stdlib will just
> mean people don’t search for and find the right one, all the while
> misleading themselves into thinking they’re safe when they’re not, which
> will make the overall problem worse, not better.
>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/ZBBMQ34OHSR3RYKVUFLNUIM34WG3R2N7/
> Code of Conduct: http://python.org/psf/codeofconduct/
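Returning to the joinsafe() idea at the top of this message, here is one rough way it could be sketched. Everything in it is an assumption: neither sanitizepart() nor joinsafe() exists in the stdlib, the escaping rule (backslash escapes for category Cc characters) is just one choice that satisfies the assertion above, and posixpath is used so the result is the same on every platform:

```python
import posixpath
import unicodedata


def sanitizepart(part: str) -> str:
    """Hypothetical stand-in for the proposed function (assumed behaviour)."""
    # Drop leading separators so the part cannot restart the path at the root.
    part = part.lstrip("/")
    if part in ("", ".", ".."):
        raise ValueError(f"not usable as a path component: {part!r}")
    if "/" in part:
        raise ValueError(f"embedded separator in component: {part!r}")
    # Replace each control character with a visible backslash escape,
    # e.g. a newline becomes the two characters backslash and "n".
    return "".join(
        ch.encode("unicode_escape").decode("ascii")
        if unicodedata.category(ch) == "Cc"
        else ch
        for ch in part
    )


def joinsafe(*parts: str) -> str:
    """Join parts after sanitizing each one (hypothetical helper)."""
    return posixpath.join(*(sanitizepart(p) for p in parts))


# Plain join lets an absolute second part discard the first one.
assert posixpath.join("a", "/b") == "/b"
# The sanitizing join keeps both parts and makes the newline visible.
assert joinsafe("a\n", "/b") == "a\\n/b"
```

Rejecting "." and ".." outright, rather than rewriting them, is a design choice made here for brevity; the proposal's `replace` and `prefix` options would presumably change that.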