> On May 11, 2020, at 14:18, Steve Jorgensen <ste...@stevej.name> wrote:
>
> Andrew Barnert wrote:
>>> On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote:
>>> Proposal:
>>> Add a new function (possibly os.path.sanitizepart) to sanitize a value for
>>> use as a single component of a path. In the default case, the value must
>>> also not be a reference to the current or parent directory ("." or "..")
>>> and must not contain control characters.
> <snip>
>> If not: the result can contain the path separator, illegal characters that
>> aren’t control characters, nonprinting characters that aren’t control
>> characters, and characters whose bytes (in the filesystem’s encoding) are
>> ASCII control characters?
>> And it can be a reserved name, or even something like C:; as long as it’s
>> not the Unix . or ..?
>
> Are there non-printing characters outside of those in the Unicode general
> category of "C" that make sense to omit?
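
For anyone who wants to explore that question directly, here is a minimal sketch using the standard unicodedata module and str.isprintable(). The sample characters and the describe() helper are only an illustration chosen for this message; they are not part of the proposal and not an exhaustive list.

```python
import unicodedata

# Illustrative sample only; not an exhaustive list of problem characters.
samples = {
    "U+0007 BELL": "\u0007",
    "U+0020 SPACE": "\u0020",
    "U+00A0 NO-BREAK SPACE": "\u00a0",
    "U+2028 LINE SEPARATOR": "\u2028",
    "U+2029 PARAGRAPH SEPARATOR": "\u2029",
    "U+200B ZERO WIDTH SPACE": "\u200b",
}


def describe(ch):
    """Return (Unicode general category, str.isprintable() result) for one character."""
    return unicodedata.category(ch), ch.isprintable()


for label, ch in samples.items():
    category, printable = describe(ch)
    print(f"{label}: category={category} printable={printable}")
```

In this sample, the separator characters (Zs, Zl, Zp) other than the ASCII space report as non-printable even though none of them is in a "C" category, so a rule that only looks at "C" would let them through.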
Off the top of my head, everything in the Z category (like U+2029 PARAGRAPH SEPARATOR) is non-printable, and makes sense to sanitize.

Meanwhile, what about invalid characters being smuggled through str by surrogateescape? I don’t know if those are printable, or what category they are… or whether you want to sanitize them, for that matter, so I have no idea if this rule does the right thing or not.

More generally, for something that people are going to rely on for security/safety purposes, we shouldn’t be relying on what respondents know off the top of their heads in the first place.

> Regarding names like "C:", you are absolutely right to point that out. When
> the platform is Windows, certainly, "<letter>:" should not be allowed, and
> perhaps colon should not be allowed at all. I'll need to research that a bit.
> This matters because if the path part is used without explicit "./" prefixed
> to it, then it will refer to a root path,

The name `C:spam` means spam in the current directory for the C drive. That isn’t the same as the current working directory unless C is the current working drive, but it’s definitely not (in general) the same as the root.

And what about all the other questions I asked?

Most importantly, you need to clarify what the use case is, and why this proposal meets it. Otherwise, it sounds more like a trap to make people think their code is safe when it isn’t, not a fix for the real problem.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XRFT77TXLE7MNAP2MV2IC57NG4EWQIGP/
Code of Conduct: http://python.org/psf/codeofconduct/