> On May 11, 2020, at 14:18, Steve Jorgensen <ste...@stevej.name> wrote:
> 
> Andrew Barnert wrote:
>>> On May 11, 2020, at 00:40, Steve Jorgensen ste...@stevej.name wrote:
>>> Proposal:
>>> Add a new function (possibly os.path.sanitizepart) to sanitize a value for
>>> use as a single component of a path. In the default case, the value must
>>> also not be a reference to the current or parent directory ("." or "..")
>>> and must not contain control characters.
> <snip>
>> If not: the result can contain the path separator, illegal characters that
>> aren’t control characters, nonprinting characters that aren’t control
>> characters, and characters whose bytes (in the filesystem’s encoding) are
>> ASCII control characters? And it can be a reserved name, or even something
>> like C:; as long as it’s not the Unix . or ..?
> 
> Are there non-printing characters outside of those in the Unicode general 
> category of "C" that make sense to omit?

Off the top of my head, everything in the Z (separator) category other than the
plain ASCII space (e.g., U+2029 PARAGRAPH SEPARATOR) is nonprintable and makes
sense to sanitize.
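This is easy to check rather than recall. A quick sketch using only the standard `unicodedata.category()` and `str.isprintable()` prints the general category and printability of a few separator characters. U+0020 SPACE is included because it is also in Zs, yet `str.isprintable()` deliberately counts it as printable.

```python
import unicodedata

# A few characters from the Unicode "Z" (separator) categories.
samples = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
}

for name, ch in samples.items():
    # category() returns the two-letter general category, e.g. "Zs" or "Zp".
    print(f"U+{ord(ch):04X} {name}: category={unicodedata.category(ch)}, "
          f"isprintable={ch.isprintable()}")
```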

Meanwhile, what about undecodable bytes being smuggled through str by the
surrogateescape error handler? I don’t know if those are printable, or what
category they are… or whether you want to sanitize them, for that matter, so I
have no idea whether this rule does the right thing.
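For reference, a quick interpreter check (assuming a UTF-8 decode; the byte string is just an illustrative example) shows what surrogateescape produces: each undecodable byte becomes a lone surrogate in category Cs, which `str.isprintable()` reports as nonprintable. Stripping it would therefore change which bytes the name encodes back to.

```python
import unicodedata

# Bytes that are not valid UTF-8 (e.g. a Latin-1 filename on a UTF-8 system).
raw = b"caf\xe9"

# surrogateescape maps each undecodable byte to a lone surrogate U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))                      # 'caf\udce9'
print(unicodedata.category(name[-1]))   # Cs (Other, surrogate)
print(name[-1].isprintable())           # False

# Encoding with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw
```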

More generally, we shouldn’t be relying on what respondents know off the top of 
their heads in the first place for something that people are going to rely on 
for security/safety purposes.

> Regarding names like "C:", you are absolutely right to point that out. When 
> the platform is Windows, certainly, "<letter>:" should not be allowed, and 
> perhaps colon should not be allowed at all. I'll need to research that a bit. 
> This matters because if the path part is used without explicit "./" prefixed 
> to it, then it will refer to a root path,

The name `C:spam` means spam in the current directory of the C drive. That
isn’t the same as the current working directory unless C is the current
working drive, but it’s definitely not (in general) the same as the root.
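Python's `ntpath` module implements the Windows path rules and can be imported on any platform, so it can show the distinction. `C:spam` has a drive but no root, so it is not absolute. The hazard for a sanitizer is that joining it onto a base directory on a different drive discards the base entirely (on Windows, `os.path.join` is `ntpath.join`).

```python
import ntpath  # Windows path semantics, importable on any OS

part = "C:spam"

# The drive is split off, but nothing after it is a root.
print(ntpath.splitdrive(part))         # ('C:', 'spam')
print(ntpath.isabs(part))              # False: drive-relative, not rooted

# Joining onto a base on another drive throws the base away.
print(ntpath.join("D:\\data", part))   # C:spam
print(ntpath.join("D:\\data", "spam")) # D:\data\spam
```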

And what about all the other questions I asked?
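To make those questions concrete, here is a minimal sketch of the default rule as I read it: drop general-category C characters and refuse "." and "..". The name `naive_sanitizepart` is hypothetical, not the proposed `os.path.sanitizepart`, and raising `ValueError` is my assumption, since the proposal doesn't say whether such values are rejected or rewritten. Every candidate in the loop passes through unchanged.

```python
import unicodedata

def naive_sanitizepart(value: str) -> str:
    """Hypothetical sketch of the proposed default rule (not a real os.path API)."""
    # Drop every character whose general category starts with "C"
    # (Cc, Cf, Cs, Co, Cn).
    cleaned = "".join(
        ch for ch in value if not unicodedata.category(ch).startswith("C")
    )
    # Assumption: references to the current/parent directory are rejected.
    if cleaned in (".", ".."):
        raise ValueError(f"{value!r} refers to the current or parent directory")
    return cleaned

# Separators, drive specs, reserved names, and Z-category characters all survive.
for candidate in ["a/b", "C:", "CON", "spam\u2029eggs", "..\\x"]:
    print(repr(candidate), "->", repr(naive_sanitizepart(candidate)))
```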

Most importantly, you need to clarify what the use case is, and why this 
proposal meets it. Otherwise, it sounds more like a trap to make people think 
their code is safe when it isn’t, not a fix for the real problem.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/XRFT77TXLE7MNAP2MV2IC57NG4EWQIGP/
Code of Conduct: http://python.org/psf/codeofconduct/
