[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Mike Lissner Thu, 06 May 2021 13:39:34 -0700


Mike Lissner <mliss...@michaeljaylissner.com> added the comment:


>  With the fix for this bug, urlsplit silently removes (some of) those 
> characters before we can replace them, modifying the output of our 
> sanitisation code

I don't have any good solutions for 3.9.5, but going forward, this feels like 
another example of why we should just do parsing right (the way browsers do). 
That'd maintain tabs and whatnot in your output, and it'd fix the security 
issue by putting `java\nscript` into the scheme attribute instead of the path.
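
A rough sketch of the scheme-detection half of that idea, assuming the
browser behaviour in question is the WHATWG URL rule of removing ASCII tab, LF
and CR from the input before parsing. `split_like_browser` is a hypothetical
helper name, not part of urllib.parse, and this version discards those
characters rather than preserving them in the output:

```python
from urllib.parse import urlsplit

# ASCII tab, LF and CR: the characters the WHATWG URL rule removes
# from the input before any parsing happens (assumption stated above).
_TAB_OR_NEWLINE = str.maketrans("", "", "\t\n\r")


def split_like_browser(url):
    """Hypothetical helper: drop tab/LF/CR first, then split as usual."""
    return urlsplit(url.translate(_TAB_OR_NEWLINE))


parts = split_like_browser("java\nscript:alert(1)")
print(parts.scheme)  # javascript
print(parts.path)    # alert(1)
```

Because the characters are removed before urlsplit sees them, the result does
not depend on whether the running Python already has the fix for this issue.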

> One solution that presents itself to me: add a `strip_insecure_characters: 
> bool = True` parameter.

Doesn't this lose sight of what this tool is supposed to do? It's not supposed 
to have a good (new, correct) and a bad (old, obsolete) way of parsing. Woe 
unto whoever has to write the documentation for that parameter. 

Also, I should reiterate that these aren't "insecure" characters, so if we did 
have a parameter for this, it'd be more like `do_rfc_3986_parsing` or maybe 
`do_naive_parsing`. The chars aren't insecure in themselves. They're fine. 
Python just gets tripped up on them.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________