[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Gregory P. Smith Wed, 05 May 2021 13:26:28 -0700


Gregory P. Smith <[email protected]> added the comment:


Mike: There may be multiple ways to read that WHATWG recommendation?  The 
linked to section is about implementing a state machine for parsing a URL into 
parts safely.  But that may not imply that anything that passed through that 
state machine should be considered "valid".  Just that this spec is able to 
make some sense out of otherwise messy data.  Network adage: Be lenient in what 
you accept.  I doubt anyone would want something producing URLs for consumption 
by something else to allow these in their output.

I have yet to read the _entire_ WHATWG spec from head to toe to try and better 
understand the context they had in mind.

I agree that it is unfortunate that the original URL may have these issues and 
go on to be (re)used in another context where it passes to something that might 
not treat it in the same manner.  That's in some sense a flaw in our API design 
that we allow string based URLs to be used in APIs rather than require them to 
have gone through a parsing sanitization step into a instance of a URL object 
(for example). API design like that is clearly out of scope for this issue. :)

Regardless my gut feeling is that we continue with the existing fix that remove 
the characters as we've already started releasing.  If we need to change our 
mind on how we've done that, so be it, we can, that'll show up in later patches.

----------

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue43882>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue43882] [security] urllib.parse should sanitize urls containing ASCII newline and tabs.

Reply via email to