ready-research <readyresearch...@gmail.com> added the comment:

Some other examples to test this behaviour:

urlparse('https:/\/\/\www.attacker.com/a/b')
urlparse('https:/\www.attacker.com/a/b')

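For anyone who wants to reproduce these cases, here is a minimal sketch. The names `SAMPLES` and `summarize` are hypothetical helpers, not part of the report. Raw strings are an assumption on my side, used so that the backslashes reach `urlparse()` literally, and the well-formed URL is added only for comparison.

```python
from urllib.parse import urlparse

# Raw strings keep the backslashes exactly as written in the examples.
SAMPLES = [
    r"https:/\/\/\www.attacker.com/a/b",
    r"https:/\www.attacker.com/a/b",
    "https://www.attacker.com/a/b",  # well-formed URL, for comparison only
]


def summarize(url):
    """Return the urlparse() fields that host-based checks usually rely on."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "netloc": parts.netloc,
        "hostname": parts.hostname,
        "path": parts.path,
    }


if __name__ == "__main__":
    for url in SAMPLES:
        print(url, "->", summarize(url))
```

For the two malformed inputs, `netloc` comes back empty and `hostname` is `None`, while the whole remainder of the string ends up in `path`.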
## Comparing it to other languages/runtimes

How do other languages and their runtimes handle URL parsing?

Here's Node.js, which also leaves `host` and `hostname` empty, similar to the currently reported "buggy" Python `urlparse()` behavior:

```
node
> require("url").parse("https:/\/\/\www.attacker.com/a/b");

// Will return:
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/www.attacker.com/a/b',
  path: '/www.attacker.com/a/b',
  href: 'https:///www.attacker.com/a/b'
}
```

But it is already documented that using Node.js url.parse can lead to security issues:
https://nodejs.org/dist/latest-v16.x/docs/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost

`Use of the legacy url.parse() method is discouraged. Users should use the WHATWG URL API. Because the url.parse() method uses a lenient, non-standard algorithm for parsing URL strings, security issues can be introduced. Specifically, issues with host name spoofing and incorrect handling of usernames and passwords have been identified.`

Here's Ruby, which also returns no `host` or `hostname`, similar to the currently reported "buggy" Python `urlparse()` behavior:

```sh
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> uri = URI.parse('https:/www.attacker.com/a/b')
=> #<URI::HTTPS https:/www.attacker.com/a/b>
irb(main):003:0> uri.host
=> nil
irb(main):004:0> uri.hostname
=> nil
irb(main):005:0> uri.scheme
=> "https"
irb(main):006:0> uri.path
=> "/www.attacker.com/a/b"
```

That said, it seems that Ruby throws on other permutations of the bad URL, which Python does not.
For example:

```
irb(main):011:0> other_uri = URI.parse('https:/\/\/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `<main>'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        5: from (irb):11
        4: from (irb):11:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\/\\/\\www.attacker.com/a/b")
```

The same goes for this other URI, which Ruby does not accept (unlike Python, which accepts it and returns a result with no host or hostname, as shown earlier in this report):

```
irb(main):012:0> other_uri = URI.parse('https:/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `<main>'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        5: from (irb):12
        4: from (irb):12:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\www.attacker.com/a/b")
```

Let's look at PHP.
PHP's parse_url() function behaves much like Python's: it fails to identify the host for all 3 malformed examples provided in this report, and only reports it for the well-formed URL:

```
❯ php -a
Interactive shell

php > var_dump(parse_url('https:/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(22) "/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(21) "/www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/\/\/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(26) "/\/\/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https://www.attacker.com/a/b'));
array(3) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(16) "www.attacker.com"
  ["path"]=>
  string(4) "/a/b"
}
```

## The applicability of this vulnerability

It seems that there's no direct way to cause severe impact to a Python runtime simply by sending it a malformed URL. However, userland logic that bases its decisions on the output of Python's urlparse() function may introduce a security vulnerability because of the function's unexpected return values. Such vulnerabilities may manifest as SSRF, open redirects, and other types of vulnerabilities related to incorrectly trusting a URL.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44744>
_______________________________________