ready-research <readyresearch...@gmail.com> added the comment:

Some other examples to test this behaviour:
urlparse('https:/\/\/\www.attacker.com/a/b')
urlparse('https:/\www.attacker.com/a/b')
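
A minimal sketch for trying these inputs locally, assuming only the standard library `urllib.parse`. The `samples` list and `results` dict are names chosen for illustration; the third and fourth entries are the single-slash variant used elsewhere in this report and a well-formed control case for comparison.

```python
from urllib.parse import urlparse

# Raw strings keep the backslashes literal and avoid invalid-escape warnings.
samples = [
    r"https:/\/\/\www.attacker.com/a/b",
    r"https:/\www.attacker.com/a/b",
    "https:/www.attacker.com/a/b",
    "https://www.attacker.com/a/b",  # well-formed control case
]

# Parse each sample once and keep the result for inspection.
results = {url: urlparse(url) for url in samples}

for url, parts in results.items():
    print(f"{url!r}: netloc={parts.netloc!r} "
          f"hostname={parts.hostname!r} path={parts.path!r}")
```

For the three malformed inputs the host ends up in `path` and `netloc` is empty; only the control case yields a hostname.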

## Comparing it to other languages/runtimes

How do other languages and their runtimes work with URL parsing functions?

Here's Node.js, where `host` and `hostname` are also empty, similar to the 
currently reported "buggy" behavior of Python's `urlparse()`:
```
$ node
> require("url").parse("https:/\/\/\www.attacker.com/a/b");

// Returns:

Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '',
  port: null,
  hostname: '',
  hash: null,
  search: null,
  query: null,
  pathname: '/www.attacker.com/a/b',
  path: '/www.attacker.com/a/b',
  href: 'https:///www.attacker.com/a/b'
}
```
However, the Node.js documentation already warns that `url.parse()` can lead to 
security issues: 
https://nodejs.org/dist/latest-v16.x/docs/api/url.html#url_url_parse_urlstring_parsequerystring_slashesdenotehost
`Use of the legacy url.parse() method is discouraged. Users should use the 
WHATWG URL API. Because the url.parse() method uses a lenient, non-standard 
algorithm for parsing URL strings, security issues can be introduced. 
Specifically, issues with host name spoofing and incorrect handling of 
usernames and passwords have been identified.`


Here's Ruby, where `host` and `hostname` are also missing, similar to the 
currently reported "buggy" behavior of Python's `urlparse()`:

```sh
irb(main):001:0> require 'uri'
=> false
irb(main):002:0> uri = URI.parse('https:/www.attacker.com/a/b')
=> #<URI::HTTPS https:/www.attacker.com/a/b>
irb(main):003:0> uri.host
=> nil
irb(main):004:0> uri.hostname
=> nil
irb(main):005:0> uri.scheme
=> "https"
irb(main):006:0> uri.path
=> "/www.attacker.com/a/b"
```

That said, Ruby raises an error on other permutations of the bad URL, which 
Python does not. For example:

```
irb(main):011:0> other_uri = URI.parse('https:/\/\/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `<main>'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        5: from (irb):11
        4: from (irb):11:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\/\\/\\www.attacker.com/a/b")
```

The same goes for this other URI, which Ruby rejects (unlike Python, which 
accepts it and returns a result with no host or hostname, as shown earlier in 
this report):

```
irb(main):012:0> other_uri = URI.parse('https:/\www.attacker.com/a/b')
Traceback (most recent call last):
        8: from /usr/bin/irb:23:in `<main>'
        7: from /usr/bin/irb:23:in `load'
        6: from /Library/Ruby/Gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
        5: from (irb):12
        4: from (irb):12:in `rescue in irb_binding'
        3: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/common.rb:234:in `parse'
        2: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:73:in `parse'
        1: from /System/Library/Frameworks/Ruby.framework/Versions/2.6/usr/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
URI::InvalidURIError (bad URI(is not URI?): "https:/\\www.attacker.com/a/b")
```

Let's look at PHP. PHP's parse_url() function behaves much like Python's: it 
fails to identify the host for all 3 malformed examples provided in this 
report:
```
❯ php -a
Interactive shell

php > var_dump(parse_url('https:/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(22) "/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(21) "/www.attacker.com/a/b"
}
php > var_dump(parse_url('https:/\/\/\www.attacker.com/a/b'));
array(2) {
  ["scheme"]=>
  string(5) "https"
  ["path"]=>
  string(26) "/\/\/\www.attacker.com/a/b"
}
php > var_dump(parse_url('https://www.attacker.com/a/b'));
array(3) {
  ["scheme"]=>
  string(5) "https"
  ["host"]=>
  string(16) "www.attacker.com"
  ["path"]=>
  string(4) "/a/b"
}
```

## The applicability of this vulnerability

There seems to be no direct way of causing severe impact on a Python runtime 
simply by sending it a malformed URL.
However, userland logic that bases its decisions on the values returned by 
Python's urlparse() may introduce a security vulnerability when those values 
are unexpected. Such vulnerabilities may manifest as SSRF, Open Redirect, and 
other issues related to incorrectly trusting a URL.
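
A minimal sketch of how such a flaw might arise, assuming a hypothetical redirect validator. `ALLOWED_HOSTS`, `naive_is_safe`, and `stricter_is_safe` are illustrative names, not part of any library, and the stricter variant is only one possible mitigation under those assumptions. The risk it illustrates is that a client following the WHATWG URL rules may treat `\` like `/` and so resolve the malformed URL to the attacker's host.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of hosts the application trusts for redirects.
ALLOWED_HOSTS = {"example.com"}


def naive_is_safe(url: str) -> bool:
    """Flawed check: treats any URL with an empty netloc as a same-site path."""
    parts = urlparse(url)
    return parts.netloc == "" or parts.hostname in ALLOWED_HOSTS


def stricter_is_safe(url: str) -> bool:
    """Rejects backslashes, then accepts only allow-listed absolute
    http(s) URLs or plain paths starting with a single slash."""
    if "\\" in url:
        return False
    parts = urlparse(url)
    if parts.scheme or parts.netloc:
        # Absolute or scheme-relative URL: require http(s) and a trusted host.
        return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS
    # No scheme and no netloc: accept only a rooted, single-slash path.
    return parts.path.startswith("/") and not parts.path.startswith("//")


malformed = r"https:/\www.attacker.com/a/b"
print(naive_is_safe(malformed))     # the flawed check accepts the URL
print(stricter_is_safe(malformed))  # the stricter check rejects it
```

The naive check accepts the malformed URL because `netloc` is empty, while the stricter one refuses it outright rather than trying to guess what the sender meant.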

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue44744>
_______________________________________