[issue42967] [security] urllib.parse.parse_qsl(): Web cache poisoning - `; ` as a query args separator
Adam Goldschmidt added the comment: > The difference is that semicolon is defined in a previous specification. I understand, but this will limit us in the future if the spec changes - though I don't have strong feelings regarding this one. > Dear all, now that Adam has signed the CLA, I have closed my PR in favor of > Adam's because I think 2 open PRs might split everyone's attention. Instead, > I'll focus on reviewing Adam's PR. Sorry for any inconvenience caused. ❤ -- ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42967] [security] urllib.parse.parse_qsl(): Web cache poisoning - `; ` as a query args separator
Adam Goldschmidt added the comment: > That doesn’t feel necessary to me. I suspect most links use &, some use ;, > nothing else is valid at the moment and I don’t expect a new separator to > suddenly appear. IMO the boolean parameter to also recognize ; was better. That's reasonable. However, I think that we are making this change in order to treat the semicolon as a "custom" separator. In that case, why not let the developer decide on a different custom separator for their own use cases? What's the difference between a semicolon and something else? -- ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42967] [security] urllib.parse.parse_qsl(): Web cache poisoning - `; ` as a query args separator
Adam Goldschmidt added the comment: > I _didn't_ change the default - it will allow both '&' and ';' still. Eric > showed a link above that still uses semicolon. So I feel that it's strange to > break backwards compatibility in a patch update. Maybe we can make just '&' > the default in Python 3.10, while backporting the ability to specify > separators to older versions so it's up to users? I like this implementation. I definitely think we should not break backwards compatibility and only change the default in Python 3.10. -- ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42967] [security] urllib.parse.parse_qsl(): Web cache poisoning - `; ` as a query args separator
Adam Goldschmidt added the comment: I haven't noticed, I'm sorry. I don't mind closing mine, just thought it could be a nice first contribution. Our PRs are different though - I feel like if we are to implement this, we should let the developer choose the separator and not limit to just `&` and `;` - but that discussion probably belongs in the PR. -- ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42967] [security] urllib.parse.parse_qsl(): Web cache poisoning - `; ` as a query args separator
Change by Adam Goldschmidt : -- pull_requests: +23120 pull_request: https://github.com/python/cpython/pull/24297 ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue42967] Web cache poisoning - `;` as a query args separator
New submission from Adam Goldschmidt : The urlparse module treats semicolon as a separator (https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L739) - whereas most proxies today only take ampersands as separators. Link to a blog post explaining this vulnerability: https://snyk.io/blog/cache-poisoning-in-popular-open-source-packages/ When the attacker can separate query parameters using a semicolon (;), they can cause a difference in the interpretation of the request between the proxy (running with default configuration) and the server. This can result in malicious requests being cached as completely safe ones, as the proxy would usually not see the semicolon as a separator, and therefore would not include it in a cache key of an unkeyed parameter - such as `utm_*` parameters, which are usually unkeyed. Let’s take the following example of a malicious request: ``` GET /?link=http://google.com_content=1;link='>alert(1) HTTP/1.1 Host: somesite.com Upgrade-Insecure-Requests: 1 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36 Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,imag e/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9 Accept-Encoding: gzip, deflate Accept-Language: en-US,en;q=0.9 Connection: close ``` urlparse sees 3 parameters here: `link`, `utm_content` and then `link` again. On the other hand, the proxy considers this full string: `1;link='>alert(1)` as the value of `utm_content`, which is why the cache key would only contain `somesite.com/?link=http://google.com`. A possible solution could be to allow developers to specify a separator, like werkzeug does: https://github.com/pallets/werkzeug/blob/6784c44673d25c91613c6bf2e614c84465ad135b/src/werkzeug/urls.py#L833 -- components: C API messages: 385266 nosy: AdamGold priority: normal severity: normal status: open title: Web cache poisoning - `;` as a query args separator type: security versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue42967> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com