Andrew Clover added the comment:

> Why only PATH_INFO is encoded in such a manner, but QUERY_STRING is passed 
> without any changes and does not requires any latin-1 to utf-8 recodings?

Laziness: QUERY_STRING should be pure-ASCII, making any such transcoding a 
no-op.
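A rough sketch of why the recoding would be a no-op. The query string is a made-up example, and `parse_qs` stands in for whatever decoding an application actually does:

```python
from urllib.parse import parse_qs

# Hypothetical standards-conformant query string: the non-ASCII "é" arrives %-encoded.
query_string = "name=caf%C3%A9&lang=fr"

# The latin-1 -> UTF-8 recoding applied to PATH_INFO leaves pure ASCII unchanged.
recoded = query_string.encode("latin-1").decode("utf-8")
assert recoded == query_string

# The %-escapes are decoded later by the application; parse_qs uses UTF-8 by default.
params = parse_qs(query_string)
print(params)  # {'name': ['café'], 'lang': ['fr']}
```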

In principle a user agent *can* submit non-ASCII characters in a query string 
without %-encoding them, but it's not standards-conformant and most browsers 
don't usually do it (exception: apparently curl as above), so it's not worth 
adding a layer of hopefully-fixing-but-potentially-mangling to this variable to 
support a situation that shouldn't arise for normal requests.
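To illustrate the "potentially mangling" risk, here is a sketch that assumes the gateway exposes raw query bytes as a latin-1 native string, following the PEP 3333 convention. `to_environ` and `try_recode` are hypothetical helpers, not part of any server:

```python
from typing import Optional

# Hypothetical raw query bytes from a client (e.g. curl) that skips %-encoding.
raw_utf8 = "q=café".encode("utf-8")      # b'q=caf\xc3\xa9'
raw_latin1 = "q=café".encode("latin-1")  # b'q=caf\xe9'


def to_environ(raw: bytes) -> str:
    # PEP 3333-style native string: each byte becomes one latin-1 code point.
    return raw.decode("latin-1")


def try_recode(value: str) -> Optional[str]:
    """Attempt the PATH_INFO-style latin-1 -> UTF-8 fix-up; return None if it fails."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_recode(to_environ(raw_utf8)))    # q=café  (the guess happened to be right)
print(try_recode(to_environ(raw_latin1)))  # None    (the bytes were not UTF-8)
```

Whether the fix-up helps depends entirely on which encoding the client happened to use, and nothing in the request says which one that was.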

PATH_INFO only requires special handling because of the sad, sad historical 
artefact of the CGI spec requiring it to have URL-decoding applied to it at the 
gateway, thus making the non-ASCII characters pop out of the percentage 
woodwork.
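A minimal sketch of that path. It assumes a CGI-style gateway that URL-decodes the path to bytes and then exposes them as a latin-1 native string. The request path is a made-up example:

```python
from urllib.parse import unquote_to_bytes

# Hypothetical request path as sent on the wire: "é" is %-encoded UTF-8.
request_path = "/caf%C3%A9"

# CGI-style gateway: URL-decode PATH_INFO into raw bytes ...
raw = unquote_to_bytes(request_path)    # b'/caf\xc3\xa9'
# ... then expose those bytes as a latin-1 native string, so non-ASCII pops out.
path_info = raw.decode("latin-1")       # '/cafÃ©'
# Application side: undo the latin-1 step to recover the UTF-8 text.
recovered = path_info.encode("latin-1").decode("utf-8")

print(path_info, recovered)  # /cafÃ© /café
```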

@Graham, can you share more about how those test results were generated and 
displayed? The Gunicorn results are about what I would expect - the 
double-decoding of PATH_INFO is arguably undesirable when curl submits raw 
bytes, but ultimately that's an unspecified situation so I don't really care.

The output from Apache, on the other hand, is odd - something appears to have 
mangled the results at the reporting stage, as there is not only double-decoding 
but also some double-backslashes. It looks like the strings have been put 
through ascii(repr()) or something?
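One way those symptoms could combine, as a guess only. The real reporting pipeline is unknown, and the two `ascii()` passes here are an assumption. UTF-8 bytes read as latin-1 give mojibake, and escaping an already-escaped string doubles each backslash:

```python
text = "café"

# Double-decoding: UTF-8 bytes mistakenly decoded as latin-1 produce mojibake.
mojibake = text.encode("utf-8").decode("latin-1")  # 'cafÃ©'

# One escaping pass turns each non-ASCII character into a single-backslash escape ...
once = ascii(mojibake)
# ... and escaping that result again doubles every backslash.
twice = ascii(once)

print(once)   # 'caf\xc3\xa9'
print(twice)  # "'caf\\xc3\\xa9'"
```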

----------
nosy: +Andrew Clover

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16679>
_______________________________________