Martin Panter added the comment:

For the record, this is what Requests sent when I passed a Latin-1-encodable 
string:

b'POST / HTTP/1.1\r\n'
b'Host: example.com\r\n'
b'Content-Length: 11\r\n'
b'Connection: keep-alive\r\n'
b'Accept: */*\r\n'
b'Accept-Encoding: gzip, deflate\r\n'
b'User-Agent: python-requests/2.9.1\r\n'
b'\r\n'
b'Celebrate \xa9'

There is no Content-Type header field, nor any indication of the encoding used. 
This is also how the lower-level HTTPConnection.request() method works.

The documentation already mentions that a text string gets encoded with 
ISO-8859-1 (a.k.a. Latin-1): 
<https://docs.python.org/3.3/library/http.client.html#http.client.HTTPConnection.request>.
How do you propose to improve the error message?
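
For illustration, the encoding step that the documentation describes can be 
reproduced by hand. This is only a sketch of the observable behaviour, not the 
actual http.client code; the second string is my own example of a character 
outside Latin-1.

```python
# Sketch of the implicit encoding step, done by hand.
text = 'Celebrate \xa9'  # U+00A9 COPYRIGHT SIGN, encodable in Latin-1
body = text.encode('iso-8859-1')
print(body)       # b'Celebrate \xa9'
print(len(body))  # 11, matching the Content-Length header in the dump

# A character outside Latin-1 (here U+20AC EURO SIGN, chosen for
# illustration) cannot be encoded at all.
latin1_error = None
try:
    'Price: \u20ac5'.encode('iso-8859-1')
except UnicodeEncodeError as exc:
    latin1_error = exc
    print(type(exc).__name__)  # UnicodeEncodeError
```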

Encoding with either Latin-1 or UTF-8 depending on the characters sounds like a 
terrible idea. We may as well send the request without any body and pretend 
everything is okay. I don’t understand the point of changing to UTF-8 either. 
If you actually want UTF-8 encoded text, why not explicitly encode it yourself?
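
A minimal sketch of encoding explicitly, assuming the server expects UTF-8 
plain text. The Content-Type value is my assumption about what such a server 
would want, and the request itself is left commented out so that nothing is 
sent over the network.

```python
import http.client

text = 'Celebrate \xa9'
body = text.encode('utf-8')  # the caller chooses the encoding explicitly
# Assumed header: tell the server which encoding was used.
headers = {'Content-Type': 'text/plain; charset=utf-8'}

print(body)       # b'Celebrate \xc2\xa9'
print(len(body))  # 12

# Passing bytes bypasses the implicit Latin-1 step:
#   conn = http.client.HTTPConnection('example.com')
#   conn.request('POST', '/', body=body, headers=headers)
```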

Failing for any unencoded text string would be a serious backwards 
compatibility problem. It would break the POST example using urlencode() at 
<https://docs.python.org/3/library/http.client.html#examples> for instance.

IMO the Latin-1 encoding feature is bad API design, maybe based on a 
misunderstanding of HTTP. Perhaps it would be more reasonable to deprecate the 
automatic Latin-1 encoding, and only allow ASCII characters in a text string. 
That would still cater for the urlencode() scenario in the POST example.
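
To make the suggestion concrete, here is a rough sketch of what ASCII-only 
handling could look like. The helper name encode_body_ascii_only() is 
hypothetical; nothing like it exists in http.client, and this is not a 
proposed patch.

```python
def encode_body_ascii_only(body):
    """Hypothetical helper: pass bytes through, accept str only if ASCII."""
    if isinstance(body, str):
        # Raises UnicodeEncodeError for any non-ASCII character.
        return body.encode('ascii')
    return body

# URL-encoded form data is pure ASCII, so it would still be accepted.
ok = encode_body_ascii_only('spam=1&eggs=2')
print(ok)  # b'spam=1&eggs=2'

# Text with a non-ASCII character would be rejected instead of being
# silently encoded as Latin-1.
rejected = False
try:
    encode_body_ascii_only('Celebrate \xa9')
except UnicodeEncodeError:
    rejected = True
print(rejected)  # True
```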

The links you posted seem to describe different problems with separate 
solutions:

Requests bug 2838: Perhaps the user was trying to send URL-encoded form data. 
If so, textual fields should be UTF-8 encoded and then percent-encoded, 
resulting in only ASCII codes in the “data” argument. Python has 
urllib.parse.urlencode() which does this.

Requests bug 1822: It sounds like the user or a library intended to send UTF-8, 
so they should encode it themselves.

Stack Overflow: A custom web service needed fixing, and the user had to encode 
as UTF-8. This is a custom agreement between the client and server; it is not 
up to Python.

eBay: I’m not familiar with any eBay API and it is not clear from the post, but 
I suspect the user wasn’t encoding their data properly. Maybe similar to the 
first case.

For the rest it is not clear what the problem or solution was. Some of them 
sound like they were somehow sending text when they really wanted to send 
arbitrary bytes, in which case UTF-8 is not going to help.
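
For the URL-encoded form data case (Requests bug 2838), a short sketch of 
what urllib.parse.urlencode() produces. The field name "message" is made up 
for the example.

```python
from urllib.parse import urlencode

fields = {'message': 'Celebrate \xa9'}  # hypothetical form field
data = urlencode(fields)  # UTF-8 encodes, then percent-encodes
print(data)  # message=Celebrate+%C2%A9

# The result contains only ASCII characters, so it can safely be
# converted to bytes before being passed as the request body.
body = data.encode('ascii')
print(body)  # b'message=Celebrate+%C2%A9'
```

Such a body would normally be sent with a Content-Type of 
application/x-www-form-urlencoded.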

----------
nosy: +martin.panter

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26045>
_______________________________________