[issue26045] Improve error message for http.client when posting unicode string

Martin Panter Thu, 07 Jan 2016 17:54:07 -0800

Martin Panter added the comment:

For the record, this is what Requests sent when I passed a Latin-1-encodable 
string:

b'POST / HTTP/1.1\r\n'
b'Host: example.com\r\n'
b'Content-Length: 11\r\n'
b'Connection: keep-alive\r\n'
b'Accept: */*\r\n'
b'Accept-Encoding: gzip, deflate\r\n'
b'User-Agent: python-requests/2.9.1\r\n'
b'\r\n'
b'Celebrate \xa9'

There is no Content-Type header field, nor any indication of the encoding used.
This is also how the lower-level HTTPConnection.request() method works.

The documentation already mentions that a text string gets encoded with
ISO-8859-1 (a.k.a. Latin-1):
<https://docs.python.org/3.3/library/http.client.html#http.client.HTTPConnection.request>.
How do you propose to improve the error message?
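
To make the documented behaviour concrete, here is a minimal sketch that emulates the Latin-1 step with plain str.encode() instead of calling http.client itself. The second string (with a euro sign) is my own example, not taken from the report:

```python
# Emulate the documented ISO-8859-1 encoding of a text body.
body = "Celebrate \u00a9"              # the string from the capture above
encoded = body.encode("iso-8859-1")
print(encoded)                         # b'Celebrate \xa9'
print(len(encoded))                    # 11, matching Content-Length

# A character outside Latin-1 (here the euro sign, an assumed example)
# cannot be encoded this way and raises UnicodeEncodeError.
latin1_error = None
try:
    "Celebrate \u20ac".encode("iso-8859-1")
except UnicodeEncodeError as exc:
    latin1_error = exc
    print(type(exc).__name__)          # UnicodeEncodeError
```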

Encoding with either Latin-1 or UTF-8 depending on the characters sounds like a
terrible idea. We may as well send the request without any body and pretend
everything is okay. I don’t understand the point of changing to UTF-8 either.
If you actually want UTF-8 encoded text, why not explicitly encode it yourself?
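
As a rough illustration of encoding it yourself, the sketch below builds a bytes body and declares the charset in a Content-Type header. The helper names build_utf8_post() and send_utf8_post(), the text/plain media type, and the host and path are all assumptions made for the example:

```python
import http.client

def build_utf8_post(text):
    """Hypothetical helper: return (body, headers) for a UTF-8 text POST."""
    body = text.encode("utf-8")        # explicit encoding, nothing implicit
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    return body, headers

def send_utf8_post(host, path, text):
    """Hypothetical helper: POST the text as UTF-8 bytes, return the status."""
    body, headers = build_utf8_post(text)
    conn = http.client.HTTPConnection(host)
    try:
        # A bytes body is sent as-is; no Latin-1 conversion is involved.
        conn.request("POST", path, body=body, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()

body, headers = build_utf8_post("Celebrate \u00a9")
print(body)                            # b'Celebrate \xc2\xa9'
```

Calling send_utf8_post("example.com", "/", "Celebrate \u00a9") would perform the actual request; it is left out here so the sketch runs without network access.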

Failing for any unencoded text string would be a serious backwards
compatibility problem. It would break the POST example using urlencode() at
<https://docs.python.org/3/library/http.client.html#examples> for instance.

IMO the Latin-1 encoding feature is a bad API design, maybe based on a
misunderstanding of HTTP. Perhaps it would be more reasonable to deprecate the
automatic Latin-1 encoding, and only allow ASCII characters in a text string.
That would still cater for the urlencode() scenario in the POST example.
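
A sketch of what such an ASCII-only rule might look like, and why urlencode() output would still pass. The function encode_text_body() and its error message are hypothetical; this is not how http.client currently behaves:

```python
from urllib.parse import urlencode

def encode_text_body(body):
    """Hypothetical stricter rule: accept a text body only if it is pure ASCII."""
    try:
        return body.encode("ascii")
    except UnicodeEncodeError as exc:
        raise ValueError(
            "text body contains non-ASCII characters; "
            "encode it to bytes explicitly"
        ) from exc

# urlencode() UTF-8 encodes and percent-encodes by default, so the
# result contains only ASCII characters.
form = urlencode({"message": "Celebrate \u00a9"})
print(form)                            # message=Celebrate+%C2%A9
print(encode_text_body(form))          # b'message=Celebrate+%C2%A9'

# Raw non-ASCII text would be rejected instead of silently encoded.
rejected = False
try:
    encode_text_body("Celebrate \u00a9")
except ValueError:
    rejected = True
```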

The links you posted seem to describe different problems with separate
solutions:

Requests bug 2838: Perhaps the user was trying to send URL-encoded form data.
If so, textual fields should be UTF-8 encoded and then percent-encoded,
resulting in only ASCII codes in the “data” argument. Python has
urllib.parse.urlencode() which does this.

Requests bug 1822: It sounds like the user or a library intended to send UTF-8,
so they should encode it themselves.

Stack Overflow: A custom web service needed fixing, and the user had to encode
as UTF-8. This is a custom agreement between the client and server; it is not
up to Python.

eBay: I’m not familiar with any eBay API and it is not clear from the post, but
I suspect the user wasn’t encoding their data properly. Maybe similar to the
first case.

For the rest it is not clear what the problem or solution was. Some of them
sound like they were somehow sending text when they really wanted to send
arbitrary bytes, in which case UTF-8 is not going to help.

----------
nosy: +martin.panter

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue26045>
_______________________________________