[issue27716] http.client truncates UTF-8 encoded headers

Martin Panter Tue, 09 Aug 2016 19:28:23 -0700

Martin Panter added the comment:

For the test case given, the main problem is actually that a header field is 
being incorrectly split on a Latin-1 “next line” control code U+0085. The 
problem is already described under Issue 22233. It looks like I wrote a patch 
for that a while ago, so it would be good to revisit and see if it is worth 
applying.


Also, the problem would have been less severe if Issue 24363 had been 
addressed; I proposed a patch at Issue 26686 which may help.

Here are the relevant header fields returned by the server:
>>> conn.request("GET", "/slownik/angielski-polski/")
>>> pprint(conn.sock.recv(3333).splitlines(keepends=True))
[b'HTTP/1.1 200 OK\r\n',
 . . .
 b'Link: <http://www.babla.cn/\xe8\x8b\xb1\xe8\xaf\xad-\xe6\xb3\xa2\xe5\x85\xb0'
 b'\xe8\xaf\xad/>; rel="alternate"; hreflang="zh-Hans", '
 . . .
 b'Transfer-Encoding: chunked\r\n',
 b'Content-Type: text/html;charset=UTF-8\r\n',
 b'\r\n',
 b'104c\r\n',
 b'<!DOCTYPE html>\n',
 . . .]
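
As a rough illustration of the split described above, the sketch below takes the 
Link bytes exactly as shown in the response (the rest of the field is elided 
there, so it is elided here too) and assumes the header block is decoded as 
Latin-1 and then split with str.splitlines(). That is an assumption about the 
parsing path, not a quote of the http.client code. The UTF-8 sequence 
e5 85 b0 contains the byte 0x85, which becomes U+0085 after Latin-1 decoding:

```python
# Bytes of the Link header field as shown in the response above
# (the remainder of the field is elided in the original listing).
raw = (b'Link: <http://www.babla.cn/\xe8\x8b\xb1\xe8\xaf\xad-\xe6\xb3\xa2\xe5\x85\xb0'
       b'\xe8\xaf\xad/>; rel="alternate"; hreflang="zh-Hans", ')

# Assumption: the header bytes are decoded as Latin-1 before being parsed.
text = raw.decode("latin-1")

# bytes.splitlines() only treats \r, \n and \r\n as line boundaries.
bytes_lines = raw.splitlines()

# str.splitlines() also treats U+0085 (NEL) as a line boundary,
# so the field is cut in the middle of a UTF-8 sequence.
str_lines = text.splitlines()

print(len(bytes_lines), len(str_lines))
print(ascii(str_lines[0][-4:]), ascii(str_lines[1][:4]))
```

The text version yields two pieces where the bytes version yields one, which 
matches the truncation seen in the test case.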

Regarding header value character encoding, revision cb09fdef19f5 is an example 
of where I assumed a Latin-1 transformation to handle non-ASCII redirect 
targets. Perhaps we should just document how the bytes are transformed, and 
how to get the original bytes back?
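
A minimal sketch of what such documentation might describe, assuming header 
values reach the caller as str obtained by a plain Latin-1 decode of the wire 
bytes. The helper name recover_header_bytes is hypothetical and is not part of 
http.client:

```python
def recover_header_bytes(value: str) -> bytes:
    """Hypothetical helper: undo a Latin-1 decode to get the wire bytes back.

    Latin-1 maps each byte 0x00-0xFF to the code point with the same number,
    so encoding with Latin-1 restores the original bytes exactly.
    """
    return value.encode("latin-1")


# Bytes of the URL in the Link header field shown above.
wire = (b'http://www.babla.cn/\xe8\x8b\xb1\xe8\xaf\xad-'
        b'\xe6\xb3\xa2\xe5\x85\xb0\xe8\xaf\xad/')

# Assumption: this is how the value would look after a Latin-1 decode.
as_seen = wire.decode("latin-1")

# Round trip back to bytes, then interpret them as UTF-8 if the server
# is known to use that encoding.
restored = recover_header_bytes(as_seen)
url = restored.decode("utf-8")
print(url)
```

The round trip only works if the value was not already split or otherwise 
altered, which is why the U+0085 problem has to be fixed first.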

FWIW UTF-8 is used in RTSP, which is based on HTTP.

----------
nosy: +martin.panter
superseder:  -> http.client splits headers on non-\r\n characters

_______________________________________
Python tracker
<http://bugs.python.org/issue27716>
_______________________________________