[issue27716] http.client truncates UTF-8 encoded headers

R. David Murray Tue, 09 Aug 2016 07:12:12 -0700

R. David Murray added the comment:

Well, email will happily parse bytes and treat the non-ASCII data as opaque 
(though it does record errors in an internal data structure), but the Python 3 
http API expects the parsed headers to be strings when you access them, so 
you'd just hit the decoding problem at that point rather than earlier.
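To make "opaque" concrete: the email package's bytes parser (`email.parser.BytesParser`) decodes its input as ASCII with the `surrogateescape` error handler, so undecodable bytes ride along inside a `str`. The sketch below shows only that decoding step on a made-up header, not the parser itself or its defect recording:

```python
# Header bytes as they might arrive on the wire: "café" encoded as UTF-8.
raw = b"X-Name: caf\xc3\xa9"

# Decode as ASCII, but smuggle each non-ASCII byte through as a lone
# surrogate (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
text = raw.decode("ascii", "surrogateescape")
print(repr(text))  # 'X-Name: caf\udcc3\udca9'

# The non-ASCII bytes are opaque: they survive a round trip unchanged...
assert text.encode("ascii", "surrogateescape") == raw

# ...but the result is not real text, so a str-based API that tries to
# use it as text (here, by encoding it) hits the decoding problem later.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("not real text:", exc.reason)  # not real text: surrogates not allowed
```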


This is a hard problem. Since headers *can* be latin-1 (I'd forgotten that), 
SMTPUTF8 won't work. We are up against the problem that Python makes a 
careful distinction between bytes and strings, but HTTP does not.
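A small illustration of why no single charset assumption works: the header bytes carry no charset label, and the same word encodes differently in UTF-8 and latin-1. The values here are made-up examples:

```python
# HTTP header bytes carry no charset label; the same word can arrive either way.
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = "café".encode("latin-1")  # b'caf\xe9'

# Decoding as latin-1 never fails: every byte maps to exactly one code point.
assert latin1_bytes.decode("latin-1") == "café"
assert utf8_bytes.decode("latin-1") == "cafÃ©"  # mojibake, but no error

# Assuming UTF-8 everywhere (the SMTPUTF8-style approach) breaks on legitimate latin-1.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("latin-1 header is not valid UTF-8:", exc.reason)
```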

In theory we could pass bytes to email, and then provide a new API for getting 
at the "raw" (bytes) header so you can decode it however you want.  That runs 
into backward compatibility problems, though, since we currently do decode from 
latin-1 and many programs are probably relying on that.  
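The backward-compatibility concern hinges on latin-1 mapping every byte 0x00-0xFF to exactly one code point, so a value decoded from latin-1 can always be turned back into its original bytes. A sketch against today's `http.client.parse_headers`, where `raw_header_bytes` is a hypothetical helper, not an existing API:

```python
import http.client
import io


def raw_header_bytes(value: str) -> bytes:
    """Hypothetical helper: undo the latin-1 decode http.client applies.

    latin-1 maps bytes 0x00-0xFF one-to-one onto U+0000-U+00FF, so this is
    lossless for any value that came from that decode.
    """
    return value.encode("latin-1")


# Simulate a response header block whose value is UTF-8 encoded.
wire = b"X-Name: caf\xc3\xa9\r\n\r\n"
headers = http.client.parse_headers(io.BytesIO(wire))

value = headers["X-Name"]
print(repr(value))                              # 'cafÃ©'
print(raw_header_bytes(value).decode("utf-8"))  # café
```

The caller, not the library, chooses the charset; existing code that reads `headers["X-Name"]` keeps seeing the same latin-1 string.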

Throwing out an idea here: maybe having the http policy decode the parsed bytes 
header from latin-1 when headers are accessed through the normal API would 
preserve backward compatibility.  I'm not too worried about back-compat in the 
http policy, since it is provisional until 3.6 comes out and I doubt anyone is 
currently using it.
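A rough sketch of that idea, assuming a container that stores header values as undecoded bytes, returns them decoded from latin-1 through ordinary item access, and offers a separate raw accessor. `Latin1Headers` and its `raw()` method are hypothetical; this is not how `email.policy.HTTP` is implemented:

```python
from collections.abc import Iterator, Mapping


class Latin1Headers(Mapping):
    """Hypothetical header container: keeps raw bytes, decodes on access."""

    def __init__(self, raw_lines: list[bytes]) -> None:
        self._raw: dict[str, bytes] = {}
        for line in raw_lines:
            name, _, value = line.partition(b":")
            # Names are ASCII tokens; values stay as opaque bytes.
            # Simplification: a repeated header name overwrites the earlier one.
            self._raw[name.strip().decode("ascii").lower()] = value.strip()

    def __getitem__(self, name: str) -> str:
        # Back-compatible view: decode from latin-1, as http.client does today.
        return self._raw[name.lower()].decode("latin-1")

    def __iter__(self) -> Iterator[str]:
        return iter(self._raw)

    def __len__(self) -> int:
        return len(self._raw)

    def raw(self, name: str) -> bytes:
        # New-style access: the undecoded bytes, for the caller to decode.
        return self._raw[name.lower()]


headers = Latin1Headers([b"Content-Type: text/plain", b"X-Name: caf\xc3\xa9"])
print(headers["X-Name"])                      # cafÃ©
print(headers.raw("X-Name").decode("utf-8"))  # café
```

Decoding at access time rather than at parse time is what lets the normal API stay byte-for-byte compatible while still exposing the original bytes.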

----------

_______________________________________
Python tracker <http://bugs.python.org/issue27716>
_______________________________________