Martin v. Löwis <mar...@v.loewis.de> added the comment:

David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as

    TEXT = <any OCTET except CTLs, but including LWS>

    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 only when encoded according to the rules of RFC 2047

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
   1. count the number of non-ASCII characters, by encoding with ascii/ignore and comparing the result lengths
   2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
   3. otherwise, use the B encoding

The purpose of the algorithm in c) is that text containing only a few non-latin-1 characters still comes out mostly readable even if the receiver fails to decode the header.

The same change would also apply to the client side of sending headers.

On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:

1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7606>
_______________________________________
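
For illustration, the sending rules a) to c) and the three reading steps proposed in the comment could be sketched roughly as below. This is only a sketch under stated assumptions, not the patch that went into http.server: the function names encode_header_value and decode_header_value are hypothetical, the email.header and email.charset modules are used for the RFC 2047 work as one possible choice, and "MIME markers" is approximated by looking for "=?" and "?=" in the text.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header


def encode_header_value(value):
    """Hypothetical helper: turn one header value into bytes for the wire."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is sent latin-1 encoded
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) otherwise MIME-encode as UTF-8.
    # 1. count non-ASCII characters by comparing lengths after ascii/ignore
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    # 2./3. fewer than 10% non-ASCII -> Q encoding, otherwise B encoding
    charset = Charset("utf-8")
    charset.header_encoding = QP if non_ascii * 10 < len(value) else BASE64
    # maxlinelen=0 asks Header.encode() not to fold the value across lines
    return Header(value, charset).encode(maxlinelen=0).encode("ascii")


def decode_header_value(raw):
    """Hypothetical helper: decode the raw bytes of a received header value."""
    text = raw.decode("latin-1")
    # 1. check whether there are MIME markers in the text (assumed heuristic)
    if "=?" not in text or "?=" not in text:
        # 3. no markers: the latin-1 decoding is the result
        return text
    # 2. markers present: MIME-decode each part
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            # parts without an encoded word carry no charset; use latin-1
            chunk = chunk.decode(charset or "latin-1")
        parts.append(chunk)
    return "".join(parts)
```

With this sketch, a value such as "Grüße" goes out as plain latin-1 bytes, a mostly ASCII value containing a single "€" is sent as a Q-encoded word, and a value consisting entirely of Japanese text is sent as a B-encoded word. A malformed or unknown charset in a received encoded word is not handled here and would raise an exception.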