Martin v. Löwis <mar...@v.loewis.de> added the comment:

David: I think it's a little bit more complicated. RFC 2616 says that
the value of a header is *TEXT, which it describes as follows:

   The TEXT rule is only used for descriptive field contents and values 
   that are not intended to be interpreted by the message parser. Words 
   of *TEXT MAY contain characters from character sets other than 
   ISO-8859-1 only when encoded according to the rules of RFC 2047

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
   1. count the number of non-ASCII characters by encoding with the
      ascii codec and errors="ignore", then comparing the length of
      the result with the length of the original string
   2. if fewer than 10% of the characters are non-ASCII, use the Q
      encoding
   3. otherwise, use the B encoding

The purpose of the algorithm in c) is that text containing only a few
non-latin-1 characters stays mostly readable even if the receiver fails
to decode the header.
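
To make the proposal concrete, here is a minimal sketch of steps a) to c).
The names encode_header_value and _q_encode are hypothetical, not existing
http.server API, and the sketch assumes the whole value fits in a single
encoded word (RFC 2047 limits an encoded word to 75 characters, so a real
implementation would have to split long values).

```python
import base64

# Bytes kept literal inside a Q-encoded word; all others become =XX.
# This is a deliberately conservative safe set (an assumption of this sketch).
_Q_SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/"
)


def _q_encode(raw):
    """Q-encode raw bytes: space becomes "_", unsafe bytes become =XX."""
    out = []
    for byte in raw:
        if byte == 0x20:
            out.append("_")
        elif byte in _Q_SAFE:
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def encode_header_value(value):
    """Return the bytes to put on the wire for a header value."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is sent latin-1 encoded
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) step 1: count non-ASCII characters by comparing lengths
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    raw = value.encode("utf-8")
    if non_ascii * 10 < len(value):
        # c) step 2: fewer than 10% non-ASCII, so use the Q encoding
        word = "=?utf-8?q?%s?=" % _q_encode(raw)
    else:
        # c) step 3: otherwise use the B encoding
        word = "=?utf-8?b?%s?=" % base64.b64encode(raw).decode("ascii")
    return word.encode("ascii")
```

With the Q encoding, the ASCII part of a mostly-ASCII value remains
recognizable on the wire, which is the point of the 10% threshold.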

The same change would also apply to the client side when it sends headers.
On the receiving side, we should offer an option to decode headers (both
for client and server); this should be an option because senders may not
comply with RFC 2616. Reading should then proceed as follows:
1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1
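
The reading steps could look roughly like this. It is only a sketch: the
function name decode_header_value is hypothetical, the input is assumed to
be the raw header value as bytes, and the check for MIME markers is a
simple substring test. The decoding itself is delegated to
email.header.decode_header.

```python
from email.header import decode_header


def decode_header_value(raw):
    """Decode a raw header value (bytes) to str, honoring RFC 2047 words."""
    # latin-1 maps every byte to a character, so this step cannot fail.
    text = raw.decode("latin-1")
    # 1. check whether there are MIME markers in the text
    if "=?" not in text or "?=" not in text:
        # 3. no markers: the latin-1 decoding is the result
        return text
    # 2. markers present: MIME-decode
    parts = []
    for chunk, charset in decode_header(text):
        if isinstance(chunk, bytes):
            # Unencoded chunks come back with charset None; treat them as
            # latin-1. An unknown charset name would raise LookupError here.
            parts.append(chunk.decode(charset or "latin-1", "replace"))
        else:
            # decode_header returns str when it found no encoded words.
            parts.append(chunk)
    return "".join(parts)
```

Since senders may not comply with RFC 2616, a caller would presumably only
invoke this when the decoding option is switched on, and would have to be
prepared for malformed encoded words.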

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7606>
_______________________________________