John Machin <sjmac...@lexicon.net> wrote:
> On Feb 25, 11:07 am, "Roy H. Han" <starsareblueandfara...@gmail.com>
> wrote:
> > Dear python-list,
> >
> > I'm having some trouble decoding an email header using the standard
> > imaplib.IMAP4 class and email.message_from_string method.
> >
> > In particular, email.message_from_string() does not seem to properly
> > decode unicode characters in the subject.
> >
> > How do I decode unicode characters in the subject?
>
> You don't. You can't. You decode str objects into unicode objects. You
> encode unicode objects into str objects. If your input is not a str
> object, you have a problem.

I can't speak for the OP, but I had a similar (and possibly
identical-in-intent) question. Suppose you have a Subject line that
looks like this:

    Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
     =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=

How do you get the email module to decode that into unicode? The same
question applies to the other header lines, and the answer is that it
isn't easy: I had to read and reread the docs and experiment for a
while to figure it out.

I understand there's going to be a sprint on the email module at PyCon;
maybe some of this will get improved then.

Here's the final version of my test program. The third-to-last line is
one I thought ought to work, given that Header has a __unicode__ method.
The final line is the one that did work (note the kludge to turn None
into 'ascii'... IMO 'ascii' is what decode_header _should_ be returning,
and this code shows why!)

-------------------------------------------------------------------
from email import message_from_string
from email.header import Header, decode_header

# Test message whose Subject contains two RFC 2047 encoded words.
x = message_from_string("""\
To: test
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
 =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=

this is a test.
""")

print x
print "--------------------"
for key, header in x.items():
    print key, 'type', type(header)
    print key+":", unicode(Header(header)).decode('utf-8')
    print key+":", decode_header(header)
    print key+":", ''.join([s.decode(t or 'ascii') for (s, t) in decode_header(header)]).encode('utf-8')
-------------------------------------------------------------------

From nobody Wed Feb 25 08:35:29 2009
To: test
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
 =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=

this is a test.

--------------------
To type <type 'str'>
To: test
To: [('test', None)]
To: test
Subject type <type 'str'>
Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=
 =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=
Subject: [("'u' Obselete type", None), ("-- it is identical to 'd'. (7)", 'iso-8859-1')]
Subject: 'u' Obselete type-- it is identical to 'd'. (7)

--RDM
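
P.S. For anyone trying the same thing on Python 3, here is a minimal sketch of the join-the-chunks approach from the test program. It is a port made under assumptions, not part of the program above: decode_header_text is a hypothetical helper name, the fallback parameter is my own addition, and the sketch assumes decode_header may return either bytes or str chunks depending on whether the header contains encoded words.

```python
from email import message_from_string
from email.header import decode_header


def decode_header_text(raw, fallback="ascii"):
    """Hypothetical helper: join the chunks from decode_header into one str."""
    parts = []
    for chunk, charset in decode_header(raw):
        # Chunks may be bytes (encoded words present) or str (plain header).
        if isinstance(chunk, bytes):
            # Same kludge as in the test program: treat a None charset as ASCII.
            chunk = chunk.decode(charset or fallback)
        parts.append(chunk)
    return "".join(parts)


msg = message_from_string(
    "To: test\n"
    "Subject: 'u' Obselete type =?ISO-8859-1?Q?--_it_is_identical_?=\n"
    " =?ISO-8859-1?Q?to_=27d=27=2E_=287=29?=\n"
    "\n"
    "this is a test.\n"
)

for key, value in msg.items():
    print(key + ":", decode_header_text(value))
```

The explicit loop keeps the None-to-'ascii' fallback visible. Passing the decode_header result to email.header.make_header and calling str() on it may be a tidier route, though I have not compared the two on awkward headers.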