Steve Holden <st...@holdenweb.com> wrote:
> rdmur...@bitdance.com wrote:
> > Steve Holden <st...@holdenweb.com> wrote:
> >>>>> from email.header import decode_header
> >>>>> print
> >> decode_header("=?us-ascii?Q?Inteum_C/SR_User_Tip:__Quick_Access_to_Recently_Opened_Inteu?=\r\n\t=?us-ascii?Q?m_C/SR_Records?=")
> >> [('Inteum C/SR User Tip: Quick Access to Recently Opened Inteum C/SR
> >> Records', 'us-ascii')]
> >
> > It is interesting that decode_header does what I would consider to be
> > the right thing (from a pragmatic standpoint) with that particular bit
> > of Microsoft not-quite-standards-compliant brain-damage; but, removing
> > the tab is not in fact standards compliant if I'm reading the RFC
> > correctly.
> >
> You'd need to quote me chapter and verse on that. I understood that the
> tab simply indicated continuation, but it's a *long* time since I read
> the RFCs.

Tab is not mentioned in RFC 2822 except to say that it is a valid
whitespace character. Header folding (insertion of <cr><lf>) can occur
most places whitespace appears, and is defined in section 2.2.3 thusly:

    Each header field is logically a single line of characters comprising
    the field name, the colon, and the field body. For convenience
    however, and to deal with the 998/78 character limitations per line,
    the field body portion of a header field can be split into a multiple
    line representation; this is called "folding". The general rule is
    that wherever this standard allows for folding white space (not
    simply WSP characters), a CRLF may be inserted before any WSP. For
    example, the header field:

        Subject: This is a test

    can be represented as:

        Subject: This
         is a test

    [irrelevant note elided]

    The process of moving from this folded multiple-line representation
    of a header field to its single line representation is called
    "unfolding". Unfolding is accomplished by simply removing any CRLF
    that is immediately followed by WSP. Each header field should be
    treated in its unfolded form for further syntactic and semantic
    evaluation.

So, the whitespace characters are supposed to be left unchanged after
unfolding.

--David
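
For illustration, here is a minimal sketch of the unfolding rule quoted from
section 2.2.3, written as a literal reading of "remove any CRLF that is
immediately followed by WSP". The helper name unfold is hypothetical and is
not part of the email package; the sketch only shows that, under this
reading, the space or tab after the CRLF survives unfolding.

```python
import re

# A CRLF that is immediately followed by WSP (space or tab).
# The lookahead matches the whitespace without consuming it,
# so only the CRLF itself is removed.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(header_text):
    """Hypothetical helper: unfold a header as RFC 2822 section 2.2.3 describes."""
    return _FOLD.sub("", header_text)


# The RFC's own example: the folded form unfolds to the single-line form.
print(repr(unfold("Subject: This\r\n is a test")))

# A fold made with a tab: the tab is kept, only the CRLF is dropped.
print(repr(unfold("Subject: first part\r\n\tsecond part")))
```

A CRLF that is not followed by a space or tab is left alone, since it ends
the header field rather than folding it.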