Barry Warsaw writes:

> There are really two ways to look at an email message. It's either an
> unstructured blob of bytes, or it's a structured tree of objects.
Indeed!

> Those objects have headers and payload. The payload can be of any
> type, though I think it generally breaks down into "strings" for text/*
> types and bytes for anything else (not counting multiparts).

*sigh* Why are you back-tracking? The payload should be of an
appropriate *object* type. Atomic object types will have their content
stored as string or bytes [nb I use Python 3 terminology throughout].
Composite types (multipart/*) won't need string or bytes attributes
AFAICS. Start by implementing the application/octet-stream and
text/plain;charset=utf-8 object types, of course.

> It does seem to make sense to think about headers as text header names
> and text header values.

I disagree. IMHO, structured header types should have object values,
and something like

    message['to'] = "Barry 'da FLUFL' Warsaw <ba...@python.org>"

should be smart enough to detect that it's a string and attempt to
(flexibly) parse it into a fullname and a mailbox, adding escapes, etc.
Whether these should be structured objects or they can be strings or
bytes, I'm not sure (probably bytes, not strings, though; see the next
example). OTOH

    message['to'] = b'''"Barry 'da.FLUFL' Warsaw" <ba...@python.org>'''

should assume that the client knows what they are doing, and should
parse it strictly (and I mean "be a real bastard", e.g., raise an
exception on any non-ASCII octet), merely dividing it into fullname and
mailbox and caching the bytes for later insertion in a wire-format
message.

> In that case, I think you want the values as unicodes, and probably
> the headers as unicodes containing only ASCII. So your table would be
> strings in both cases. OTOH, maybe your application cares about the
> raw underlying encoded data, in which case the header names are
> probably still strings of ASCII-ish unicodes and the values are
> bytes. It's this distinction (and I think the competing use cases)
> that make a true Python 3.x API for email more complicated.
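To make the "appropriate *object* type" idea concrete, here is a minimal sketch of the two starter types named above plus a composite. The class names and the `encoded_content()`/`walk()` methods are hypothetical, invented for illustration; this is not the real `email` package API. Atomic parts carry a `str` or `bytes` `content` attribute, while the multipart type carries only sub-parts.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class OctetStreamPart:
    """application/octet-stream: atomic, content stored as bytes."""
    content: bytes
    content_type = 'application/octet-stream'  # class attribute, not a field

    def encoded_content(self) -> bytes:
        return self.content


@dataclass
class TextPlainUTF8Part:
    """text/plain;charset=utf-8: atomic, content stored as str."""
    content: str
    content_type = 'text/plain;charset=utf-8'

    def encoded_content(self) -> bytes:
        # The object type, not the caller, decides how text becomes octets.
        return self.content.encode('utf-8')


@dataclass
class MultipartPart:
    """multipart/*: composite, holds sub-parts and no str/bytes content."""
    subtype: str = 'mixed'
    parts: List['Part'] = field(default_factory=list)

    @property
    def content_type(self) -> str:
        return f'multipart/{self.subtype}'

    def walk(self):
        # Yield the atomic leaves in depth-first order.
        for part in self.parts:
            if isinstance(part, MultipartPart):
                yield from part.walk()
            else:
                yield part


Part = Union[OctetStreamPart, TextPlainUTF8Part, MultipartPart]
```

Keeping the encoding step on the atomic type means a caller never has to guess whether a payload is "really" text or octets.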
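A minimal sketch of the two assignment behaviors described above, assuming a hypothetical `AddressHeader` class (unrelated to the stdlib's `email.headerregistry.AddressHeader`) and a placeholder address `barry@example.org` standing in for the elided one. The `str` path leans on `email.utils.parseaddr` for the flexible split and on `email.utils.formataddr` to re-add quoting; the `bytes` path uses a deliberately narrow grammar (an optional quoted display name followed by `<mailbox>`), an assumption that is much simpler than full RFC 5322.

```python
import re
from email.utils import formataddr, parseaddr

# Deliberately narrow grammar for the strict bytes path (an assumption, far
# simpler than RFC 5322): optional quoted display name, then <local@domain>.
_STRICT_ADDR = re.compile(
    rb'(?:"(?P<name>(?:[^"\\]|\\.)*)"\s*)?<(?P<mailbox>[^<>\s]+@[^<>\s]+)>')


class AddressHeader:
    """Hypothetical structured value for an address header such as To:."""

    def __init__(self, fullname, mailbox, wire=None):
        self.fullname = fullname
        self.mailbox = mailbox
        self._wire = wire  # original bytes from a strict parse, reused verbatim

    @classmethod
    def from_value(cls, value):
        # Dispatch on the type the client assigned: str is lenient, bytes strict.
        if isinstance(value, bytes):
            return cls._parse_strict(value)
        if isinstance(value, str):
            return cls._parse_flexible(value)
        raise TypeError(f'expected str or bytes, got {type(value).__name__}')

    @classmethod
    def _parse_flexible(cls, text):
        # Let the stdlib split display name from mailbox; quoting and escaping
        # are re-added by formataddr when the header is rendered.
        fullname, mailbox = parseaddr(text)
        if '@' not in mailbox:
            raise ValueError(f'no mailbox found in {text!r}')
        return cls(fullname, mailbox)

    @classmethod
    def _parse_strict(cls, raw):
        # Any non-ASCII octet raises UnicodeDecodeError (a ValueError subclass).
        raw.decode('ascii')
        m = _STRICT_ADDR.fullmatch(raw.strip())
        if m is None:
            raise ValueError(f'malformed address header value: {raw!r}')
        # Undo backslash escapes inside the quoted display name.
        name = re.sub(rb'\\(.)', rb'\1', m.group('name') or b'')
        return cls(name.decode('ascii'), m.group('mailbox').decode('ascii'),
                   wire=raw)

    def wire_value(self):
        # Cached bytes win; otherwise let formataddr build an ASCII value.
        if self._wire is not None:
            return self._wire
        return formataddr((self.fullname, self.mailbox)).encode('ascii')
```

With this sketch, the unquoted `str` form "Barry 'da.FLUFL' Warsaw <barry@example.org>" renders back to exactly the quoted bytes a strict caller would have supplied, because `formataddr` wraps display names containing a `.` in double quotes.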
I don't see why you can't have the email API be specific, with
message['to'] always returning a structured_header object (or maybe
even more specifically an address_header object), and methods like

    message['to'].build_header_as_text()

which returns

    """To: "Barry 'da.FLUFL' Warsaw" <ba...@python.org>"""

and

    message['to'].build_header_in_wire_format()

which returns

    b"""To: "Barry 'da.FLUFL' Warsaw" <ba...@python.org>"""

Then have email.textview.Message and email.wireview.Message, which
provide a simple interface where message['to'] would invoke
.build_header_as_text() and .build_header_in_wire_format(),
respectively.

> Thinking about this stuff makes me nostalgic for the sloppy happy days
> of Python 2.x

Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,
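The textview/wireview split proposed in this message could be sketched roughly as below. `email.textview` and `email.wireview` do not exist, so `TextViewMessage` and `WireViewMessage` stand in for them; `AddressHeader`, `StructuredMessage`, and `_format_text` are likewise hypothetical, and `example.org` addresses are placeholders. The text view quotes the display name but leaves non-ASCII characters alone, while the wire view reuses cached bytes when a strict parse supplied them and otherwise falls back to `email.utils.formataddr`, which RFC 2047-encodes non-ASCII names.

```python
import re
from email.utils import formataddr

# Characters that force a display name into a quoted-string.
_SPECIALS = re.compile(r'[][\\()<>@,:;".]')


def _format_text(fullname, mailbox):
    """Render an address for humans: quote if needed, keep non-ASCII as-is."""
    if not fullname:
        return mailbox
    if _SPECIALS.search(fullname):
        escaped = fullname.replace('\\', '\\\\').replace('"', '\\"')
        fullname = f'"{escaped}"'
    return f'{fullname} <{mailbox}>'


class AddressHeader:
    """Hypothetical structured value for one address header field."""

    def __init__(self, name, fullname, mailbox, wire=None):
        self.name = name          # field name, e.g. 'To'
        self.fullname = fullname
        self.mailbox = mailbox
        self._wire = wire         # bytes cached by a strict parse, if any

    def build_header_as_text(self):
        return f'{self.name}: {_format_text(self.fullname, self.mailbox)}'

    def build_header_in_wire_format(self):
        if self._wire is not None:
            value = self._wire
        else:
            # formataddr quotes specials and RFC 2047-encodes non-ASCII names.
            value = formataddr((self.fullname, self.mailbox)).encode('ascii')
        return self.name.encode('ascii') + b': ' + value


class StructuredMessage:
    """Stores one structured header object per case-insensitive field name."""

    def __init__(self):
        self._headers = {}

    def set_header(self, header):
        self._headers[header.name.lower()] = header

    def __getitem__(self, name):
        return self._headers[name.lower()]


class TextViewMessage:
    """Stand-in for the proposed email.textview.Message."""

    def __init__(self, message):
        self._message = message

    def __getitem__(self, name):
        return self._message[name].build_header_as_text()


class WireViewMessage:
    """Stand-in for the proposed email.wireview.Message."""

    def __init__(self, message):
        self._message = message

    def __getitem__(self, name):
        return self._message[name].build_header_in_wire_format()
```

Both views wrap the same underlying message, so the structured header objects are parsed once and only the rendering differs between the text and wire interfaces.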