On Apr 9, 2009, at 11:41 PM, Tony Nelson wrote:

At 22:38 -0400 04/09/2009, Barry Warsaw wrote:
...
So, what I'm really asking is this.  Let's say you agree that there
are use cases for accessing a header value as either the raw encoded
bytes or the decoded unicode.  What should this return:

message['Subject']

The raw bytes or the decoded unicode?

That's an easy one:  Subject: is an unstructured header, so it must be
text, thus Unicode. We're looking at a high-level representation of an
email message, with parsed header fields and a MIME message tree.
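
To make the raw-versus-decoded distinction concrete, here is a minimal sketch using the policy mechanism in today's standard-library email package. It is not the API being proposed in this thread; the sample message and the variable names are assumptions made purely for illustration.

```python
import email
from email import policy

# Hypothetical sample message with an RFC 2047 encoded Subject.
RAW = "Subject: =?utf-8?q?caf=C3=A9?=\nTo: someone@example.com\n\nbody\n"

# The default (compat32) policy hands back the header as it appeared on the wire.
legacy = email.message_from_string(RAW)
raw_subject = legacy["Subject"]

# policy.default decodes the encoded word into text.
modern = email.message_from_string(RAW, policy=policy.default)
decoded_subject = str(modern["Subject"])

print(repr(raw_subject))
print(repr(decoded_subject))
```

The same lookup, message['Subject'], yields either form depending on the policy, which is one way to serve both use cases without two differently named accessors.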

I'm liking Glyph's suggestion here. We'll probably have to support the message['Subject'] API for backward compatibility, but in that case it really should be a bytes API.

(or better names... it's late and I'm tired ;).  One of those maps to
message['Subject'] but which is the more obvious choice?

Structured header fields are more of a problem. Any header with addresses should return a list of addresses. I think the default return type should depend on the data type. To get an explicit bytes or string or list of addresses, be explicit; otherwise, for convenience, return the appropriate
type for the particular header field name.
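
As a rough illustration of "a header with addresses should return a list of addresses", the existing email.utils.getaddresses helper already turns an address header value into (display name, address) pairs. The header value below is made up for the example.

```python
from email.utils import getaddresses

# Hypothetical To: value; the quoted display name contains a comma on purpose.
to_value = 'Alice <alice@example.com>, "Lee, Bob" <bob@example.com>'

# getaddresses() takes a list of header values and returns (name, address) tuples.
addresses = getaddresses([to_value])

for name, addr in addresses:
    print(name, "->", addr)
```

A naive split on commas would break the second entry apart, which is the practical argument for returning a parsed list rather than a string.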

Yes, structured headers are trickier. In a separate message, James Knight makes some excellent points, which I agree with. However, the email package obviously cannot support every possible type of structured header. It must support this through extensibility.

The obvious way is through inheritance (i.e. subclasses of Header), but in my experience, using inheritance of the Message class really doesn't work very well. You need to pass around factories to parsing functions and your application tends to have its own hierarchy of subclasses for whatever extra things it needs. ISTM that subclassing is simply not the right pattern to support extensibility in the Message objects or Header objects. Yes, this leads me to think that all the MIME* subclasses are essentially /wrong/.

Having said all that, the email package must support structured headers. Look at the insanity that is the current folding-whitespace splitting, and at how the current code cannot do the right thing for, say, both Subject headers and Received headers, and you begin to see why it must be possible to extend this stuff.
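
One possible shape for extensibility without subclassing is a registry that maps header names to parser callables. The sketch below is only an assumption about how that could look: HEADER_PARSERS, register_header and parse_header are hypothetical names, not part of the email package.

```python
from email.utils import getaddresses

# Hypothetical registry: lowercased field name -> callable that parses the value.
HEADER_PARSERS = {}

def register_header(name):
    """Decorator that registers a parser for one header field name."""
    def decorator(func):
        HEADER_PARSERS[name.lower()] = func
        return func
    return decorator

@register_header("To")
@register_header("Cc")
def parse_address_list(value):
    # Address headers come back as a list of (name, address) tuples.
    return getaddresses([value])

def parse_header(name, value):
    # Unregistered headers fall back to plain stripped text.
    parser = HEADER_PARSERS.get(name.lower(), str.strip)
    return parser(value)

print(parse_header("To", "alice@example.com"))
print(parse_header("Subject", "  hello  "))
```

An application can add support for its own structured header by registering one more function, with no factories passed to the parser and no parallel class hierarchy.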

Now, setting headers.  Sometimes you have some unicode thing and
sometimes you have some bytes.  You need to end up with bytes in the
ASCII range and you'd like to leave the header value unencoded if so.
But in both cases, you might have bytes or characters outside that
range, so you need an explicit encoding, defaulting to utf-8 probably.

Never for header fields. The default is always RFC 2047, unless it isn't,
say for params.
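
A minimal sketch of the rule discussed here (leave ASCII values alone, otherwise emit RFC 2047 encoded words) using the existing email.header module. The helper name encode_header_value is hypothetical, and treating incoming bytes as utf-8 is an assumption taken from the suggested default above.

```python
from email.header import Header, decode_header, make_header

def encode_header_value(value, charset="utf-8"):
    """Return an ASCII-only header value, using RFC 2047 only when needed."""
    if isinstance(value, bytes):
        # Assumption: bytes input is text encoded in `charset`.
        value = value.decode(charset)
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        # Non-ASCII text becomes one or more RFC 2047 encoded words.
        return Header(value, charset).encode()
    return value

plain = encode_header_value("Some text")
from_bytes = encode_header_value(b"Some bytes")
encoded = encode_header_value("caf\u00e9")

# Round trip the encoded form back to text.
round_trip = str(make_header(decode_header(encoded)))
print(plain, from_bytes, encoded, round_trip)
```

Parameters (for example a filename in Content-Disposition) follow RFC 2231 rather than RFC 2047, which is the "unless it isn't" case and is not covered by this sketch.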

The Message class should create an object of the appropriate subclass of
Header based on the name (or use the existing object, see other
discussion), and that should inspect its argument and DTRT or complain.

Message.set_header('Subject', 'Some text', encoding='utf-8')
Message.set_header('Subject', b'Some bytes')

One of those maps to

message['Subject'] = ???

The expected data type should depend on the header field. For Subject:, it should be bytes to be parsed or verbatim text. For To:, it should be a
list of addresses or bytes or text to be parsed.

At a higher level, yes.  At the low level, it has to be bytes.

The email package should be pythonic, and not require deep understanding of dozens of RFCs to use properly. Users don't need to know about the raw bytes; that's the whole point of MIME and any email package. It should be easy to set header fields with their natural data types, and doing it with bad data should produce an error. This may require a bit more care in the
message parser, to always produce a parsed message with defects.
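
For reference, the current parser already follows the "parsed message with defects" approach for some problems: it does not raise on malformed input but records defects on the message object. A small sketch, with a made-up malformed message:

```python
import email
from email import errors

# Hypothetical malformed input: multipart content type with no boundary parameter.
RAW = "Content-Type: multipart/mixed\n\nno boundary was declared\n"

# Parsing succeeds; the problems are attached to the message as defect objects.
msg = email.message_from_string(RAW)

defect_names = [type(d).__name__ for d in msg.defects]
has_boundary_defect = any(
    isinstance(d, errors.NoBoundaryInMultipartDefect) for d in msg.defects
)

print(defect_names)
```

Callers who want bad data to be an error can inspect msg.defects after parsing, while lenient callers can ignore it and still get a usable message tree.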

I agree that we should have some higher level APIs that make it easy to compose email messages, and probably easy-ish to parse a byte stream into an email message tree. But we can't build those without the lower level raw support. I'm also convinced that this lower level will be the domain of those crazy enough to have the RFCs tattooed to the back of their eyelids.

-Barry

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev