On Sat, 31 Aug 2013 20:37:30 +1000, Steven D'Aprano <st...@pearwood.info> wrote:
> On 31/08/13 15:21, R. David Murray wrote:
> > If you've read my blog (eg: on planet python), you will be aware that
> > I dedicated August to full time email package development.
> [...]
> 
> The API looks really nice! Thank you for putting this together.

Thanks.

> A question comes to mind though:
> 
> > All input strings are unicode, and the library takes care of doing
> > whatever encoding is required.  When you pull data out of a parsed
> > message, you get unicode, without having to worry about how to decode
> > it yourself.
> 
> How well does your library cope with emails where the encoding is declared 
> wrongly? Or no encoding declared at all?

It copes as best it can :)  The bad bytes are preserved (unless you
modify a part) but are returned as the "unknown character" in a
string context.  You can get the original bytes out by using the
bytes access interface.  (There are probably some places where how
to do that isn't clear in the current API, but basically either
you use BytesGenerator or you drop down to a lower level API.)
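
A minimal sketch of the behaviour described above, assuming the email
package as it ships in a recent Python 3 (policy.default, get_content()
and as_bytes() are that release's names and may not match the API that
was under development when this message was written). The sample message
and its addresses are made up for illustration:

```python
from email import message_from_bytes, policy

# Hypothetical message: declared us-ascii, but the body holds a Latin-1 byte.
raw = (
    b"From: sender@example.com\n"
    b"Subject: test\n"
    b"Content-Type: text/plain; charset=us-ascii\n"
    b"\n"
    b"caf\xe9\n"
)

msg = message_from_bytes(raw, policy=policy.default)

# String context: the undecodable byte comes back as U+FFFD,
# the Unicode replacement ("unknown") character.
text = msg.get_content()

# Bytes context: serializing the unmodified message goes through
# BytesGenerator under the hood and keeps the original byte.
wire = msg.as_bytes()
```

Here text contains U+FFFD where the bad byte was, while wire still
contains the original b"caf\xe9" because no part was modified.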

An attempt is made to interpret "bad bytes" as utf-8, before giving up
and replacing them with the 'unknown character' character.  I'm not 100%
sure that is a good idea.
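
The fallback idea can be illustrated in isolation. This is only a sketch
of the general technique (try utf-8 first, otherwise substitute the
replacement character), not the package's actual code;
decode_best_effort is a hypothetical helper name:

```python
def decode_best_effort(data: bytes) -> str:
    """Decode bytes that were wrongly declared as ASCII (hypothetical helper)."""
    try:
        # First guess: the sender really meant utf-8.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Give up: keep the ASCII characters and replace each
        # undecodable byte with U+FFFD.
        return data.decode("ascii", errors="replace")


as_utf8 = decode_best_effort(b"caf\xc3\xa9")   # valid utf-8 bytes
as_unknown = decode_best_effort(b"caf\xe9")    # Latin-1 byte, not valid utf-8
```

The risk with guessing is that bytes in some other encoding can happen
to form valid utf-8 and be decoded to the wrong characters, silently.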

> Conveniently, your email is an example of this. Although it contains 
> non-ASCII characters, it is declared as us-ascii:

Oh, yeah, my MUA is a little quirky and I forgot the step that
would have made that correct.  Wanting to rewrite it is one of
the reasons I embarked on this whole email thing a few years
ago :)

--David