On Fri, 08 Oct 2010 12:37:38 +0900, "Stephen J. Turnbull" <step...@xemacs.org> 
wrote:
> *If* you have an 8-bit value of unknown encoding on input, this will
> appear in the Header's value as a surrogate.  Hm, OK, I see the
> problem ... as usual, it's that the only efficient thing to do is
> encode using surrogate-escape which loses the information that these
> are invalid bytes.  Would it really be that bad to add an O(length)
> component where you examine the string for surrogates (and too-long
> words, for that matter), and chop off those pieces for MIME encoding?

Nope, and that's more or less what I think I'm going to do.  But I
haven't started writing the code yet.
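The O(length) scan described in the quoted paragraph can be sketched in a few lines of Python. This is a hypothetical illustration, not email5 code: the helper name `split_escaped` is made up here. It relies only on the documented behaviour of the `surrogateescape` error handler, which decodes each undecodable byte 0x80..0xFF to a lone surrogate U+DC80..U+DCFF. The scan finds those runs and chops them off as raw bytes, ready to be MIME-encoded separately from the ordinary text.

```python
import re

# Lone surrogates U+DC80..U+DCFF are what 'surrogateescape' produces
# for undecodable bytes 0x80..0xFF.
_ESCAPED = re.compile('[\udc80-\udcff]+')

def split_escaped(value):
    """Split a str decoded with surrogateescape into (is_escaped, chunk) pieces.

    Ordinary text is yielded as str; each run of escaped bytes is turned
    back into the original raw bytes so it can be MIME-encoded on its own.
    One left-to-right pass, so the cost is O(len(value)).
    """
    pos = 0
    for m in _ESCAPED.finditer(value):
        if m.start() > pos:
            yield False, value[pos:m.start()]
        # Encoding with surrogateescape maps each U+DCxx back to byte 0xxx.
        yield True, m.group().encode('ascii', 'surrogateescape')
        pos = m.end()
    if pos < len(value):
        yield False, value[pos:]
```

For example, `b'caf\xe9 ok'.decode('ascii', 'surrogateescape')` splits into `(False, 'caf')`, `(True, b'\xe9')`, `(False, ' ok')`, so the invalid byte is no longer hidden inside the str.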

>  > >  > Presumably you are suggesting that email5 be smart enough to turn my
>  > >  > example into properly UTF-8/CTE encoded text.
>  > > 
>  > > No, in general that's undecidable without asking the originator,
>  > > although humans can often make a good guess.
>  > 
>  > I was talking about unicode input, though, where you do know (modulo
>  > the language differences that unicode hasn't yet sorted out).
> 
> I don't understand why this is difficult.  As far as what Unicode has

It isn't difficult in principle.  It's just difficult in email5.
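To illustrate the "in principle" half for genuine Unicode input (where the text is known), the standard library's `email.header.Header` can already turn a str into an RFC 2047 encoded-word. This is a minimal sketch: choosing UTF-8 as the output charset is an assumption made here, not something the thread specifies, and it says nothing about how email5 would wire this in.

```python
from email.header import Header, decode_header

# Assumption: UTF-8 is used as the output charset for Unicode input.
value = 'caf\u00e9'
encoded = Header(value, 'utf-8').encode()  # an RFC 2047 encoded-word

# Round-trip: decode_header returns (bytes, charset) pairs.
decoded = ''.join(
    part.decode(charset) if isinstance(part, bytes) else part
    for part, charset in decode_header(encoded)
)
```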

--
R. David Murray                                      www.bitdance.com
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev