On Sat, Jul 07, 2012 at 02:41:41PM -0600, Jack M wrote:
> 
> I sometimes send messages that contain the lowercase 'o' with an umlaut
> over it, i.e., ö, unicode char 246.  I compose my messages in vim, with
> the encodings all set to utf-8.
> 
> Occasionally I can see that a message that I sent (e.g., to myself) has
> been unexpectedly encoded with quoted-printable, when I have the 'ö' in
> the message body.  Such messages also declare the charset as Latin1
> (which, I presume, was done by mutt, using $send_charset).
> 
> Somewhere along the line, the quoted-printable translation of 'ö' gets
> messed up.  Apparently, the raw text of such a message uses =C3=B6 to
> encode 'ö'. (In case this gets garbled, that's "equalsign, C3,
> equalsign, B6").  Mutt transparently decodes this (I guess) and shows me
> an umlauted 'o' in the pager.  What's funny is that =C3=B6 is not the
> correct QP-encoding for the umlauted 'o'; hence if I view the raw
> message text in vim (either by pressing 'e' from the pager or by saving
> to disk first), and then un-encode from QP, I get not one but two
> unicode characters, and both are incorrect.  (Namely, a capital A with a
> tilde on top, and some strange other thing).
> 
> As far as I can tell, the umlauted, lowercase 'o' is char 246 in both
> UTF8 and Latin1.  And as far as I can tell, the correct QP-translation
> of character 246 ought to be =F6 (equalsign, F6).  But the raw message
> text has *two* QP characters, =C3=B6, neither of which is correct.
> 
 Well, yes, in Unicode it is still code point 246 (decimal, U+00F6),
but encoding that in UTF-8 takes two bytes, which happen to be 0xC3
0xB6.  Only in latin1 is it the single byte 0xF6, which is where your
=F6 comes from.  I used to know the (tedious) mechanics of how to
decode UTF-8, but nowadays I just google for the hex codes, "unicode
C3B6" in this case.  The details of the encoding are in the UTF-8
page at Wikipedia.
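
 For what it's worth, here is a minimal Python sketch of that
encoding step.  It only covers the two-byte case (code points 0x80 to
0x7FF), and the helper name utf8_two_byte is made up for
illustration; Python's own str.encode is used as the cross-check.

```python
def utf8_two_byte(cp):
    """Hand-rolled UTF-8 for a code point in the two-byte range.

    Hypothetical helper for illustration only; valid for
    0x80 <= cp <= 0x7FF and nothing else.
    """
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("not a two-byte UTF-8 code point")
    lead = 0xC0 | (cp >> 6)      # 110xxxxx: top five bits of the code point
    cont = 0x80 | (cp & 0x3F)    # 10xxxxxx: low six bits of the code point
    return bytes([lead, cont])


ch = "\u00f6"                    # o with umlaut
code_point = ord(ch)             # 246 in both Unicode and latin1
utf8 = ch.encode("utf-8")        # two bytes: 0xC3 0xB6
latin1 = ch.encode("latin-1")    # one byte: 0xF6

print(code_point, utf8.hex(), latin1.hex(), utf8_two_byte(code_point).hex())
```

 So the number 246 is the same in both character sets; what differs is
how many bytes it takes to write it down.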

 I know nothing about the details of quoted-printable (apart from
what I've just read on Wikipedia).  Certainly, that message isn't
latin1, it's UTF-8.  I suspect that the key is to find out *why*
that message has been sent as quoted-printable and labelled latin1.
Your post here, by contrast, is text/plain utf-8 and reads fine.
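
 As a rough illustration (using Python's standard quopri module, not
anything mutt does internally), decoding =C3=B6 from quoted-printable
just gives back the two raw bytes.  Whether you then see one
character or two depends entirely on which charset those bytes are
read as:

```python
import quopri

raw = b"=C3=B6"                           # what appears in the raw message text
decoded = quopri.decodestring(raw)        # the two bytes 0xC3 0xB6

as_utf8 = decoded.decode("utf-8")         # one character: o with umlaut
as_latin1 = decoded.decode("latin-1")     # two characters: A-tilde, pilcrow

# Going the other way: quoted-printable of the same letter in each charset.
qp_of_utf8 = quopri.encodestring("\u00f6".encode("utf-8"))
qp_of_latin1 = quopri.encodestring("\u00f6".encode("latin-1"))

print(repr(as_utf8), repr(as_latin1), qp_of_utf8, qp_of_latin1)
```

 Read as latin1, the bytes come out as a capital A with a tilde plus a
second stray character, which matches what you describe seeing after
un-encoding in vim.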

> So my questions are these:
> 
> 1) How is the QP encoding of a perfectly good UTF-8 text getting
> mangled?  Is mutt screwing it up when I ask for the raw message source?
> 
> 2) Given that it is mangled, how is it that mutt is nevertheless able to
> decode and display it properly?
> 
> I note that I'm on MacOSX 10.5, with $LANG as en_US.UTF-8 in
> Terminal.app.  I also note that I get the mangling whether I use console
> vim or the MacVim GUI.
> 
> -Jack
> 
 I'll also note that in the past vim has often managed to display
latin1 and UTF-8 together (in a UTF-8 term), which initially caused
me a lot of confusion when trying to read some Linux kernel patches
that changed latin1 text (author names, mostly) in comments to UTF-8.

ĸen
-- 
das eine Mal als Tragödie, das andere Mal als Farce
