Mark Sapiro writes:

 > >> Content-Disposition: inline
 > >> Content-Type: text/plain; charset="us-ascii"
 > >> Content-Transfer-Encoding: 7bit
 > >>
 > >> And in the mbox file below (of the same message), I see:
 > >>
 > >> Content-Disposition: inline
 > >> Content-Type: text/plain
 > >> Content-Transfer-Encoding: quoted-printable
 > 
 > OK, that's the issue. The message in the .mbox is after various list
 > manipulations, but before scrubbing for the pipermail archive, and
 > somehow the '; charset="us-ascii"' has been lost from the Content-Type:
 > header, which is why Scrubber scrubs it.

Does Scrubber really do that?  Per RFC, the two Content-Type fields
have exactly the same semantics: "it is plain text, encoded as ASCII."
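
To make that concrete, here is a minimal sketch using Python's standard
email package. The helper name effective_charset is made up for this
illustration, and this is not a claim about what Scrubber itself does;
it only shows that applying the RFC 2045 default gives the same answer
for both header forms.

```python
from email import message_from_string


def effective_charset(header_block: str) -> str:
    """Return the charset a MIME reader should assume for this part.

    Hypothetical helper: RFC 2045 says a text/plain part with no
    charset parameter defaults to us-ascii, so we pass that as failobj.
    """
    msg = message_from_string(header_block + "\n\nbody\n")
    return msg.get_content_charset(failobj="us-ascii")


explicit = 'Content-Type: text/plain; charset="us-ascii"'
implicit = "Content-Type: text/plain"

print(effective_charset(explicit), effective_charset(implicit))
```

Note that get_content_charset() with no argument returns None for the
second form, so any code that compares the raw parameter rather than
applying the default will treat the two headers differently.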

I would hope instead that it's the non-ASCII content that triggered
something (are we sure it's Mailman? could be an MTA somewhere along
the line) to qp-encode.

For example, the original mail may have included directed quotes or
similar hard-to-distinguish "fancy punctuation", but the composing MUA
didn't notice them and just randomly set charset=us-ascii.  Is there
quoted-printable (easily recognized by the "=" + 2 hex digits syntax)
in that MIME body in the mbox?  Another possibility is that there was
a very long line and it was qp-encoded ("=" CRLF inserted after a
space) to conform to RFC 822.
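
A rough way to check for both symptoms, assuming you have pasted the
MIME body from the mbox into a string. The function name qp_evidence
and the sample strings are mine, not anything from Mailman, and a
literal "=" followed by two hex digits in ordinary text would be a
false positive.

```python
import quopri
import re

# "=" followed by two hex digits is a quoted-printable escaped octet.
QP_ESCAPE = re.compile(r"=[0-9A-Fa-f]{2}")
# "=" at the very end of a line is a soft line break.
QP_SOFT_BREAK = re.compile(r"=[ \t]*\r?\n")


def qp_evidence(body: str) -> dict:
    """Summarize signs that a body is quoted-printable encoded."""
    decoded = quopri.decodestring(body.encode("ascii", "replace"))
    return {
        "escapes": QP_ESCAPE.findall(body),
        "soft_breaks": len(QP_SOFT_BREAK.findall(body)),
        # True if the decoded octets are not pure ASCII.
        "decodes_to_non_ascii": any(octet > 0x7F for octet in decoded),
    }


fancy = qp_evidence("It=E2=80=99s fine")          # curly apostrophe
wrapped = qp_evidence("a long line =\ncontinued")  # soft break only
print(fancy)
print(wrapped)
```

If only soft_breaks is nonzero, the long-line explanation fits; if
decodes_to_non_ascii is true, the body was never really us-ascii.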

 > > FWIW, using Thunderbird I posted the contents of the original email to a
 > > test list (on the same server, with the same lists configs) and as
 > > expected the archived message displays correctly as a plaintext
 > > email.

How did the contents get into the message in Thunderbird?  Copy-paste?
Yank from mailbox?  Forward?  If it's anything but the last,
Thunderbird almost surely massaged it on the way in.

 > > The headers this time show as:
 > > 
 > > === FROM EMAIL HEADER
 > > MIME-Version: 1.0
 > > Content-Type: text/plain; charset="us-ascii"
 > > Content-Transfer-Encoding: 7bit
 > > 
 > > === FROM MBOX
 > > MIME-Version: 1.0
 > > Content-Type: text/plain; charset=utf-8
 > > Content-Transfer-Encoding: 7bit
 > 
 > This is what should happen. I'm not sure which handler changed the
 > charset to utf-8, but I don't think that's significant.

This is RFC-ly bizarre.  Why would anything change the charset, unless
there were non-ASCII octets?  If something does make that change
despite the body being pure ASCII, I would argue that's a bug (very
old MUAs might refuse to display the message), although it's probably
irrelevant nowadays.

But if there *are* non-ASCII octets, the Content-Transfer-Encoding is
a lie.  I think the original message was probably broken as sent.
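
One quick test of that, as a sketch only: read the raw message as
bytes and look for octets with the high bit set after the header
block. The function name body_has_8bit_octets is invented here, it
assumes a single-part message, and it ignores the other 7bit rules
(no NULs, line length).

```python
import re


def body_has_8bit_octets(raw_message: bytes) -> bool:
    """True if the body contains any octet above 0x7F.

    Such a body cannot honestly be labeled
    Content-Transfer-Encoding: 7bit.
    """
    # Headers end at the first blank line (LF or CRLF line endings).
    parts = re.split(rb"\r?\n\r?\n", raw_message, maxsplit=1)
    body = parts[1] if len(parts) == 2 else b""
    return any(octet > 0x7F for octet in body)


broken = b"Content-Transfer-Encoding: 7bit\r\n\r\nIt\xe2\x80\x99s fine\r\n"
clean = b"Content-Transfer-Encoding: 7bit\n\nIt's fine\n"
print(body_has_8bit_octets(broken), body_has_8bit_octets(clean))
```

Run against the message as originally sent (not a copy that has been
through an MUA), a True result would mean the declared encoding was
already wrong before Mailman saw it.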

Steve

------------------------------------------------------
Mailman-Users mailing list -- mailman-users@python.org
To unsubscribe send an email to mailman-users-le...@python.org
https://mail.python.org/mailman3/lists/mailman-users.python.org/
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/
    https://mail.python.org/archives/list/mailman-users@python.org/
