On 02Feb2018 20:46, Yubin Ruan <ablacktsh...@gmail.com> wrote:
Personally I'm surprised that the Content-Transfer-Encoding isn't decoded
for saving things; it is there purely to get the original content through
the mail system and should be decoded. I need to do some testing.

There is a Content-Transfer-Encoding in the text/plain part, but it does not
seem to work as you described. Probably mutt has skipped that:

   ------=_Part_3237106_526482122.1517522433475
   Content-Type: text/plain;charset=UTF-8
   Content-Transfer-Encoding: quoted-printable
   Content-ID: text-body

   LinkedIn Highlights

   Should I tell coworkers my salary?

   264 people are talking about this

   https://www.linkedin.com/comm/search/results/content/?keywords=3DShould+I+t=
   ell+coworkers+my+salary%3F&origin=3DFED_EMAIL&anchorTopic=3D506458&midToken=
   =3DAQFSR-6AohV2qQ&trk=3Deml-email_feed_ecosystem_digest_01-hero-1-null&trkE=
   mail=3Deml-email_feed_ecosystem_digest_01-hero-1-null-null-6mvow3%7Ejd475j9=
   d%7Ey4-null-neptune%2Fsearch%2Eresults%2Econtent&lipi=3Durn%3Ali%3Apage%3Ae=
   mail_email_feed_ecosystem_digest_01%3BrxyHuw3NTWi4n8fHNW81ig%3D%3D

    =20
   -----------------------------------

Yubin, your sample messages seem to be from linkedin.com; is that the case?
If so, I've any number of them in my own inbox I can try.

Yes it is from linkedin.com. I have forwarded you a copy.

Thank you.

I've experimented with your message and I think I know what is happening.

If I save the text/plain part, I get what you wanted: the long line, and no quoted-printable encoding, like this:

   LinkedIn Highlights

   Should I tell coworkers my salary?

   264 people are talking about this

   
   https://www.linkedin.com/comm/search/results/content/?keywords=Should+I+tell+coworkers+my+salary%3F&origin=FED_EMAIL&anchorTopic=506458&midToken=AQFSR-6AohV2qQ&trk=eml-email_feed_ecosystem_digest_01-hero-1-null&trkEmail=eml-email_feed_ecosystem_digest_01-hero-1-null-null-6mvow3%7Ejd475j9d%7Ey4-null-neptune%2Fsearch%2Eresults%2Econtent&lipi=urn%3Ali%3Apage%3Aemail_email_feed_ecosystem_digest_01%3BrxyHuw3NTWi4n8fHNW81ig%3D%3D

However, if I save the multipart/alternative part I get this:

   ------=_Part_3237106_526482122.1517522433475
   Content-Type: text/plain;charset=UTF-8
   Content-Transfer-Encoding: quoted-printable
   Content-ID: text-body

   LinkedIn Highlights

   Should I tell coworkers my salary?

   264 people are talking about this

   https://www.linkedin.com/comm/search/results/content/?keywords=3DShould+I+t=
   ell+coworkers+my+salary%3F&origin=3DFED_EMAIL&anchorTopic=3D506458&midToken=
   =3DAQFSR-6AohV2qQ&trk=3Deml-email_feed_ecosystem_digest_01-hero-1-null&trkE=
   mail=3Deml-email_feed_ecosystem_digest_01-hero-1-null-null-6mvow3%7Ejd475j9=
   d%7Ey4-null-neptune%2Fsearch%2Eresults%2Econtent&lipi=3Durn%3Ali%3Apage%3Ae=
   mail_email_feed_ecosystem_digest_01%3BrxyHuw3NTWi4n8fHNW81ig%3D%3D

    =20
    -----------------------------------

Which is what you saw. And that is correct behaviour in both cases. The text/plain gets saved in its original form. The multipart/alternative _also_ gets saved in its original form, but that form is a MIME body that itself contains, encoded, the text/plain _and_ the text/html parts.

So I believe you misaimed and saved the multipart attachment, not the individual text/plain attachment.
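The difference between the two saves can be reproduced outside mutt. This is only a rough sketch using Python's standard email package, not what mutt does internally; the message, the "XYZ" boundary and the example.com URL are made-up stand-ins for your LinkedIn mail:

```python
from email import message_from_string

# Hypothetical miniature of a multipart/alternative message.
# The boundary "XYZ" and both bodies are invented for illustration.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "https://example.com/?keywords=3DShould+I+t=\n"
    "ell&origin=3DFED_EMAIL\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "<p>hi</p>\n"
    "--XYZ--\n"
)

msg = message_from_string(RAW)
plain = next(p for p in msg.walk() if p.get_content_type() == "text/plain")

# Like saving the text/plain part: the transfer encoding is undone.
decoded = plain.get_payload(decode=True).decode("utf-8")

# Like saving the multipart/alternative part: the subparts stay
# encoded inside the container, boundary lines and headers included.
container = msg.as_string()

print(decoded)
print(container)
```

The first print shows the URL as one unbroken line with plain "=" signs; the second still shows the "=3D" sequences and the trailing "=" line folds, because they belong to the container's stored form.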

To elaborate, the multipart/alternative section is a well defined thing on its own. It is a container format for other message parts. As such it has a leading "----" MIME marker for the start of the first body part, then the optional headers indicating the type, encoding and id of the part, and then the part itself, encoded for storage in the multipart/alternative wrapper.

In your case, that looks like this:

   ------=_Part_3237106_526482122.1517522433475
   Content-Type: text/plain;charset=UTF-8
   Content-Transfer-Encoding: quoted-printable
   Content-ID: text-body

The "----....." part is a unique marker, guaranteed not to exist in the _encoded_ body; this is reliable because it is chosen by the programme which made the multipart/alternative itself.
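To see where that marker comes from, here is a small sketch with Python's email package. The boundary value is inferred from your sample (a delimiter line is "--" followed by the boundary parameter of the container's Content-Type header, which I have not seen in your excerpt), and the two bodies are placeholders:

```python
from email import message_from_string

# Assumed boundary: the delimiter line in the sample minus its leading "--".
BOUNDARY = "----=_Part_3237106_526482122.1517522433475"

# Placeholder message; only the boundary mirrors the real mail.
RAW = (
    "MIME-Version: 1.0\n"
    f'Content-Type: multipart/alternative; boundary="{BOUNDARY}"\n'
    "\n"
    f"--{BOUNDARY}\n"
    "Content-Type: text/plain;charset=UTF-8\n"
    "\n"
    "plain body\n"
    f"--{BOUNDARY}\n"
    "Content-Type: text/html;charset=UTF-8\n"
    "\n"
    "<p>html body</p>\n"
    f"--{BOUNDARY}--\n"
)

msg = message_from_string(RAW)

# The boundary is declared once, in the container's own header.
boundary = msg.get_boundary()

# Each part starts at a line consisting of "--" plus that boundary.
delimiter = "--" + boundary

# The parser splits the body at those lines into the alternative parts.
part_types = [part.get_content_type() for part in msg.get_payload()]

print(delimiter)
print(part_types)
```

The printed delimiter matches the first line of what you saved, and the part list shows the text/plain followed by the text/html.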

The content-type of the first part is text/plain; there will be the text/html afterwards.

The content-transfer-encoding is how the text/plain is encoded for storage within the multipart/alternative wrapper. It is quoted-printable, which presents a lot of the printable ASCII byte range unchanged, but encodes other values as "=XX", and folds the text nonsemantically with "=\n".

The =XX encoding is to make the text survive passage through mail systems which support only a limited (or intersecting) range of characters, such as a non-8-bit clean system, or an EBCDIC system, etc. The "=\n" line folding is to accommodate passage through mail systems which read the message as lines of text with a limited buffer length - the physical lines are thus no longer than a certain length.
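Both rules can be tried by hand with Python's standard quopri module. This is just an illustration on the first two physical lines of your URL, cut short after the origin parameter so it stays readable:

```python
import quopri

# Two physical lines from the sample, truncated for brevity.
# "=3D" is an encoded "=" and the trailing "=" is a soft line break.
encoded = (
    b"https://www.linkedin.com/comm/search/results/content/?keywords=3DShould+I+t=\n"
    b"ell+coworkers+my+salary%3F&origin=3DFED_EMAIL\n"
)

# Decoding rejoins the folded line and turns "=3D" back into "=".
decoded = quopri.decodestring(encoded).decode("ascii")

print(decoded)
```

Note that the "%3F" survives untouched: that is URL percent-encoding, part of the link itself, and has nothing to do with the quoted-printable layer.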

The opening paragraph of the quoted-printable spec says:

   The Quoted-Printable encoding is intended to represent data that
   largely consists of octets that correspond to printable characters
   in the US-ASCII character set.  It encodes the data in such a way
   that the resulting octets are unlikely to be modified by mail
   transport.  If the data being encoded are mostly US-ASCII text, the
   encoded form of the data remains largely recognizable by humans.

You can read all of this in RFC2045, here:

 https://tools.ietf.org/rfcmarkup?rfc=2045

The quoted-printable encoding is part of that document, and specified here:

 https://tools.ietf.org/rfcmarkup?rfc=2045#section-6.7

The whole thing has a nice table of contents, and makes for a good read actually if you're interested.

Cheers,
Cameron Simpson <c...@cskk.id.au> (formerly c...@zip.com.au)
