Re: quoted-unprintable, was BINARYMIME in Postfix
On 3/21/21 8:13 PM, John Levine wrote:
> It appears that Wietse Venema said:
>> With uniform or compressed payloads, 256 bytes become 261 on average,
>> thus it takes 978.9 bytes on average to expand into 998. Add CR
>> and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
>> just a little over 2%.
>
> That was my estimate too. I was rounding, so sue me.
>
>> It could have been a good idea 25 years ago.
>
> Turns out it came up on the ietf-smtp list in 2003. Here's the mail
> discussion and a strawman I-D that Ned Freed wrote for a deflate-8bit
> encoding that combines deflate compression (like gzip) with minimal
> escapes for 8BITMIME.
>
> https://mailarchive.ietf.org/arch/browse/ietf-822/?gbt=1&index=VmGPBP83tzuzAzdKOwtckalMipE
>
> https://datatracker.ietf.org/doc/draft-freed-mime-newenc/
>
> I agree that these days we routinely pass around umpteen megabyte base64
> messages and nobody cares. If we did care, the reasonable approach would
> be to stick the giant file on a web server and use message/external-body
> to refer to it. That is defined in RFC 2017 which was indeed 25 years ago.

Not an option, sadly. Good MUAs refuse to load external content for
privacy reasons.

Sincerely,
Demi
Re: quoted-unprintable, was BINARYMIME in Postfix
John Levine:
> It appears that Wietse Venema said:
> >With uniform or compressed payloads, 256 bytes become 261 on average,
> >thus it takes 978.9 bytes on average to expand into 998. Add CR
> >and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
> >just a little over 2%.
>
> That was my estimate too. I was rounding, so sue me.

I demonstrated that I am a worse sales person, when I pointed out
that the expansion rate can range from 0.2% (when no quoting is
needed) to over 100% (when every octet needs quoting).

> >It could have been a good idea 25 years ago.
>
> Turns out it came up on the ietf-smtp list in 2003. Here's the
> mail discussion

Note that the quoting scheme came up in the context of compressed
data, where I agree that the 2% expansion claim can be strong. With
uncompressed data, YMMV.

Thanks for the history lesson :-)

Wietse
Re: quoted-unprintable, was BINARYMIME in Postfix
It appears that Wietse Venema said:
>With uniform or compressed payloads, 256 bytes become 261 on average,
>thus it takes 978.9 bytes on average to expand into 998. Add CR
>and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
>just a little over 2%.

That was my estimate too. I was rounding, so sue me.

>It could have been a good idea 25 years ago.

Turns out it came up on the ietf-smtp list in 2003. Here's the mail
discussion and a strawman I-D that Ned Freed wrote for a deflate-8bit
encoding that combines deflate compression (like gzip) with minimal
escapes for 8BITMIME.

https://mailarchive.ietf.org/arch/browse/ietf-822/?gbt=1&index=VmGPBP83tzuzAzdKOwtckalMipE

https://datatracker.ietf.org/doc/draft-freed-mime-newenc/

I agree that these days we routinely pass around umpteen megabyte base64
messages and nobody cares. If we did care, the reasonable approach would
be to stick the giant file on a web server and use message/external-body
to refer to it. That is defined in RFC 2017 which was indeed 25 years ago.

R's,
John
Re: quoted-unprintable, was BINARYMIME in Postfix
On Sun, Mar 21, 2021 at 04:38:56PM -0400, Wietse Venema wrote:
> With non-uniform input, or with input from a smaller alphabet, I
> expect that YMMV (the expansion can be less or more than 2%). For
> example 1000 null bytes expand into 2000 (100%), and when content
> requires no escaping, 998 bytes expand into 1000 (0.2%).

Yes, one of the worst-cases would be UTF-16 or UCS2, where the Latin
characters encode to a form with every other byte a NUL. This gives
you a 50% blowup for ASCII. Even run-length encoding of consecutive
NULs does not help.

The nice thing about base64 is that the expansion is uniform and
predictable.

-- 
    Viktor.
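For anyone who wants to check the UTF-16 figure, here is a minimal Python sketch. It only counts octets and does not implement the encoding. The set of escaped octets (NUL, CR, LF, backslash, 0xFF) is taken from John's proposal; the function name and the sample text are made up for illustration, and line breaks are ignored on both sides of the comparison.

```python
import base64

# Octets that the proposed quoted-unprintable encoding would escape
# (NUL, LF, CR, backslash, 0xFF). Each one costs two octets on the wire.
SPECIAL = frozenset({0x00, 0x0A, 0x0D, 0x5C, 0xFF})


def escaped_length(data: bytes) -> int:
    """Length of data after escaping, ignoring inserted line breaks."""
    return len(data) + sum(1 for octet in data if octet in SPECIAL)


# Hypothetical sample: plain ASCII text stored as UTF-16 (little endian, no BOM),
# so every other octet is a NUL.
raw = "quoted-unprintable".encode("utf-16-le")

qu_ratio = escaped_length(raw) / len(raw)
b64_ratio = len(base64.b64encode(raw)) / len(raw)

print(f"quoted-unprintable: {qu_ratio:.3f}x, base64: {b64_ratio:.3f}x")
```

With this input the escaped form is 1.5 times the raw size, against roughly 1.33 for base64, which matches the 50% blowup described above.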
Re: quoted-unprintable, was BINARYMIME in Postfix
John Levine:
> It appears that Wietse Venema said:
> >> BINARYMIME avoids the 33% size increase of base64. If people cared
> >> about that, since every MTA now supports 8BITMIME it would be easy
> >> to invent a quoted-unprintable content-transfer-encoding which
> >> escaped only the few characters that are special in 8BITMIME (CR
> >> LF NUL and to be on the safe side, 0xff.) That would get you about
> >> 98% of the way to binary with 2% of the work.
> >
> >This would turn binary content into a long line. That works perfectly
> >with qmail and Postfix (except that the Postfix SMTP client will
> >need a hint to avoid folding such lines at the 998 octet limit of
> >RFC 5321).
>
> My quoted-unprintable would turn NUL CR LF \ xFF into \0 \r \n \\ \x.
> The decoder ignores unescaped CR and LF. Just like with base64, insert
> an unescaped CR LF after every 998 octets to make the lines the right
> length. That still would put you within 2% of the size of pure binary.

Sorry, I cannot resist. I'm reviewing conference papers right now,
and I routinely sanity check numerical claims.

With uniform or compressed payloads, 256 bytes become 261 on average,
thus it takes 978.9 bytes on average to expand into 998. Add CR
and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
just a little over 2%.

With non-uniform input, or with input from a smaller alphabet, I
expect that YMMV (the expansion can be less or more than 2%). For
example 1000 null bytes expand into 2000 (100%), and when content
requires no escaping, 998 bytes expand into 1000 (0.2%).

It could have been a good idea 25 years ago.

Wietse
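The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the expected-value calculation, assuming five escaped octet values (NUL, CR, LF, backslash, 0xFF) that each double in size, and one CR LF per 998 escaped octets. The names below are made up for illustration.

```python
LINE_PAYLOAD = 998    # escaped octets per line, before the line break
LINE_TOTAL = 1000     # the same line including CR LF
SPECIAL_VALUES = 5    # NUL, CR, LF, backslash, 0xFF (assumed escape set)


def expansion_ratio(p_special: float) -> float:
    """Expected output/input size ratio when a fraction p_special of the
    input octets needs a two-octet escape."""
    escaped_per_octet = 1.0 + p_special
    return escaped_per_octet * LINE_TOTAL / LINE_PAYLOAD


# Input octets that fit in one 998-octet line for uniform payloads:
# 256 input octets become 261 escaped octets on average.
input_per_line = LINE_PAYLOAD * 256 / (256 + SPECIAL_VALUES)

uniform = expansion_ratio(SPECIAL_VALUES / 256)  # uniform or compressed data
best = expansion_ratio(0.0)                      # nothing needs escaping
worst = expansion_ratio(1.0)                     # every octet needs escaping

print(f"input octets per line: {input_per_line:.1f}")
print(f"uniform {uniform:.4f}  best {best:.4f}  worst {worst:.4f}")
```

Note that the worst case here comes out slightly above 2.0 because it also counts the line breaks, whereas the 1000-into-2000 example above leaves them out.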
Re: quoted-unprintable, was BINARYMIME in Postfix
It appears that Wietse Venema said:
>> BINARYMIME avoids the 33% size increase of base64. If people cared
>> about that, since every MTA now supports 8BITMIME it would be easy
>> to invent a quoted-unprintable content-transfer-encoding which
>> escaped only the few characters that are special in 8BITMIME (CR
>> LF NUL and to be on the safe side, 0xff.) That would get you about
>> 98% of the way to binary with 2% of the work.
>
>This would turn binary content into a long line. That works perfectly
>with qmail and Postfix (except that the Postfix SMTP client will
>need a hint to avoid folding such lines at the 998 octet limit of
>RFC 5321).

My quoted-unprintable would turn NUL CR LF \ xFF into \0 \r \n \\ \x.
The decoder ignores unescaped CR and LF. Just like with base64, insert
an unescaped CR LF after every 998 octets to make the lines the right
length. That still would put you within 2% of the size of pure binary.

R's,
John
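A rough Python sketch of the scheme as described above, for illustration only. No such content-transfer-encoding is registered, and the function names, the error handling, and the choice to break lines at a fixed 998 octets even in the middle of an escape pair are all assumptions, not part of the proposal. A split pair is harmless here because the decoder drops every raw CR and LF before it unescapes.

```python
# Escape table from the description: NUL CR LF \ 0xFF -> \0 \r \n \\ \x
ESCAPES = {0x00: b"\\0", 0x0D: b"\\r", 0x0A: b"\\n", 0x5C: b"\\\\", 0xFF: b"\\x"}
UNESCAPES = {ord("0"): 0x00, ord("r"): 0x0D, ord("n"): 0x0A,
             ord("\\"): 0x5C, ord("x"): 0xFF}
LINE_LEN = 998  # octets per line, not counting the CR LF


def qu_encode(data: bytes) -> bytes:
    """Escape the five special octets, then add CR LF after every 998 octets."""
    escaped = b"".join(ESCAPES.get(octet, bytes([octet])) for octet in data)
    lines = (escaped[i:i + LINE_LEN] for i in range(0, len(escaped), LINE_LEN))
    return b"".join(line + b"\r\n" for line in lines)


def qu_decode(encoded: bytes) -> bytes:
    """Drop unescaped CR and LF, then undo the escapes."""
    stripped = encoded.replace(b"\r", b"").replace(b"\n", b"")
    out = bytearray()
    i = 0
    while i < len(stripped):
        octet = stripped[i]
        if octet == 0x5C:  # backslash starts a two-octet escape
            if i + 1 >= len(stripped) or stripped[i + 1] not in UNESCAPES:
                raise ValueError(f"bad escape at offset {i}")
            out.append(UNESCAPES[stripped[i + 1]])
            i += 2
        else:
            out.append(octet)
            i += 1
    return bytes(out)


sample = bytes(range(256)) * 10
wire = qu_encode(sample)
assert qu_decode(wire) == sample
print(f"{len(sample)} octets in, {len(wire)} octets on the wire")
```

The encoded form never contains a raw NUL or 0xFF, and no line exceeds 1000 octets including its CR LF.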