Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-29 Thread Demi Marie Obenour
On 3/21/21 8:13 PM, John Levine wrote:
> It appears that Wietse Venema  said:
>> With uniform or compressed payloads, 256 bytes become 261 on average,
>> thus it takes 978.9 bytes on average to expand into 998.  Add CR
>> and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
>> just a little over 2%.
> 
> That was my estimate too.  I was rounding, so sue me.
> 
>> It could have been a good idea 25 years ago.
> 
> Turns out it came up on the ietf-smtp list in 2003.  Here's the mail discussion
> and a strawman I-D that Ned Freed wrote for a deflate-8bit encoding that combines
> deflate compression (like gzip) with minimal escapes for 8BITMIME.
> 
> https://mailarchive.ietf.org/arch/browse/ietf-822/?gbt=1=VmGPBP83tzuzAzdKOwtckalMipE
> 
> https://datatracker.ietf.org/doc/draft-freed-mime-newenc/
> 
> I agree that these days we routinely pass around umpteen-megabyte base64 messages and
> nobody cares.  If we did care, the reasonable approach would be to stick the giant file
> on a web server and use message/external-body to refer to it.  That is defined in
> RFC 2017, which was indeed 25 years ago.

Not an option, sadly.  Good MUAs refuse to load external content for
privacy reasons.

Sincerely,

Demi


Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-22 Thread Wietse Venema
John Levine:
> It appears that Wietse Venema  said:
> >With uniform or compressed payloads, 256 bytes become 261 on average,
> >thus it takes 978.9 bytes on average to expand into 998.  Add CR
> >and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
> >just a little over 2%.
> 
> That was my estimate too.  I was rounding, so sue me.

I demonstrated that I am a worse salesperson when I pointed out
that the expansion rate can range from 0.2% (when no quoting is
needed) to over 100% (when every octet needs quoting).

> >It could have been a good idea 25 years ago.
>
> Turns out it came up on the ietf-smtp list in 2003.  Here's the mail discussion

Note that the quoting scheme came up in the context of compressed
data, where I agree that the 2% expansion claim can be strong.
With uncompressed data, YMMV.

Thanks for the history lesson :-)

Wietse


Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-21 Thread John Levine
It appears that Wietse Venema  said:
>With uniform or compressed payloads, 256 bytes become 261 on average,
>thus it takes 978.9 bytes on average to expand into 998.  Add CR
>and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
>just a little over 2%.

That was my estimate too.  I was rounding, so sue me.

>It could have been a good idea 25 years ago.

Turns out it came up on the ietf-smtp list in 2003.  Here's the mail discussion
and a strawman I-D that Ned Freed wrote for a deflate-8bit encoding that combines
deflate compression (like gzip) with minimal escapes for 8BITMIME.

https://mailarchive.ietf.org/arch/browse/ietf-822/?gbt=1=VmGPBP83tzuzAzdKOwtckalMipE

https://datatracker.ietf.org/doc/draft-freed-mime-newenc/

I agree that these days we routinely pass around umpteen-megabyte base64 messages and
nobody cares.  If we did care, the reasonable approach would be to stick the giant file
on a web server and use message/external-body to refer to it.  That is defined in
RFC 2017, which was indeed 25 years ago.

R's,
John


Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-21 Thread Viktor Dukhovni
On Sun, Mar 21, 2021 at 04:38:56PM -0400, Wietse Venema wrote:

> With non-uniform input, or with input from a smaller alphabet, I
> expect that YMMV (the expansion can be less or more than 2%). For
> example 1000 null bytes expand into 2000 (100%), and when content
> requires no escaping, 998 bytes expand into 1000 (0.2%).

Yes, one of the worst cases would be UTF-16 or UCS-2, where Latin
characters encode to a form in which every other byte is NUL.  This gives
you a 50% blowup for ASCII.  Even run-length encoding of consecutive NULs
does not help, since the NULs are never adjacent.  The nice thing about
base64 is that the expansion is uniform and predictable.
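A small sketch reproduces the 50% figure and sets it against base64. It assumes the five escapes from John's proposal (NUL, CR, LF, backslash, 0xFF each become two octets), ignores the CR LF inserted every 998 octets, and ignores base64's own line breaks; `escaped_size` is a hypothetical helper, not part of any library.

```python
import base64

# Octets escaped by the proposed encoding: NUL, LF, CR, backslash, 0xFF.
SPECIAL = frozenset({0x00, 0x0A, 0x0D, 0x5C, 0xFF})

def escaped_size(data: bytes) -> int:
    """Octets after escaping, ignoring the CR LF added every 998 octets."""
    return sum(2 if b in SPECIAL else 1 for b in data)

# Plain ASCII text in UTF-16LE: every other octet is NUL.
utf16 = ("Hello, world. " * 100).encode("utf-16-le")

escaped = escaped_size(utf16)
b64 = len(base64.b64encode(utf16))  # no MIME line breaks counted
print(f"binary {len(utf16)}, escaped {escaped} "
      f"(+{escaped / len(utf16) - 1:.0%}), base64 {b64} "
      f"(+{b64 / len(utf16) - 1:.0%})")
```

This prints `binary 2800, escaped 4200 (+50%), base64 3736 (+33%)`: each NUL doubles, so half the octets cost two, while base64 stays at 4/3 regardless of content.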

-- 
Viktor.


Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-21 Thread Wietse Venema
John Levine:
> It appears that Wietse Venema  said:
> >> BINARYMIME avoids the 33% size increase of base64.  If people cared
> >> about that, since every MTA now supports 8BITMIME it would be easy
> >> to invent a quoted-unprintable content-transfer-encoding which
> >> escaped only the few characters that are special in 8BITMIME (CR
> >> LF NUL and to be on the safe side, 0xff.)  That would get you about
> >> 98% of the way to binary with 2% of the work.
> >
> >This would turn binary content into a long line. That works perfectly
> >with qmail and Postfix (except that the Postfix SMTP client will
> >need a hint to avoid folding such lines at the 998 octet limit of
> >RFC 5321).
> 
> My quoted-unprintable would turn NUL CR LF \ xFF into \0 \r \n \\ \x.
> The decoder ignores unescaped CR and LF. Just like with base64, insert
> an unescaped CR LF after every 998 octets to make the lines the right
> length. That still would put you within 2% of the size of pure binary.

Sorry, I cannot resist. I'm reviewing conference papers right
now, and I routinely sanity check numerical claims.

With uniform or compressed payloads, 256 bytes become 261 on average,
thus it takes 978.9 bytes on average to expand into 998.  Add CR
and LF to the 998, and we have an expansion of 1000/978.9=1.022 or
just a little over 2%.
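The same arithmetic written out, assuming uniformly distributed octets and the five escaped values (NUL, CR, LF, backslash, 0xFF) from the quoted proposal:

```latex
\begin{aligned}
\text{output octets per input octet} &= \frac{251 \cdot 1 + 5 \cdot 2}{256} = \frac{261}{256} \approx 1.0195 \\
\text{input octets per 998-octet line} &= 998 \cdot \frac{256}{261} \approx 978.9 \\
\text{expansion} &= \frac{998 + 2}{978.9} \approx 1.022
\end{aligned}
```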

With non-uniform input, or with input from a smaller alphabet, I
expect that YMMV (the expansion can be less or more than 2%). For
example 1000 null bytes expand into 2000 (100%), and when content
requires no escaping, 998 bytes expand into 1000 (0.2%).
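These cases can be checked with a short sketch. It assumes the escape set from the quoted proposal, lines of at most 998 octets each terminated by CR LF, and (a detail the thread does not specify) that a line is broken early rather than splitting an escape pair. `encoded_size` is a hypothetical helper for illustration.

```python
import os

# Octets escaped by the proposed encoding: NUL, LF, CR, backslash, 0xFF.
SPECIAL = frozenset({0x00, 0x0A, 0x0D, 0x5C, 0xFF})
LINE_MAX = 998  # RFC 5321 line limit, excluding CR LF

def encoded_size(data: bytes) -> int:
    """Total octets on the wire, including a CR LF after every line."""
    total = 0
    line_len = 0
    for b in data:
        width = 2 if b in SPECIAL else 1
        # Break before an escape pair that would not fit on this line.
        if line_len + width > LINE_MAX:
            total += 2  # CR LF
            line_len = 0
        total += width
        line_len += width
    # Terminate the last (partial) line with CR LF.
    return total + 2 if line_len else total

cases = {
    "uniform random": os.urandom(100_000),
    "1000 NULs": bytes(1000),
    "998 plain octets": b"A" * 998,
}
for name, data in cases.items():
    n = encoded_size(data)
    print(f"{name}: {len(data)} -> {n} octets ({n / len(data) - 1:+.1%})")
```

With line breaks counted, the 1000 NULs come to 2006 octets (+100.6%), which is why the worst case is slightly over 100%, while 998 plain octets become exactly 1000 (+0.2%). The random case lands near +2.2%, varying slightly from run to run.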

It could have been a good idea 25 years ago.

Wietse


Re: quoted-unprintable, was BINARYMIME in Postfix

2021-03-21 Thread John Levine
It appears that Wietse Venema  said:
>> BINARYMIME avoids the 33% size increase of base64.  If people cared
>> about that, since every MTA now supports 8BITMIME it would be easy
>> to invent a quoted-unprintable content-transfer-encoding which
>> escaped only the few characters that are special in 8BITMIME (CR
>> LF NUL and to be on the safe side, 0xff.)  That would get you about
>> 98% of the way to binary with 2% of the work.
>
>This would turn binary content into a long line. That works perfectly
>with qmail and Postfix (except that the Postfix SMTP client will
>need a hint to avoid folding such lines at the 998 octet limit of
>RFC 5321).

My quoted-unprintable would turn NUL CR LF \ xFF into \0 \r \n \\ \x.
The decoder ignores unescaped CR and LF. Just like with base64, insert
an unescaped CR LF after every 998 octets to make the lines the right
length. That still would put you within 2% of the size of pure binary.
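A minimal encoder/decoder sketch of the scheme described above. The names `ESCAPES`, `encode` and `decode` are hypothetical, and one detail is an assumption not stated in the message: the encoder breaks a line early instead of splitting a two-octet escape across CR LF, so the decoder never sees a backslash followed by a line break.

```python
LINE_MAX = 998  # RFC 5321 line limit, excluding CR LF

# Hypothetical escape table: special octet -> two-octet escape.
ESCAPES = {
    0x00: b"\\0",
    0x0D: b"\\r",
    0x0A: b"\\n",
    0x5C: b"\\\\",
    0xFF: b"\\x",
}
# Reverse map: second octet of an escape -> original octet.
UNESCAPES = {esc[1]: octet for octet, esc in ESCAPES.items()}

def encode(data: bytes) -> bytes:
    out = bytearray()
    line_len = 0
    for b in data:
        token = ESCAPES.get(b, bytes((b,)))
        # Insert an unescaped CR LF before a token that would overflow the line.
        if line_len + len(token) > LINE_MAX:
            out += b"\r\n"
            line_len = 0
        out += token
        line_len += len(token)
    return bytes(out)

def decode(text: bytes) -> bytes:
    """Assumes well-formed input (no dangling backslash)."""
    out = bytearray()
    i = 0
    while i < len(text):
        b = text[i]
        if b in (0x0D, 0x0A):  # unescaped CR/LF are line breaks: ignore
            i += 1
        elif b == 0x5C:  # backslash starts a two-octet escape
            out.append(UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)

sample = bytes(range(256)) * 10
assert decode(encode(sample)) == sample
```

Because raw NUL, CR, LF and 0xFF only ever appear in the output as the inserted CR LF, the encoded form is valid 8BITMIME line data, and the round trip at the end recovers the original octets.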

R's,
John