Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 20:43:40 -0400, David Levine said:

> Note the "in base64-encoded data".  The characters in the footer are after
> the end of the base64-encoded data, per the use of "end" here:
>
>    Special processing is performed if fewer than 24 bits are available
>    at the end of the data being encoded.  A full encoding quantum is
>    always completed at the end of a body.

From the very next paragraph:

   Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit).  No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present.

In other words, you only have a 2/3 chance of detecting that you've hit the
intentional end of the input by looking for an '='.  If you're ignoring line
breaks and illegal characters, that means you've got a 1 in 3 chance of
continuing to pull in and decode stuff that wasn't intended to be decoded.
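
The 2/3 figure follows from how padding works: "=" is emitted only when the
octet count is not a multiple of three.  A minimal Python sketch of that
arithmetic (illustration only, not nmh code; ends_with_padding is a
hypothetical helper, and the 2/3 assumes payload lengths are spread evenly
modulo 3):

```python
import base64


def ends_with_padding(payload: bytes) -> bool:
    """Return True if the base64 encoding of payload ends in '='."""
    return base64.b64encode(payload).endswith(b"=")


# Padding shows up only when the octet count is not a multiple of three,
# so a decoder watching for '=' has no end marker in one case out of three.
for n in range(1, 7):
    print(n, n % 3, ends_with_padding(b"x" * n))
```

For lengths 3 and 6 the encoder emits no "=", so a decoder that skips
unknown characters has nothing telling it where the encoded data stopped.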

> If we do want to handle this input, I don't think that we should make the
> base64 decoder more lenient.  Instead, I think that mhfixmsg should transform
> it such that the rest of nmh could handle it.

That still doesn't address the *real* question, which is what form that
transform should take.  Deciding whether the detection of an issue should
be in the base64 decoder or elsewhere is bikeshedding compared to trying
to decide what semantics you want.

> I think that it is what I meant.  But I also suspect that I'm missing your 
> point.

The point is that accidentally glomming non-base64 data into the decoder is
almost certainly *not* what you meant by 'see all of the content'. 

Particularly if the data wasn't ASCII - if it was a base64 of a zip file that
got a bunch of basically random bytes appended to the end, you're
going to have a *really* hard time figuring out why it was corrupted.

Feel free to give this a try:

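cd /tmp
echo ab > shortfile
gzip shortfile
# Append footer-like text, drop everything outside the base64 alphabet
# (as a lenient decoder would), then decode the combined stream.
(base64 shortfile.gz; echo "I think that it is what I meant.  But I also suspect that I'm missing your point") \
  | tr -dc 'A-Za-z0-9+/\n' | base64 -d > short2.gz
gunzip short2.gz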

At *best*, "see all the content" means you get handed a bunch of decoded bytes
that were never encoded, so you get random trash splatted out.  At worst, you
get difficult-to-diagnose data corruption.


-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Ken wrote:

> That message is a single text/plain part with a C-T-E of base64; I think
> by definition the whole message body is supposed to be considered base64
> data.

I think the message is invalid.  If we want to salvage what we can from it,
I'm all for it.  But that should be done carefully.

> And how do we know that those characters are AFTER the base64
> data?

For the purpose of interpreting RFC 2045, we do know in this case.

> It sure seems to me from the RFC that it is permissible to ignore
> characters that are not part of the base64 alphabet.

"in base64-encoded data"

> And really, I think
> we are the only MUA that errors out in this way; just on pure usability
> we aren't doing great.

I agree.  I think this is a job for mhfixmsg, not making the parser more
lenient.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>The non-base64 characters in the message body are after the end of the
>base64-encoded data.  They're not "in base64 data".

That message is a single text/plain part with a C-T-E of base64; I think
by definition the whole message body is supposed to be considered base64
data.  And how do we know that those characters are AFTER the base64
data?  Ok, fine, because we're humans and we understand that was added
by bad mailing list software, but how is software supposed to know that
the base64 content has ended in that message?

It sure seems to me from the RFC that it is permissible to ignore
characters that are not part of the base64 alphabet.  And really, I think
we are the only MUA that errors out in this way; just on pure usability
we aren't doing great.  I still think my original suggestion would
provide a reasonable compromise between usability and correctness.

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Valdis wrote:

> > >My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
> > >the blanks and the '!' char and decode the rest.
> > >
> > >   Any characters outside of the base64 alphabet are to be ignored in
> > >   base64-encoded data.

Note the "in base64-encoded data".  The characters in the footer are after the 
end of the base64-encoded data, per the use of "end" here:

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.

> There's this other related gem a few paragraphs earlier:
>
>The encoded output stream must be represented in lines of no more
>than 76 characters each.  All line breaks or other characters not
>found in Table 1 must be ignored by decoding software.  In base64
>data,

The non-base64 characters in the message body are after the end of the 
base64-encoded data.  They're not "in base64 data".

My interpretation of RFC 2045 is that the message is invalid because the C-T-E 
doesn't specify the entire transformation the body was subjected to:

   This single Content-Transfer-Encoding token actually provides two
   pieces of information.  It specifies what sort of encoding
   transformation the body was subjected to and hence what decoding
   operation must be used to restore it to its original form, and it
   specifies what the domain of the result is.

If we do want to handle this input, I don't think that we should make the 
base64 decoder more lenient.  Instead, I think that mhfixmsg should transform 
it such that the rest of nmh could handle it.

# Oh, and what happens if a conforming implementation takes that
# 'you are receiving this message' whoopsie and decodes it?

If it detects an error while decoding it, then it should display a message and 
stop.  That's what happens now.

# This.  Which is probably *not* what you meant by 'see all of the content'.

I think that it is what I meant.  But I also suspect that I'm missing your 
point.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 17:29:16 -0400, Ken Hornstein said:
> >My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
> >the blanks and the '!' char and decode the rest.
> >
> >   Any characters outside of the base64 alphabet are to be ignored in
> >   base64-encoded data.
> >
> >Yeah.  That's pretty definitive. :)
>
> Oh, hm, you know you learn something new every day, and this is my new
> thing for today.  As much as I've read RFC 2045 over the years, I missed
> this!  (This is in §6.8, in case others want to look it up).

There's this other related gem a few paragraphs earlier:

   The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

I'm still trying to get even a lower-case 'must be ignored' to line up with
'a rejection might be appropriate'.. :)
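
One way to read the two sentences together is "skip the character, but tell
somebody".  A rough Python sketch of that reading (an assumption about how
the RFC text could be combined, not what nmh does; decode_with_warnings is
a hypothetical name):

```python
import base64
import string

B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")


def decode_with_warnings(text: str) -> tuple[bytes, list[str]]:
    """Ignore characters outside the alphabet, but report the non-blank ones."""
    kept = []
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch in B64_ALPHABET:
                kept.append(ch)
            elif not ch.isspace():
                # Line breaks and white space are skipped silently; anything
                # else is the "probably a transmission error" case.
                warnings.append(f"line {lineno}: unexpected character {ch!r}")
    # Raises binascii.Error if what is left is not a whole number of quanta.
    return base64.b64decode("".join(kept)), warnings


data, notes = decode_with_warnings("aGVs!\nbG8h\n")
print(data, notes)
```

Whether a warning should turn into a rejection is left to the caller, which
is the part the RFC text leaves open.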



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>I understand that the list is broken (and I've passed this on to the
>administrator). But my perspective is this: I've used nmh for eight
>years, and while I'm a big fan of the concept, and it has noticeably
>improved in usability in that time, it is still difficult. My camel's
>back is not broken yet, but it's beginning to buckle under the strain.
>I wish nmh would at least try to handle situations that other clients
>handle. Being told "that OTHER software is broken" repeatedly might be
>technically correct, but it doesn't help me when every mainstream mail
>client seems to be at least minimally usable in these situations.

Dude, I feel your pain.  I hope I communicated in my other email that
I think we should do better.  From a practical standpoint ... I try to
balance my desire to re-architect the MIME code completely (which
would improve lots of things but take a long time) against fixing things
in the short term (which helps people now, but is just adding a Band-Aid
on top of a huge pile of Band-Aids).

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
>the blanks and the '!' char and decode the rest.
>
>   Any characters outside of the base64 alphabet are to be ignored in
>   base64-encoded data.
>
>Yeah.  That's pretty definitive. :)

Oh, hm, you know you learn something new every day, and this is my new
thing for today.  As much as I've read RFC 2045 over the years, I missed
this!  (This is in §6.8, in case others want to look it up).

So that suggests to me that we are in fact NOT being RFC-conforming with
this behavior, and we should just silently ignore the bad characters.
Does anyone disagree with this interpretation?

--Ken


Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 09:28:53 -0400, David Levine said:

> More generally, what if a sender (improperly) had annotated an already
> encoded message with, say, "DO NOT FORWARD THIS!"?  Bad, yes, but could lead to
> undesired results if that was dropped.

My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
the blanks and the '!' char and decode the rest.

   Any characters outside of the base64 alphabet are to be ignored in
   base64-encoded data.

Yeah.  That's pretty definitive. :)

Which means your sender just prepended the string '?NLSXCLr' to the message
(where the ? is a 'lower case a with circumflex').

Somehow, I doubt that's going to stop it from being forwarded.
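
The decode can be reproduced with a few lines of Python.  This is only a
sketch of a lenient decoder (lenient_b64decode is a hypothetical helper, not
nmh's implementation): it drops the blanks and the '!' and decodes the 16
characters that remain.

```python
import base64
import re


def lenient_b64decode(text: str) -> bytes:
    """Drop every character outside the base64 alphabet, then decode."""
    kept = re.sub(r"[^A-Za-z0-9+/=]", "", text)
    return base64.b64decode(kept)


decoded = lenient_b64decode("DO NOT FORWARD THIS!")
print(decoded.hex(" "))

# Keep only printable ASCII to approximate what a terminal would show.
printable = "".join(chr(b) for b in decoded if 0x20 <= b < 0x7F)
print(printable)
```

The result is 12 octets; the printable ASCII ones spell NLSXCLr, and the
rest are control or non-ASCII bytes whose rendering depends on the charset.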

> In other words, I'd like to see all of the content or an error message.

Oh, and what happens if a conforming implementation takes that
'you are receiving this message' whoopsie and decodes it?

This.  Which is probably *not* what you meant by 'see all of the content'.

Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread lambda
> In other words, I'd like to see all of the content or an error message.

I too like to be informed of errors instead of having the system guess
what I want and possibly be wrong with disastrous results.



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Anthony J. Bentley
Ralph Corderoy writes:
> David wrote:
> > In other words, I'd like to see all of the content or an error
> > message.
>
> This is the juncture where I normally take
> https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1
> out for a trot.

I understand that the list is broken (and I've passed this on to the
administrator). But my perspective is this: I've used nmh for eight
years, and while I'm a big fan of the concept, and it has noticeably
improved in usability in that time, it is still difficult. My camel's
back is not broken yet, but it's beginning to buckle under the strain.
I wish nmh would at least try to handle situations that other clients
handle. Being told "that OTHER software is broken" repeatedly might be
technically correct, but it doesn't help me when every mainstream mail
client seems to be at least minimally usable in these situations.

Valdis wrote:
> that maybe if we're looking at base64, if we encounter a blank line we
> toss the rest of the body part.

For what it's worth, this appears to be how GMail treats it.
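
A sketch of that blank-line rule in Python, for illustration (an assumption
about what "toss the rest" would look like; decode_until_blank_line is a
hypothetical name, and nothing here is taken from GMail or nmh):

```python
import base64


def decode_until_blank_line(body: str) -> bytes:
    """Decode base64 lines up to the first blank line and ignore the rest."""
    lines = []
    for line in body.splitlines():
        if not line.strip():
            break  # everything after the first blank line is tossed
        lines.append(line.strip())
    return base64.b64decode("".join(lines))


body = "aGVs\nbG8h\n\n-- \nnmh-workers\n"
print(decode_until_blank_line(body))
```

The list footer never reaches the decoder here, but neither would anything
else that a sender placed after a blank line.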

-- 
Anthony J. Bentley



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>"mhshow: invalid BASE64 encoding in --"

I'm also on a mailing list that has the same problem.  And yes, it is
totally invalid MIME due to the mailing list software appending a footer
to the bottom of a base64-encoded part, as everyone else has mentioned.
And yes, that mailing list software should be fixed.

I am aware of Postel's maxim, and the arguments against it.  I am
actually in agreement that being strict is probably for the best in
protocol implementation, because it forces everyone to fix their broken
implementations.  But unfortunately that ignores the reality that we are
facing.

I believe all MUAs other than nmh handle this fine.  There isn't enough
momentum to force everyone to fix stuff like this that is broken.  So
for cases like this, I think we have to make some accommodations in the
name of usability.

I'm personally not interested in writing any code at this time to fix
this (when I finally get around to re-architecting the MIME support,
then yes).  What I would PERSONALLY propose to fix this is that for the
specific case of text/plain objects (which are normally interpreted
directly by a human, as opposed to being handled by another program),
base64 decoding continues as long as possible, and if an error is
encountered then the error message is written out as part of the
text/plain content, and further base64 decoding is stopped.  For any OTHER
MIME type (including things like text/html), we generate an error and
abort as we do now.  I think this is reasonable behavior.  I'm open to
being persuaded otherwise.
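
The proposal could look roughly like this in Python.  It is a sketch under
stated assumptions, not nmh code: decode_part and the bracketed message
format are hypothetical, and decoding line by line assumes every encoded
line holds a whole number of 4-character quanta, as common encoders emit.

```python
import base64
import binascii


def decode_part(content_type: str, body: str) -> bytes:
    """Decode a base64 body; for text/plain, report the first error inline."""
    out = bytearray()
    for lineno, line in enumerate(body.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            # validate=True rejects characters outside the base64 alphabet.
            out += base64.b64decode(line, validate=True)
        except binascii.Error as exc:
            if content_type != "text/plain":
                raise  # any other MIME type: error out, as happens now
            out += f"\n[invalid base64 at body line {lineno}: {exc}]\n".encode()
            break  # stop decoding after the first error
    return bytes(out)


body = "aGVs\nbG8h\n\n-- \nnmh-workers\n"
print(decode_part("text/plain", body).decode())
```

A footer line that happens to consist only of alphabet characters, with a
length that is a multiple of four, would still be decoded by this sketch.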

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ralph Corderoy
Hi,

David wrote:
> In other words, I'd like to see all of the content or an error
> message.

This is the juncture where I normally take
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1
out for a trot.

-- 
Cheers, Ralph.



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Valdis wrote:

> that maybe if we're looking at base64, if we encounter a blank line we
> toss the rest of the body part.

That would work in this case, but the mailing list should be fixed.  More 
generally, what if a sender (improperly) had annotated an already encoded 
message with, say, "DO NOT FORWARD THIS!"?  Bad, yes, but could lead to 
undesired results if that was dropped.

In other words, I'd like to see all of the content or an error message.

David
