Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-04-07 Thread David Levine
I just added a -checkbase64 switch to mhfixmsg(1):

  The -checkbase64 switch enables a check of the encoding
  validity in base64-encoded MIME parts.  The check looks for a
  non-encoded text footer appended to a base64-encoded part.
  Per RFC 2045 §6.8, the occurrence of a "=" character signifies
  the end of base-64 encoded content.  If none is found, a
  heuristic is used:  specifically, two consecutive invalid
  base64 characters signify the beginning of a plain text
  footer.  If a text footer is found and this switch is enabled,
  mhfixmsg separates the base64-encoded and non-encoded content
  and places them in a pair of subparts to a newly constructed
  multipart/mixed part.  That multipart/mixed part replaces the
  original base64-encoded part in the MIME structure of the
  message.
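
The heuristic in that excerpt can be sketched roughly as follows. This is a minimal Python illustration, not the actual mhfixmsg C code; the function name split_base64_footer and the choice to treat whitespace as neutral (neither valid nor invalid) are assumptions made for the example.

```python
import base64
import string

# The 64-character alphabet from RFC 2045, Table 1.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")
WHITESPACE = set(" \t\r\n")


def split_base64_footer(body: str) -> tuple[str, str]:
    """Split body into (base64 text, trailing plain-text footer).

    Hypothetical helper: a "=" ends the encoded content; if there is
    none, two consecutive invalid characters mark the start of a footer.
    """
    pad = body.find("=")
    if pad != -1:
        end = pad
        # Take the whole padding run (at most two "=" characters).
        while end < len(body) and end - pad < 2 and body[end] == "=":
            end += 1
        return body[:end], body[end:]

    run_start = None
    for i, ch in enumerate(body):
        if ch in B64_ALPHABET or ch in WHITESPACE:
            run_start = None   # assumption: whitespace breaks an invalid run
        elif run_start is None:
            run_start = i      # first invalid character of a possible run
        else:
            return body[:run_start], body[run_start:]
    return body, ""


encoded, footer = split_base64_footer(
    "aGVsbG8K\n\n-- \nYou received this mail\n")
print(base64.b64decode(encoded))
print(footer)
```

The two returned pieces correspond to the two subparts that would go into the new multipart/mixed part.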

It takes care of the particular issue reported by Anthony.  It is enabled by 
default in mhfixmsg.

I didn't modify the base64 decoder to conform to RFC 2045.

David

-- 
nmh-workers
https://lists.nongnu.org/mailman/listinfo/nmh-workers

Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-20 Thread Laura Creighton
I didn't know about the built-in iconv.  But this behaviour seems a
lot better than 'just drop the problems on the floor silently'.  The
reason I was interested in a louder error reporting was that, for a
certain chunk of time if there was an encoding error, then chances
were that the person who had broken things was me ...

Laura




Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-18 Thread Valdis Klētnieks
On Mon, 18 Mar 2019 21:10:45 -0400, Ken Hornstein said:

> But the email you sent out was marked as having a character set of UTF-8
> with characters encoded as ISO-8859-1.  Dude, I know you could do better
> (also, I am puzzled as to how that happened; I think with nmh you'd have
> to work to make that happen).

Note that exmh is now over 47,000 lines of tk/tcl, of which 'git blame'
says I'm the guilty party for 1,297.  I may be the current maintainer, but
that doesn't mean I wrote all of it. :)

Well, the chapter symbol as it showed up in your mail was the 2-byte
UTF-8 sequence C2 A7, and what ended up in the outbound mail was only a
Q-P encoded =A7, so the question is what ate the C2 and why.
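
To make the byte-level difference concrete, here is a small Python sketch (Python's codecs and quopri module stand in for whatever exmh and nmh actually do). It shows the two encodings of the section sign and why a lone A7 byte is rejected by a strict UTF-8 decoder.

```python
import quopri

section_sign = "\u00a7"  # the section sign character

utf8_bytes = section_sign.encode("utf-8")      # two bytes: C2 A7
latin1_bytes = section_sign.encode("latin-1")  # one byte:  A7

# Quoted-printable form of each byte sequence.
print(quopri.encodestring(utf8_bytes))    # b'=C2=A7'
print(quopri.encodestring(latin1_bytes))  # b'=A7'

# A lone A7 byte is not valid UTF-8, so a strict converter stops here.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("invalid UTF-8:", err.reason)
```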

Testing indicates that when I do the reply, the draft file in Mail/drafts/
has the 2-byte string in it, but by the time it ends up in the Fcc: folder
it has lost the first byte and the second byte is QP-encoded.

It seems to work fine if it ends up with:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

but the failing message had this instead:

Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

which at least gives me a place to start digging in more detail. Current
theory is an off-by-one error.

(Cleaning up the non-ascii support is on the to-do list, but now that
I have a specific failure case to chase, it's time to get some caffeine
and beat this bug into submission..)









Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-18 Thread David Levine
Valdis wrote:

> Deciding whether the detection of an issue should
> be in the base64 decoder or elsewhere is bikeshedding compared to trying
> to decide what semantics you want..

Identifying whether the issue is due to invalid base64 characters or due to an 
improperly constructed MIME part is prerequisite to deciding those semantics.

> The point is that accidentally glomming non-base64 data into the decoder

The nmh base64 decoder does a good job of not getting tripped up by that, and
I'd like to keep it that way.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-18 Thread Ken Hornstein
>With irony, in a discussion about how to handle bad encodings in
>mail, I found that I could not read this message by Valdis Klētnieks
>https://lists.nongnu.org/archive/html/nmh-workers/2019-03/msg00023.html
>
>Something bad seems to have happened to his encoding of a '§'.

Here are my thoughts about this.

First: Valdis, really?  You wrote a BITNET relay ... in Pascal, man.
But the email you sent out was marked as having a character set of UTF-8
with characters encoded as ISO-8859-1.  Dude, I know you could do better
(also, I am puzzled as to how that happened; I think with nmh you'd have
to work to make that happen).

>Now, my .mh_profile says (all one line, but I made it more readable).
>[...]

You may not be aware, but nmh has had built-in iconv support for a while
now; you're free to do whatever you want, but you might find it easier to
use that.  But anyway ...

>When my mail blows up, I just pop into .mh_profile, add the -c flag, and
>then find out what it was that Valdis wanted to tell us.  Then I take it
>out again so I can be informed when iconv next runs into problems.

I hope you would understand that I would say this ... is not a desirable
user interface.  It might be the exact opposite of that, actually.

>But the behaviour I want is one that iconv doesn't give you.  Scream
>informatively about the problem and then continue on as if it never
>happened.  I want the message that I get without the -c flag and then
>the -c behaviour.

Well, with the built-in iconv, we don't do that exactly.  We do pretty
much behave like every other MUA in this regard, though.  When you
use the built-in iconv, if an input character cannot be converted
into the target character set, it gets replaced with a substitution
character (which is normally just a '?').  This has the advantage of
mostly continuing on without problems.  We don't scream loudly, but
honestly I think screaming is lousy behavior (I am not aware of an MUA
that does that, and I can't really think of a reason why that behavior
is desirable).
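
As a rough illustration of that substitution behaviour, using Python's codec machinery as a stand-in for nmh's built-in iconv support (the sample text is made up for the example):

```python
text = "Valdis Kl\u0113tnieks wrote about \u00a76.8"

# errors="replace" substitutes "?" for anything the target charset lacks.
# U+0113 has no Latin-1 equivalent; the section sign does.
converted = text.encode("latin-1", errors="replace")
print(converted.decode("latin-1"))
```

The conversion carries on past the unconvertible character without reporting anything, which is the trade-off under discussion.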

--Ken


Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-18 Thread Laura Creighton
With irony, in a discussion about how to handle bad encodings in mail, I
found that I could not read this message by Valdis Klētnieks
https://lists.nongnu.org/archive/html/nmh-workers/2019-03/msg00023.html

Something bad seems to have happened to his encoding of a '§'.

Now, my .mh_profile says (all one line, but I made it more readable).

mhshow-show-text: iconv -f "$(charset=$(echo %a | 
sed -n -r 's/.*charset="?([-a-zA-Z0-9_]*).*/\1/p');
if [ x$charset = xunicode-1-1-utf-7 ]; then echo utf-7;
else echo ${charset:-iso-8859-1}; fi)" | less

(I used to get lots of utf-7 mail.  Haven't seen any for many years now.)

and iconv is very picky.  It says, quite correctly, 
iconv: illegal input sequence at position 507 and then stops.

So my experience is seeing:
>Date:Sun, 17 Mar 2019 18:12:49 -0400
>To:  Ken Hornstein 
>cc:  nmh-workers@nongnu.org
>From:"Valdis Klētnieks" 
>Subject: Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>
>
>iconv: illegal input sequence at position 507On Sun, 17 Mar 2019 17:29:16 
>-0400, Ken Hornstein said:
>
>> >My reading of RFC2045 says a conforming base64 decoder is allowed to toss 
>> >out
>> >the blanks and the '!' char and decode the rest.
>> >
>> >   Any characters outside of the base64 alphabet are to be ignored in
>> >   base64-encoded data.
>> >
>> >Yeah.  That's pretty definitive. :)
>>
>> Oh, hm, you know you learn something new every day, and this is my new
>> thing for today.  As much as I've read RFC 2045 over the years, I missed
>> this!  (This is in -- 
>nmh-workers
>https://lists.nongnu.org/mailman/listinfo/nmh-workers

I can see how this might be behaviour you might want, but mostly I don't.


You can give iconv the -c flag: "Silently discard characters that cannot be
converted instead of terminating when encountering such characters."

But since it is silent, there is no way for me to know that it encountered a
problem.  

But the behaviour I want is one that iconv doesn't give you.
Scream informatively about the problem and then continue on as if it 
never happened.  I want the message that I get without the -c flag and then
the -c behaviour.
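
A minimal Python sketch of that "complain, then carry on" behaviour. The function name decode_loudly is hypothetical, and Python's decoder stands in for iconv here; this is not something iconv or nmh provides as written.

```python
import sys


def decode_loudly(raw: bytes, charset: str = "utf-8") -> str:
    """Decode raw bytes; on bad input, warn on stderr and drop the bad bytes."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError as err:
        # Report the first offending position, much as iconv does.
        print(f"warning: illegal input sequence at position {err.start}",
              file=sys.stderr)
        # Then behave like iconv -c: discard what cannot be converted.
        return raw.decode(charset, errors="ignore")


print(decode_loudly(b"This is in \xa76.8, in case others want to look it up"))
```

The warning goes to stderr, so the converted text can still be piped to a pager.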

When my mail blows up, I just pop into .mh_profile, add the -c flag, and then
find out what it was that Valdis wanted to tell us.  Then I take it out again
so I can be informed when iconv next runs into problems.

I just thought that while we were discussing what we should do, I would
mention this because it is the middle ground that I want most of the time.

Laura


Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 20:43:40 -0400, David Levine said:

> Note the "in base64-encoded data".  The characters in the footer are after
> the end of the base64-encoded data, per the use of "end" here:
>
>Special processing is performed if fewer than 24 bits are available
>at the end of the data being encoded.  A full encoding quantum is
>always completed at the end of a body.

From the very next paragraph:

   Because it is used only for padding at the end of the data, the
   occurrence of any "=" characters may be taken as evidence that the
   end of the data has been reached (without truncation in transit).  No
   such assurance is possible, however, when the number of octets
   transmitted was a multiple of three and no "=" characters are
   present.

In other words, you only have a 2 in 3 chance of detecting that you've hit the
intentional end of the input by looking for an '='.  If you're ignoring line
breaks and illegal characters, that means you've got a 1 in 3 chance of
pulling in and decoding stuff that was never intended to be decoded.
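
The 2-in-3 figure is easy to check with a short Python loop: padding only appears when the input length is not a multiple of three.

```python
import base64

for n in range(1, 7):
    encoded = base64.b64encode(b"x" * n).decode("ascii")
    marker = "padded" if encoded.endswith("=") else "no '=' marker"
    print(n, encoded, marker)
```

Inputs of 3 and 6 octets produce no '=' at all, so nothing in the encoded text marks where it ends.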

> If we do want to handle this input, I don't think that we should make the
> base64 decoder more lenient.  Instead, I think that mhfixmsg should transform
> it such that the rest of nmh could handle it.

That still doesn't address the *real* question, which is what form that
transform should take.  Deciding whether the detection of an issue should
be in the base64 decoder or elsewhere is bikeshedding compared to trying
to decide what semantics you want..

> I think that it is what I meant.  But I also suspect that I'm missing your 
> point.

The point is that accidentally glomming non-base64 data into the decoder is
almost certainly *not* what you meant by 'see all of the content'. 

Particularly if the data wasn't ASCII - if it was a base64 of a zip file that
got a bunch of basically random bytes appended to the end, you're
going to have a *really* hard time figuring out why it was corrupted.

Feel free to give this a try:

cd /tmp
echo ab > shortfile
gzip shortfile
(base64 shortfile.gz; echo "I think that it is what I meant.  But I also suspect that I'm missing your point") \
  | tr -dc 'A-Za-z0-9+/\n' | base64 -d > short2.gz
gunzip short2.gz

At *best*, "see all the content" means you get handed a bunch of decoded bytes
that were never encoded, so you get random trash splatted out.  At worst, you
get difficult-to-diagnose data corruption.
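
The same effect can be shown in a few lines of Python. This is a sketch only: the payload and the footer text are made up, and Python's base64 module stands in for a decoder that ignores non-alphabet characters.

```python
import base64
import binascii

payload = base64.b64encode(b"abc")   # b'YWJj': 3 octets in, so no "=" marker
body = payload + b"\n-- \nlist\n"    # footer appended by hypothetical list software

# Lenient decoding discards "-", space and newline, then decodes "list" as data.
lenient = base64.b64decode(body)
print(lenient)

# Strict decoding refuses the same input.
try:
    base64.b64decode(body, validate=True)
except binascii.Error as err:
    print("strict decoder:", err)
```

The lenient result starts with the real payload and then continues with bytes that were never encoded by the sender.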




Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Ken wrote:

> That message is a single text/plain part with a C-T-E of base64; I think
> by definition the whole message body is supposed to be considered base64
> data.

I think the message is invalid.  If we want to salvage what we can from it,
I'm all for it.  But that should be done carefully.

> And how do we know that those characters are AFTER the base64
> data?

For the purpose of interpreting RFC 2045, we do know in this case.

> It sure seems to me from the RFC that it is permissible to ignore
> characters that are not part of the base64 alphabet.

"in base64-encoded data"

> And really, I think
> we are the only MUA that errors out in this way; just on pure usability
> we aren't doing great.

I agree.  I think this is a job for mhfixmsg, not making the parser more
lenient.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>The non-base64 characters in the message body are after the end of the
>base64-encoded data.  They're not "in base64 data".

That message is a single text/plain part with a C-T-E of base64; I think
by definition the whole message body is supposed to be considered base64
data.  And how do we know that those characters are AFTER the base64
data?  Ok, fine, because we're humans and we understand that was added
by bad mailing list software, but how is software supposed to know that
the base64 content has ended in that message?

It sure seems to me from the RFC that it is permissible to ignore
characters that are not part of the base64 alphabet.  And really, I think
we are the only MUA that errors out in this way; just on pure usability
we aren't doing great.  I still think my original suggestion would
provide a reasonable compromise between usability and correctness.

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Valdis wrote:

> > >My reading of RFC2045 says a conforming base64 decoder is allowed to toss
> > >out the blanks and the '!' char and decode the rest.
> > >
> > >   Any characters outside of the base64 alphabet are to be ignored in
> > >   base64-encoded data.

Note the "in base64-encoded data".  The characters in the footer are after the 
end of the base64-encoded data, per the use of "end" here:

   Special processing is performed if fewer than 24 bits are available
   at the end of the data being encoded.  A full encoding quantum is
   always completed at the end of a body.

> There's this other related gem a few paragraphs earlier:
>
>The encoded output stream must be represented in lines of no more
>than 76 characters each.  All line breaks or other characters not
>found in Table 1 must be ignored by decoding software.  In base64
>data,

The non-base64 characters in the message body are after the end of the 
base64-encoded data.  They're not "in base64 data".

My interpretation of RFC 2045 is that the message is invalid because the C-T-E 
doesn't specify the entire transformation the body was subjected to:

   This single Content-Transfer-Encoding token actually provides two
   pieces of information.  It specifies what sort of encoding
   transformation the body was subjected to and hence what decoding
   operation must be used to restore it to its original form, and it
   specifies what the domain of the result is.

If we do want to handle this input, I don't think that we should make the 
base64 decoder more lenient.  Instead, I think that mhfixmsg should transform 
it such that the rest of nmh could handle it.

# Oh, and what happens if a conforming implementation takes that
# 'you are receiving this message' whoopsie and decodes it?

If it detects an error while decoding it, then it should display a message and 
stop.  That's what happens now.

# This.  Which is probably *not* what you meant by 'see all of the content'.

I think that it is what I meant.  But I also suspect that I'm missing your 
point.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 17:29:16 -0400, Ken Hornstein said:
> >My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
> >the blanks and the '!' char and decode the rest.
> >
> >   Any characters outside of the base64 alphabet are to be ignored in
> >   base64-encoded data.
> >
> >Yeah.  That's pretty definitive. :)
>
> Oh, hm, you know you learn something new every day, and this is my new
> thing for today.  As much as I've read RFC 2045 over the years, I missed
> this!  (This is in §6.8, in case others want to look it up).

There's this other related gem a few paragraphs earlier:

   The encoded output stream must be represented in lines of no more
   than 76 characters each.  All line breaks or other characters not
   found in Table 1 must be ignored by decoding software.  In base64
   data, characters other than those in Table 1, line breaks, and other
   white space probably indicate a transmission error, about which a
   warning message or even a message rejection might be appropriate
   under some circumstances.

I'm still trying to get even a lower-case 'must be ignored' to line up with
'a rejection might be appropriate'.. :)



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>I understand that the list is broken (and I've passed this on to the
>administrator). But my perspective is this: I've used nmh for eight
>years, and while I'm a big fan of the concept, and it has noticeably
>improved in usability in that time, it is still difficult. My camel's
>back is not broken yet, but it's beginning to buckle under the strain.
>I wish nmh would at least try to handle situations that other clients
>handle. Being told "that OTHER software is broken" repeatedly might be
>technically correct, but it doesn't help me when every mainstream mail
>client seems to be at least minimally usable in these situations.

Dude, I feel your pain.  I hope I communicated in my other email that
I think we should do better.  From a practical standpoint ... I try to
balance my desire to re-architecture the MIME code completely (which
would improve lots of things but take a long time) to fixing things in
the short term (which helps people now, but is just adding a Band-Aid
on top of a huge pile of Band-Aids).

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
>the blanks and the '!' char and decode the rest.
>
>   Any characters outside of the base64 alphabet are to be ignored in
>   base64-encoded data.
>
>Yeah.  That's pretty definitive. :)

Oh, hm, you know you learn something new every day, and this is my new
thing for today.  As much as I've read RFC 2045 over the years, I missed
this!  (This is in §6.8, in case others want to look it up).

So that suggests to me that we are in fact NOT being RFC-conforming with
this behavior, and we should just silently ignore the bad characters.
Does anyone disagree with this interpretation?

--Ken


Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Valdis Klētnieks
On Sun, 17 Mar 2019 09:28:53 -0400, David Levine said:

> More generally, what if a sender (improperly) had annotated an already
> encoded message with, say, "DO NOT FORWARD THIS!"?  Bad, yes, but could lead
> to undesired results if that was dropped.

My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
the blanks and the '!' char and decode the rest.

   Any characters outside of the base64 alphabet are to be ignored in
   base64-encoded data.

Yeah.  That's pretty definitive. :)

Which means your sender just prepended the string '?NLSXCLr' to the message
(where the ? is a 'lower case a with circumflex').

Somehow, I doubt that's going to stop it from being forwarded.

> In other words, I'd like to see all of the content or an error message.

Oh, and what happens if a conforming implementation takes that
'you are receiving this message' whoopsie and decodes it?

This.  Which is probably *not* what you meant by 'see all of the content'.

Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread lambda
> In other words, I'd like to see all of the content or an error message.

I too like to be informed of errors instead of having the system guess
what I want and possibly be wrong with disastrous results.



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Anthony J. Bentley
Ralph Corderoy writes:
> David wrote:
> > In other words, I'd like to see all of the content or an error
> > message.
>
> This is the juncture where I normally take
> https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1
> out for a trot.

I understand that the list is broken (and I've passed this on to the
administrator). But my perspective is this: I've used nmh for eight
years, and while I'm a big fan of the concept, and it has noticeably
improved in usability in that time, it is still difficult. My camel's
back is not broken yet, but it's beginning to buckle under the strain.
I wish nmh would at least try to handle situations that other clients
handle. Being told "that OTHER software is broken" repeatedly might be
technically correct, but it doesn't help me when every mainstream mail
client seems to be at least minimally usable in these situations.

Valdis wrote:
> that maybe if we're looking at base64, if we encounter a blank line we
> toss the rest of the body part.

For what it's worth, this appears to be how GMail treats it.

-- 
Anthony J. Bentley



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ken Hornstein
>"mhshow: invalid BASE64 encoding in --"

I'm also on a mailing list that has the same problem.  And yes, it is
totally invalid MIME due to the mailing list software appending a header
to the bottom of a base64-encoded part, as everyone else has mentioned.
And yes, that mailing list software should be fixed.

I am aware of Postel's maxim, and the arguments against it.  I am
actually in agreement that being strict is probably for the best in
protocol implementation, because it forces everyone to fix their broken
implementations.  But unfortunately that ignores the reality that we are
facing.

I believe all MUAs other than nmh handle this fine.  There isn't enough
momentum to force everyone to fix stuff like this that is broken.  So
for cases like this, I think we have to make some accommodations in the
name of usability.

I'm personally not interested in writing any code at this time to fix
this (when I finally get around to re-architecturing the MIME support,
then yes).  What I would PERSONALLY propose to fix this is that for the
specific case of text/plain objects (which are normally interpreted
directly by a human, as opposed to being handled by another program),
base64 decoding continues as long as possible, and if an error is
encountered then the error message is written out as part of the
text/plain content, and further base64 decoding is stopped.  For any OTHER
MIME type (including things like text/html), we generate an error and
abort as we do now.  I think this is reasonable behavior.  I'm open to
being persuaded otherwise.
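
A rough Python sketch of that proposal, for illustration only. The function name decode_base64_part is hypothetical, UTF-8 content is assumed, and decoding line by line assumes each encoded line holds a whole number of 4-character groups (true for the usual 76-column wrapping).

```python
import base64
import binascii


def decode_base64_part(content_type: str, body: str) -> str:
    """Decode a base64 body part (hypothetical helper, UTF-8 assumed).

    For text/plain, decode line by line and, at the first bad line, append
    an error note and stop.  For any other type, raise as before.
    """
    lenient = content_type.lower() == "text/plain"
    decoded = bytearray()
    for lineno, line in enumerate(body.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no data
        try:
            decoded += base64.b64decode(line, validate=True)
        except binascii.Error:
            if not lenient:
                raise
            note = f"\n[invalid base64 at line {lineno}; decoding stopped]\n"
            return decoded.decode("utf-8", errors="replace") + note
    return decoded.decode("utf-8", errors="replace")


sample = "aGVsbG8K\n\n-- \nYou received this mail\n"
print(decode_base64_part("text/plain", sample))
```

A footer line that happens to consist only of base64 characters would still be decoded as data, so this does not remove the ambiguity raised elsewhere in the thread.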

--Ken



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread Ralph Corderoy
Hi,

David wrote:
> In other words, I'd like to see all of the content or an error
> message.

This is the juncture where I normally take
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1
out for a trot.

-- 
Cheers, Ralph.



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-17 Thread David Levine
Valdis wrote:

> that maybe if we're looking at base64, if we encounter a blank line we toss
> the rest of the body part.

That would work in this case, but the mailing list should be fixed.  More 
generally, what if a sender (improperly) had annotated an already encoded 
message with, say, "DO NOT FORWARD THIS!"?  Bad, yes, but could lead to 
undesired results if that was dropped.

In other words, I'd like to see all of the content or an error message.

David



Re: [nmh-workers] mhshow: invalid BASE64 encoding in --

2019-03-16 Thread Valdis Klētnieks
On Sat, 16 Mar 2019 22:14:41 -0600, "Anthony J. Bentley" said:

> "mhshow: invalid BASE64 encoding in --"
>
> Since it's a public mailing list, one of these messages is enclosed below.

> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: base64

Yeah that's a reasonable thing to do if you're sending UTF-8.

> Message-ID: <6c5e3c76d90a7...@poolp.org>
>
> Ck9uIE1hciAxNiwgMjAxOSA1OjA5IFBNLCBUaG9tYXMgQm9obCA8b3BlbnNtdHBkLW1pc2MtNjQ2

So here we start the UTF-8

> bnN1YnNjcmliZSwgc2VuZCBhIG1haWwgdG86IG1pc2MrdW5zdWJzY3JpYmVAb3BlbnNtdHBkLm9y
> Zwo+Cg==

And here we finish it..

>
> -- 
> You received this mail because you are subscribed to m...@opensmtpd.org
> To unsubscribe, send a mail to: misc+unsubscr...@opensmtpd.org

And this is a crock, because it's still part of the (only) bodypart, but is
obviously not base64.

What *should* happen if mailing list software feels the need to stick a footer
on is to repackage the mail as a multipart/related, with the first body part
being the UTF-8 body of the mail, and the second the mailing list's footer.
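
For illustration, here is roughly what such a repackaged message looks like when built with Python's standard email package. The body and footer strings are placeholders, and the sketch uses multipart/mixed as the container, which is an assumption of this example rather than what any particular list software emits.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

body_text = "Body of the original message.\n"      # placeholder text
footer_text = "-- \nYou received this mail because you are subscribed\n"

msg = MIMEMultipart("mixed")
# First subpart: the original UTF-8 body (base64-encoded by default).
msg.attach(MIMEText(body_text, "plain", "utf-8"))
# Second subpart: the list footer as its own plain-text part.
msg.attach(MIMEText(footer_text, "plain", "us-ascii"))

print(msg.as_string())
```

Each subpart carries its own Content-Transfer-Encoding header, so the footer no longer sits inside the base64 data.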

Somebody should complain to the administrator of that list - mailman has been
able to do this rewrite correctly for aeons now.

Having said that, there's certainly plenty of room in "be lenient in what you
accept" that maybe if we're looking at base64, if we encounter a blank line we
toss the rest of the body part.

(I have an equivalent on my to-do list for exmh for this same issue.)


