Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
I just added a -checkbase64 switch to mhfixmsg(1): The -checkbase64 switch enables a check of the encoding validity in base64-encoded MIME parts. The check looks for a non-encoded text footer appended to a base64-encoded part. Per RFC 2045 §6.8, the occurrence of a "=" character signifies the end of base-64 encoded content. If none is found, a heuristic is used: specifically, two consecutive invalid base64 characters signify the beginning of a plain text footer. If a text footer is found and this switch is enabled, mhfixmsg separates the base64-encoded and non-encoded content and places them in a pair of subparts to a newly constructed multipart/mixed part. That multipart/mixed part replaces the original base64-encoded part in the MIME structure of the message. It takes care of the particular issue reported by Anthony. It is enabled by default in mhfixmsg. I didn't modify the base64 decoder to conform to RFC 2045. David -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
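The detection rule described above can be sketched in a few lines. This is a minimal Python illustration, not the mhfixmsg C implementation: the function name `split_base64_and_footer` is hypothetical, and treating line breaks as neutral (neither valid nor invalid) is an assumption made for the sketch.

```python
import base64
import string

# The 64 data characters of the base64 alphabet (RFC 2045, Table 1).
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def split_base64_and_footer(body):
    """Return (base64_text, footer_text); footer_text is '' if none is found.

    Hypothetical helper illustrating the heuristic from the message above.
    """
    pad = body.find("=")
    if pad != -1:
        # A "=" marks the end of the encoded data; skip the whole padding run.
        end = pad
        while end < len(body) and body[end] == "=":
            end += 1
        return body[:end], body[end:].strip()
    # No padding: two consecutive invalid characters start the footer.
    # Assumption: CR and LF are ignored rather than counted as invalid.
    for i in range(len(body) - 1):
        a, b = body[i], body[i + 1]
        if (a not in B64_ALPHABET and b not in B64_ALPHABET
                and a not in "\r\n" and b not in "\r\n"):
            return body[:i], body[i:].strip()
    return body, ""


encoded, footer = split_base64_and_footer(
    "Zwo+Cg==\n\n-- \nYou received this mail\n")
print(base64.b64decode(encoded), repr(footer))
```

A caller would then place the two pieces in separate subparts of a multipart/mixed part, as the message describes.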
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
I didn't know about the built-in iconv. But this behaviour seems a lot better than 'just drop the problems on the floor silently'. The reason I was interested in a louder error reporting was that, for a certain chunk of time if there was an encoding error, then chances were that the person who had broken things was me ... Laura -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
On Mon, 18 Mar 2019 21:10:45 -0400, Ken Hornstein said:

> But the email you sent out was marked as having a character set of UTF-8
> with characters encoded as ISO-8859-1. Dude, I know you could do better
> (also, I am puzzled as to how that happened; I think with nmh you'd have
> to work to make that happen).

Note that exmh is now over 47,000 lines of tk/tcl, of which 'git blame' says I'm the guilty party for 1,297. I may be the current maintainer, but that doesn't mean I wrote all of it. :)

Well, the section sign as it showed up in your mail was the 2-byte UTF-8 sequence 'C2A7', and what ended up in the outbound mail was only a Q-P encoded =A7, so the question is what ate the C2 and why.

Testing indicates that when I do the reply, the file in Mail/drafts/ has the 2-byte sequence in it, but by the time it ends up in the Fcc: folder it has lost the first byte and the second byte is QP-encoded.

It seems to work fine if it ends up with:

    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit

but the failing message had this instead:

    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: quoted-printable

which at least gives me a place to start digging in more detail. Current theory is an off-by-one.

(Cleaning up the non-ascii support is on the to-do list, but now that I have a specific failure case to chase, it's time to get some caffeine and beat this bug into submission.)
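The byte-level mismatch is easy to reproduce outside exmh. The following Python sketch (an illustration only, unrelated to the exmh tk/tcl code) shows why a lone =A7 is valid in ISO-8859-1 but not in a part labelled charset=utf-8:

```python
import quopri

section_sign = "\u00a7"  # the '§' character

# UTF-8 needs two bytes for it, ISO-8859-1 only one.
print(section_sign.encode("utf-8"))       # two bytes: C2 A7
print(section_sign.encode("iso-8859-1"))  # one byte: A7

# Correct quoted-printable body for a charset=utf-8 part.
good = quopri.decodestring(b"=C2=A7")
print(good.decode("utf-8"))

# What the failing message contained: the C2 byte is missing.
bad = quopri.decodestring(b"=A7")
try:
    bad.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)

# The same byte is fine if the part is (mis)read as ISO-8859-1.
print(bad.decode("iso-8859-1"))
```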
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Valdis wrote:

> Deciding whether the detection of an issue should
> be in the base64 decoder or elsewhere is bikeshedding compared to trying
> to decide what semantics you want..

Identifying whether the issue is due to invalid base64 characters or due to an improperly constructed MIME part is a prerequisite to deciding those semantics.

> The point is that accidentally glomming non-base64 data into the decoder

The nmh base64 decoder does a good job of not getting tripped by that, and I'd like to keep it that way.

David
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>With irony, in a discussion about how to handle bad encodings in
>mail, I found that I could not read this message by Valdis Klētnieks
>https://lists.nongnu.org/archive/html/nmh-workers/2019-03/msg00023.html
>
>Something bad seems to have happened to his encoding of a '§'.

Here are my thoughts about this.

First ... Valdis, really? You wrote a BITNET relay ... in Pascal, man. But the email you sent out was marked as having a character set of UTF-8 with characters encoded as ISO-8859-1. Dude, I know you could do better (also, I am puzzled as to how that happened; I think with nmh you'd have to work to make that happen).

>Now, my .mh_profile says (all one line, but I made it more readable).
>[...]

You may not be aware, but nmh has had built-in iconv support for a while now; you're free to do whatever you want, but you might find it easier to use that. But anyway ...

>When my mail blows up, I just pop into .mh_profile, add the -c flag, and
>then find out what it was that Valdis wanted to tell us. Then I take it
>out again so I can be informed when iconv next runs into problems.

I hope you would understand that I would say this ... is not a desirable user interface. It might be the exact opposite of that, actually.

>But the behaviour I want is one that iconv doesn't give you. Scream
>informatively about the problem and then continue on as if it never
>happened. I want the message that I get without the -c flag and then
>the -c behaviour.

Well, with the built-in iconv, we don't do that exactly. We do pretty much behave like every other MUA in this regard, though. When you use the built-in iconv, if an input character cannot be converted into the target character set, it gets replaced with a substitution character (which is normally just a '?'). This has the advantage of mostly continuing on without problems.

We don't scream loudly, but honestly I think that would be lousy behavior (I am not aware of a MUA that does that, and I can't really think of a reason why that behavior is desirable).

--Ken
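For readers unfamiliar with substitution on conversion failure, here is a rough Python analogue. It is only an illustration of the general behaviour Ken describes, not nmh's built-in iconv code; the helper name `to_charset` is hypothetical.

```python
def to_charset(text, charset):
    """Convert text to charset, substituting '?' for unconvertible characters.

    Hypothetical helper; Python's errors="replace" happens to use '?' too.
    """
    return text.encode(charset, errors="replace")


name = "Valdis Kl\u0113tnieks"  # the 'ē' has no ISO-8859-1 equivalent

# Lenient conversion: the display continues, with one character replaced.
print(to_charset(name, "iso-8859-1"))

# Strict conversion: stops with an error, much like plain iconv without -c.
try:
    name.encode("iso-8859-1")
except UnicodeEncodeError as err:
    print("strict conversion failed:", err)
```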
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
With irony, in a discussion about how to handle bad encodings in mail, I found that I could not read this message by Valdis Klētnieks
https://lists.nongnu.org/archive/html/nmh-workers/2019-03/msg00023.html

Something bad seems to have happened to his encoding of a '§'.

Now, my .mh_profile says (all one line, but I made it more readable).

mhshow-show-text: iconv -f "$(charset=$(echo %a |
    sed -n -r 's/.*charset="?([-a-zA-Z0-9_]*).*/\1/p');
    if [ x$charset = xunicode-1-1-utf-7 ]; then echo utf-7;
    else echo ${charset:-iso-8859-1}; fi)" | less

(I used to get lots of utf-7 mail. Haven't seen any for many years now.)

And iconv is very picky. It says, quite correctly,

    iconv: illegal input sequence at position 507

and then stops. So my experience is seeing:

>Date:Sun, 17 Mar 2019 18:12:49 -0400
>To: Ken Hornstein
>cc: nmh-workers@nongnu.org
>From:"Valdis Klētnieks"
>Subject: Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>
>
>iconv: illegal input sequence at position 507On Sun, 17 Mar 2019 17:29:16
>-0400, Ken Hornstein said:
>
>> >My reading of RFC2045 says a conforming base64 decoder is allowed to toss
>> >out
>> >the blanks and the '!' char and decode the rest.
>> >
>> > Any characters outside of the base64 alphabet are to be ignored in
>> > base64-encoded data.
>> >
>> >Yeah. That's pretty definitive. :)
>>
>> Oh, hm, you know you learn something new every day, and this is my new
>> thing for today. As much as I've read RFC 2045 over the years, I missed
>> this! (This is in --
>nmh-workers
>https://lists.nongnu.org/mailman/listinfo/nmh-workers

I can see how this might be behaviour you might want, but mostly I don't. You can give iconv the -c flag:

    "Silently discard characters that cannot be converted instead of
    terminating when encountering such characters."

But since it is silent, there is no way for me to know that it encountered a problem.

The behaviour I want is one that iconv doesn't give you: scream informatively about the problem and then continue on as if it never happened. I want the message that I get without the -c flag and then the -c behaviour.

When my mail blows up, I just pop into .mh_profile, add the -c flag, and then find out what it was that Valdis wanted to tell us. Then I take it out again so I can be informed when iconv next runs into problems.

I just thought that while we were discussing what we should do, I would mention this because it is the middle ground that I want most of the time.

Laura
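The "complain, then carry on" middle ground asked for above could look roughly like this. It is a Python sketch of the idea only; iconv itself has no such mode, and `decode_loudly` is a hypothetical name.

```python
import sys


def decode_loudly(raw, charset):
    """Decode raw bytes; on failure, report where, then drop the bad bytes.

    Hypothetical helper: a strict pass gives the error report (like iconv
    without -c), and a second pass discards bad input (like iconv -c).
    """
    try:
        return raw.decode(charset)
    except UnicodeDecodeError as err:
        print(f"warning: illegal input sequence at position {err.start} "
              f"(charset {charset})", file=sys.stderr)
        return raw.decode(charset, errors="ignore")


# A body labelled UTF-8 that actually carries an ISO-8859-1 section sign.
print(decode_loudly(b"This is in \xa76.8", "utf-8"))
```

The warning goes to standard error so that it does not get mixed into the text piped to the pager.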
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
On Sun, 17 Mar 2019 20:43:40 -0400, David Levine said:

> Note the "in base64-encoded data". The characters in the footer are after
> the end of the base64-encoded data, per the use of "end" here:
>
> >Special processing is performed if fewer than 24 bits are available
> >at the end of the data being encoded. A full encoding quantum is
> >always completed at the end of a body.

From the very next paragraph:

    Because it is used only for padding at the end of the data, the
    occurrence of any "=" characters may be taken as evidence that the
    end of the data has been reached (without truncation in transit). No
    such assurance is possible, however, when the number of octets
    transmitted was a multiple of three and no "=" characters are present.

In other words, you only have a 2/3 chance of detecting that you've hit the intentional end of the input by looking for an '='. If you're ignoring line breaks and illegal characters, that means you've got a 1 in 3 chance of continuing to pull in and decode stuff that wasn't intended to be decoded.

> If we do want to handle this input, I don't think that we should make the
> base64 decoder more lenient. Instead, I think that mhfixmsg should transform
> it such that the rest of nmh could handle it.

That still doesn't address the *real* question, which is what form that transform should take. Deciding whether the detection of an issue should be in the base64 decoder or elsewhere is bikeshedding compared to trying to decide what semantics you want..

> I think that it is what I meant. But I also suspect that I'm missing your
> point.

The point is that accidentally glomming non-base64 data into the decoder is almost certainly *not* what you meant by 'see all of the content'. Particularly if the data wasn't ASCII - if it was a base64 of a zip file that got a bunch of basically random bytes appended to the end, you're going to have a *really* hard time figuring out why it was corrupted.

Feel free to give this a try:

    cd /tmp
    echo ab > shortfile
    gzip shortfile
    # keep only base64 characters, '=' and newlines, as a lenient decoder would
    (base64 shortfile.gz; echo "I think that it is what I meant. But I also suspect that I'm missing your point") | tr -dc 'A-Za-z0-9+/=\n' | base64 -d > short2.gz
    gunzip short2.gz

At *best*, "see all the content" means you get handed a bunch of decoded bytes that were never encoded, so you get random trash splatted out. At worst, you get difficult-to-diagnose data corruption.
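The 2/3 figure and the trailing-garbage effect can both be checked with a short Python sketch. The footer text "list info" is made up for the example, and Python's lenient `base64.b64decode` stands in for "a decoder that ignores characters outside the alphabet"; it is not the nmh decoder.

```python
import base64

# Padding appears exactly when the input length is not a multiple of three,
# i.e. for two out of every three possible lengths.
for n in range(1, 7):
    encoded = base64.b64encode(bytes(n))
    print(n, encoded, "padded" if encoded.endswith(b"=") else "no padding")

# Three input bytes: no "=" marks the end of the data ...
clean = base64.b64encode(b"abc").decode("ascii")

# ... so a lenient decoder keeps going into an appended footer and decodes
# whatever base64-alphabet characters it finds there ("listinfo" here).
polluted = clean + "\n-- \nlist info\n"
result = base64.b64decode(polluted, validate=False)
print(result[:3], "followed by", len(result) - 3, "bytes that were never encoded")
```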
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Ken wrote: > That message is a single text/plain part with a C-T-E of base64; I think > by definition the whole message body is supposed to be considered base64 > data. I think the message is invalid. If we want to salvage what we can from it, I'm all for it. But that should be done carefully. > And how do we know that those characters are AFTER the base64 > data? For the purpose of interpreting RFC 2045, we do know in this case. > It sure seems to me from the RFC that it is permissible to ignore > characters that are not part of the base64 alphabet. "in base64-encoded data" > And really, I think > we are the only MUA that errors out in this way; just on pure usability > we aren't doing great. I agree. I think this is a job for mhfixmsg, not making the parser more lenient. David -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>The non-base64 characters in the message body are after the end of the >base64-encoded data. They're not "in base64 data". That message is a single text/plain part with a C-T-E of base64; I think by definition the whole message body is supposed to be considered base64 data. And how do we know that those characters are AFTER the base64 data? Ok, fine, because we're humans and we understand that was added by bad mailing list software, but how is software supposed to know that the base64 content has ended in that message? It sure seems to me from the RFC that it is permissible to ignore characters that are not part of the base64 alphabet. And really, I think we are the only MUA that errors out in this way; just on pure usability we aren't doing great. I still think my original suggestion would provide a reasonable compromise between usability and correctness. --Ken -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Valdis wrote:

> > >My reading of RFC2045 says a conforming base64 decoder is allowed to toss
> > >out the blanks and the '!' char and decode the rest.
> > >
> > > Any characters outside of the base64 alphabet are to be ignored in
> > > base64-encoded data.

Note the "in base64-encoded data". The characters in the footer are after the end of the base64-encoded data, per the use of "end" here:

    Special processing is performed if fewer than 24 bits are available
    at the end of the data being encoded. A full encoding quantum is
    always completed at the end of a body.

> There's this other related gem a few paragraphs earlier:
>
> >The encoded output stream must be represented in lines of no more
> >than 76 characters each. All line breaks or other characters not
> >found in Table 1 must be ignored by decoding software. In base64
> >data,

The non-base64 characters in the message body are after the end of the base64-encoded data. They're not "in base64 data".

My interpretation of RFC 2045 is that the message is invalid because the C-T-E doesn't specify the entire transformation the body was subjected to:

    This single Content-Transfer-Encoding token actually provides two
    pieces of information. It specifies what sort of encoding
    transformation the body was subjected to and hence what decoding
    operation must be used to restore it to its original form, and it
    specifies what the domain of the result is.

If we do want to handle this input, I don't think that we should make the base64 decoder more lenient. Instead, I think that mhfixmsg should transform it such that the rest of nmh could handle it.

# Oh, and what happens if a conforming implementation takes that
# 'you are receiving this message' whoopsie and decodes it?

If it detects an error while decoding it, then it should display a message and stop. That's what happens now.

# This. Which is probably *not* what you meant by 'see all of the content'.

I think that it is what I meant. But I also suspect that I'm missing your point.

David
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
On Sun, 17 Mar 2019 17:29:16 -0400, Ken Hornstein said:

> >My reading of RFC2045 says a conforming base64 decoder is allowed to toss out
> >the blanks and the '!' char and decode the rest.
> >
> > Any characters outside of the base64 alphabet are to be ignored in
> > base64-encoded data.
> >
> >Yeah. That's pretty definitive. :)
>
> Oh, hm, you know you learn something new every day, and this is my new
> thing for today. As much as I've read RFC 2045 over the years, I missed
> this! (This is in §6.8, in case others want to look it up).

There's this other related gem a few paragraphs earlier:

    The encoded output stream must be represented in lines of no more
    than 76 characters each. All line breaks or other characters not
    found in Table 1 must be ignored by decoding software. In base64
    data, characters other than those in Table 1, line breaks, and other
    white space probably indicate a transmission error, about which a
    warning message or even a message rejection might be appropriate
    under some circumstances.

I'm still trying to get even a lower-case 'must be ignored' to line up with 'a rejection might be appropriate'.. :)
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>I understand that the list is broken (and I've passed this on to the >administrator). But my perspective is this: I've used nmh for eight >years, and while I'm a big fan of the concept, and it has noticeably >improved in usability in that time, it is still difficult. My camel's >back is not broken yet, but it's beginning to buckle under the strain. >I wish nmh would at least try to handle situations that other clients >handle. Being told "that OTHER software is broken" repeatedly might be >technically correct, but it doesn't help me when every mainstream mail >client seems to be at least minimally usable in these situations. Dude, I feel your pain. I hope I communicated in my other email that I think we should do better. From a practical standpoint ... I try to balance my desire to re-architecture the MIME code completely (which would improve lots of things but take a long time) to fixing things in the short term (which helps people now, but is just adding a Band-Aid on top of a huge pile of Band-Aids). --Ken -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>My reading of RFC2045 says a conforming base64 decoder is allowed to toss out >the blanks and the '!' char and decode the rest. > > Any characters outside of the base64 alphabet are to be ignored in > base64-encoded data. > >Yeah. That's pretty definitive. :) Oh, hm, you know you learn something new every day, and this is my new thing for today. As much as I've read RFC 2045 over the years, I missed this! (This is in §6.8, in case others want to look it up). So that suggests to me that we are in fact NOT being RFC-conforming with this behavior, and we should just silently ignore the bad characters. Does anyone disagree with this interpretation? --Ken -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
On Sun, 17 Mar 2019 09:28:53 -0400, David Levine said:

> More generally, what if a sender (improperly) had annotated an already
> encoded message with, say, "DO NOT FORWARD THIS!"? Bad, yes, but could lead
> to undesired results if that was dropped.

My reading of RFC2045 says a conforming base64 decoder is allowed to toss out the blanks and the '!' char and decode the rest.

    Any characters outside of the base64 alphabet are to be ignored in
    base64-encoded data.

Yeah. That's pretty definitive. :)

Which means your sender just prepended the string '?NLSXCLr' to the message (where the ? is a 'lower case a with circumflex'). Somehow, I doubt that's going to stop it from being forwarded.

> In other words, I'd like to see all of the content or an error message.

Oh, and what happens if a conforming implementation takes that 'you are receiving this message' whoopsie and decodes it?

This. Which is probably *not* what you meant by 'see all of the content'.
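The claim about the annotation is easy to check. This Python sketch uses the standard library's lenient mode (`validate=False`, which discards characters outside the base64 alphabet) as a stand-in for a decoder that follows the "must be ignored" reading; it is not the nmh decoder.

```python
import base64

annotation = "DO NOT FORWARD THIS!"

# The blanks and the '!' are dropped, leaving the 16 characters
# "DONOTFORWARDTHIS", which decode to 12 bytes.
decoded = base64.b64decode(annotation, validate=False)
print(decoded)

# Keep only the printable ASCII characters to see what a reader would notice.
printable = "".join(chr(b) for b in decoded if 0x20 < b < 0x7F)
print(printable)  # NLSXCLr
```

The second byte of the result is 0xE3, which ISO-8859-1 renders as an accented lower-case 'a'; the remaining bytes are control characters or outside ASCII.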
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
> In other words, I'd like to see all of the content or an error message. I too like to be informed of errors instead of having the system guess what I want and possibly be wrong with disastrous results. -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Ralph Corderoy writes: > David wrote: > > In other words, I'd like to see all of the content or an error > > message. > > This is the juncture where I normally take > https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1 > out for a trot. I understand that the list is broken (and I've passed this on to the administrator). But my perspective is this: I've used nmh for eight years, and while I'm a big fan of the concept, and it has noticeably improved in usability in that time, it is still difficult. My camel's back is not broken yet, but it's beginning to buckle under the strain. I wish nmh would at least try to handle situations that other clients handle. Being told "that OTHER software is broken" repeatedly might be technically correct, but it doesn't help me when every mainstream mail client seems to be at least minimally usable in these situations. Valdis wrote: > that maybe if we're looking at base64, if we encounter a blank line we > toss the rest of the body part. For what it's worth, this appears to be how GMail treats it. -- Anthony J. Bentley -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
>"mhshow: invalid BASE64 encoding in --"

I'm also on a mailing list that has the same problem. And yes, it is totally invalid MIME due to the mailing list software appending a footer to the bottom of a base64-encoded part, as everyone else has mentioned. And yes, that mailing list software should be fixed.

I am aware of Postel's maxim, and the arguments against it. I am actually in agreement that being strict is probably for the best in protocol implementation, because it forces everyone to fix their broken implementations. But unfortunately that ignores the reality that we are facing. I believe all MUAs other than nmh handle this fine. There isn't enough momentum to force everyone to fix stuff like this that is broken. So for cases like this, I think we have to make some accommodations in the name of usability.

I'm personally not interested in writing any code at this time to fix this (when I finally get around to re-architecting the MIME support, then yes). What I would PERSONALLY propose to fix this is that for the specific case of text/plain objects (which are normally interpreted directly by a human, as opposed to being handled by another program), base64 decoding continues as long as possible, and if an error is encountered then the error message is written out as part of the text/plain content, and further base64 decoding is stopped. For any OTHER MIME type (including things like text/html), we generate an error and abort as we do now.

I think this is reasonable behavior. I'm open to being persuaded otherwise.

--Ken
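A rough sketch of the proposal for text/plain parts, in Python rather than nmh's C. The function name `decode_text_plain_base64`, the wording of the inline error, and decoding one body line at a time (which assumes every encoded line holds a whole number of 4-character groups, as the usual 76-character lines do) are all assumptions made for illustration.

```python
import base64
import binascii


def decode_text_plain_base64(body, charset="utf-8"):
    """Decode as far as possible; on error, say so in the text and stop.

    Hypothetical helper for text/plain only. For other MIME types the
    proposal is to keep the current behaviour: report an error and abort.
    """
    chunks = []
    for lineno, line in enumerate(body.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            chunks.append(base64.b64decode(line, validate=True))
        except binascii.Error as err:
            note = (f"\n[invalid base64 at body line {lineno}: {err}; "
                    "decoding stopped]\n")
            chunks.append(note.encode(charset))
            break
    return b"".join(chunks).decode(charset, errors="replace")


print(decode_text_plain_base64(
    "SGVsbG8K\n\n-- \nYou received this mail\n"))
```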
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Hi, David wrote: > In other words, I'd like to see all of the content or an error > message. This is the juncture where I normally take https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00#section-1 out for a trot. -- Cheers, Ralph. -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
Valdis wrote: > that maybe if we're looking at base64, if we encounter a blank line we toss > the > rest of the body part. That would work in this case, but the mailing list should be fixed. More generally, what if a sender (improperly) had annotated an already encoded message with, say, "DO NOT FORWARD THIS!"? Bad, yes, but could lead to undesired results if that was dropped. In other words, I'd like to see all of the content or an error message. David -- nmh-workers https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [nmh-workers] mhshow: invalid BASE64 encoding in --
On Sat, 16 Mar 2019 22:14:41 -0600, "Anthony J. Bentley" said:

> "mhshow: invalid BASE64 encoding in --"
>
> Since it's a public mailing list, one of these messages is enclosed below.

> Content-Type: text/plain; charset=utf-8
> Content-Transfer-Encoding: base64

Yeah that's a reasonable thing to do if you're sending UTF-8.

> Message-ID: <6c5e3c76d90a7...@poolp.org>
>
> Ck9uIE1hciAxNiwgMjAxOSA1OjA5IFBNLCBUaG9tYXMgQm9obCA8b3BlbnNtdHBkLW1pc2MtNjQ2

So here we start the UTF-8

> bnN1YnNjcmliZSwgc2VuZCBhIG1haWwgdG86IG1pc2MrdW5zdWJzY3JpYmVAb3BlbnNtdHBkLm9y
> Zwo+Cg==

And here we finish it..

>
> --
> You received this mail because you are subscribed to m...@opensmtpd.org
> To unsubscribe, send a mail to: misc+unsubscr...@opensmtpd.org

And this is a crock, because it's still part of the (only) bodypart, but is obviously not base64.

What *should* happen if mailing list software feels the need to stick a footer on is to repackage the mail as a multipart/related, with the first body part being the UTF-8 body of the mail, and the second the mailing list's footer.

Somebody should complain to the administrator of that list - mailman has been able to do this rewrite correctly for aeons now.

Having said that, there's certainly plenty of room in "be lenient in what you accept" that maybe if we're looking at base64, if we encounter a blank line we toss the rest of the body part. (I have an equivalent on my to-do list for exmh for this same issue.)
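The blank-line idea floated at the end of this message might be sketched as follows. This is a Python illustration only, not exmh or nmh code; `decode_until_blank_line` is a hypothetical name, and skipping blank lines that come before any data is an assumption.

```python
import base64


def decode_until_blank_line(body):
    """Decode a base64 body, discarding everything after the first blank
    line that follows some encoded data (hypothetical helper)."""
    kept = []
    for line in body.splitlines():
        if not line.strip():
            if kept:
                break      # blank line after data: toss the rest
            continue       # leading blank lines are skipped
        kept.append(line.strip())
    return base64.b64decode("".join(kept))


body = "\nSGVsbG8K\n\n-- \nYou received this mail because ...\n"
print(decode_until_blank_line(body))  # b'Hello\n'
```

The trade-off raised elsewhere in the thread still applies: anything after the blank line is dropped without being shown to the reader.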