On Tue, 12 Feb 2019 at 01:34, Sam Ruby <[email protected]> wrote: > > On Mon, Feb 11, 2019 at 5:12 PM sebb <[email protected]> wrote: > > > > On Mon, 11 Feb 2019 at 21:44, Sam Ruby <[email protected]> wrote: > > > > > > That is indeed a big in the mail gem. I contributed a patch that is a > > > partial fix. > > > > > > What is essential is that you only convert headers not the binary > > > attachment as that can corrupt images. > > > > AFAICT headers are already handled correctly even if they are LF-terminated. > > I assume this is because parsing looks for CR and/or LF - there's > > certainly some code that does this. > > > > This does not seem to be the case for body parsing. > > From what little I have seen, it looks as though the code tries to > > convert to CRLF before continuing to parse. > > > > If the body is not ascii_only then the method Mail::Utilities.to_crlf > > does not convert anything. > > I'm pretty sure that is what is happening with this email. > > > > Surely binary images have to be base64-encoded or similar before they > > can be sent in an email? > > > > AFAIK, CR and LF are not allowed in mail data - or have I got that wrong? > > For starters, if everybody correctly created emails according to the > spec, there would not be a need to provide any fixups. This also > explains why Craig resending the mail (with a proper email client that > respects the relevant RFCs) works around the problem. > > Older versions of the mail gem used to do a global conversion of LF > which was not preceded by a CR to CRLF (CRLF is required by the spec, > but again, that part of the spec is routinely ignored).. In response > to a bug report where that code was corrupting attachments, they > "fixed" the code to no longer do that except for the headers, but only > if the headers were pure ASCII. I provided a patch which expanded it > to headers that are correctly encoded, and that patch was accepted. > > Clearly this is not enough. > > What probably should be done is to split the file at the first > \r?\n\r?\n (a blank line separates the headers from the body). Then > do a global change of LFs which are not preceded by a CR to a CRLF - > but only in the headers. Then reattach the header and body and parse > that.
That is exactly what was previously done, but it did not work, because the problem is in parsing the body, not the headers. The headers parse fine even if LF-only terminated. Also, whilst CRLF is required by the spec, AFAICT all the mails that are processed by deliver.rb have LF-only. Just have a look at some of the members/board mail archives on Whimsy - these don't have any LF-processing added by tools/deliver.rb Ditto for mbox files on minotaur. Perhaps this is because CRLF is mandated for SMTP, but once the mail has been received, it's no longer a requirement? What I think needs to happen is for the mail gem to treat both CRLF and LF as EOL (possibly also CR only) when parsing the body. It already does this for the headers (as far as I can tell). There should be no need to 'fix' the input at all if it uses the appropriate EOL markers. However on output, of course it should always generate CRLF. > > S. > > - Sam Ruby > > > > On Mon, Feb 11, 2019, 4:11 PM sebb <[email protected] wrote: > > > > > > > It looks as though the mail gem does try to convert the input to CRLF > > > > before parsing. > > > > However, it only does this if the input is ASCII-only (*). > > > > > > > > Whilst this is the best approach, I think we can ignore that check and > > > > just convert to CRLF. > > > > [It may cause a very occasional glitch, but any such should be obvious.] > > > > > > > > It explains why the original email failed to parse: there are several > > > > non-ASCII chars in it. > > > > These have been converted to quoted-printable by the version you > > > > forwarded, thus allowing the parse to complete OK. > > > > > > > > I'll commit a fix shortly. > > > > > > > > S. > > > > (*) This applies to binary data, which is all that we know about the > > > > input at this point. > > > > > > > > On Mon, 11 Feb 2019 at 18:55, sebb <[email protected]> wrote: > > > > > > > > > > Seems to be an issue with the parser not handling certain messages > > > > > which have LF-only line terminators. > > > > > > > > > > I tried converting the mail file to CRLF and reparsing, and the > > > > > attachment appeared. > > > > > > > > > > AFAICT all messages are being stored as LF-only, and I don't know why > > > > > this only affects some mails. > > > > > The one in question is quite complicated with various quoted mails > > > > > before the attachment; that might have affected parsing. > > > > > > > > > > The copy of the mail sent to my GMail account appears to have CRLF > > > > > line terminators, yet the copies on minotaur and mbox-vm are LF-only. > > > > > Perhaps GMail auto-converts the mail? > > > > > Not sure what is happening here. > > > > > > > > > > On Mon, 11 Feb 2019 at 17:20, sebb <[email protected]> wrote: > > > > > > > > > > > > The immediate reason is that the parsed message does not have any > > > > > > attachments listed in the summary yml file, so does not show up in > > > > > > the > > > > > > workbench. > > > > > > > > > > > > I will look further into why the parsing did not see the attachment. > > > > > > > > > > > > S. > > > > > > On Mon, 11 Feb 2019 at 17:11, sebb <[email protected]> wrote: > > > > > > > > > > > > > > I'll take a look > > > > > > > > > > > > > > On Mon, 11 Feb 2019 at 16:27, Craig Russell <[email protected]> > > > > wrote: > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > This seems to be happening more frequently: an ICLA shows up in > > > > secretary mail inbox but does not appear in workbench. > > > > > > > > > > > > > > > > If I forward the message to secretary, it then appears in > > > > workbench. > > > > > > > > > > > > > > > > Without violating PII, I cannot forward the message directly. > > > > > > > > But > > > > this is the message that was received this AM and never made it to > > > > workbench: > > > > > > > > > > > > > > > > From: Olivier Coutu > > > > > > > > Date: 6:10 AM PST > > > > > > > > Subject: Fwd: Re: Patch to reduce lint time > > > > > > > > > > > > > > > > Any ideas? > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Craig > > > > > > > > > > > > > > > > Craig L Russell > > > > > > > > Secretary, Apache Software Foundation > > > > > > > > [email protected] http://db.apache.org/jdo > > > > > > > > > > > >
