This is where the body parsing fails:

https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L298

If the \r\n is changed to \r?\n then it works for me.

Note that the body is converted to CRLF earlier:

https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L44

However this does not do anything unless the original mail is ascii-only.

On Tue, 12 Feb 2019 at 08:20, sebb <[email protected]> wrote:
>
> On Tue, 12 Feb 2019 at 01:34, Sam Ruby <[email protected]> wrote:
> >
> > On Mon, Feb 11, 2019 at 5:12 PM sebb <[email protected]> wrote:
> > >
> > > On Mon, 11 Feb 2019 at 21:44, Sam Ruby <[email protected]> wrote:
> > > >
> > > > That is indeed a big in the mail gem. I contributed a patch that is a
> > > > partial fix.
> > > >
> > > > What is essential is that you only convert headers not the binary
> > > > attachment as that can corrupt images.
> > >
> > > AFAICT headers are already handled correctly even if they are 
> > > LF-terminated.
> > > I assume this is because parsing looks for CR and/or LF - there's
> > > certainly some code that does this.
> > >
> > > This does not seem to be the case for body parsing.
> > > From what little I have seen, it looks as though the code tries to
> > > convert to CRLF before continuing to parse.
> > >
> > > If the body is not ascii_only then the method Mail::Utilities.to_crlf
> > > does not convert anything.
> > > I'm pretty sure that is what is happening with this email.
> > >
> > > Surely binary images have to be base64-encoded or similar before they
> > > can be sent in an email?
> > >
> > > AFAIK, CR and LF are not allowed in mail data - or have I got that wrong?
> >
> > For starters, if everybody correctly created emails according to the
> > spec, there would not be a need to provide any fixups.  This also
> > explains why Craig resending the mail (with a proper email client that
> > respects the relevant RFCs) works around the problem.
> >
> > Older versions of the mail gem used to do a global conversion of LF
> > which was not preceded by a CR to CRLF (CRLF is required by the spec,
> > but again, that part of the spec is routinely ignored)..  In response
> > to a bug report where that code was corrupting attachments, they
> > "fixed" the code to no longer do that except for the headers, but only
> > if the headers were pure ASCII.  I provided a patch which expanded it
> > to headers that are correctly encoded, and that patch was accepted.
> >
> > Clearly this is not enough.
> >
> > What probably should be done is to split the file at the first
> > \r?\n\r?\n (a blank line separates the headers from the body).  Then
> > do a global change of LFs which are not preceded by a CR to a CRLF -
> > but only in the headers.  Then reattach the header and body and parse
> > that.
>
> That is exactly what was previously done, but it did not work, because
> the problem is in parsing the body, not the headers.
>
> The headers parse fine even if LF-only terminated.
>
> Also, whilst CRLF is required by the spec, AFAICT all the mails that
> are processed by deliver.rb have LF-only.
> Just have a look at some of the members/board mail archives on Whimsy
> - these don't have any LF-processing added by tools/deliver.rb
> Ditto for mbox files on minotaur.
> Perhaps this is because CRLF is mandated for SMTP, but once the mail
> has been received, it's no longer a requirement?
>
> What I think needs to happen is for the mail gem to treat both CRLF
> and LF as EOL (possibly also CR only) when parsing the body.
> It already does this for the headers (as far as I can tell).
> There should be no need to 'fix' the input at all if it uses the
> appropriate EOL markers.
> However on output, of course it should always generate CRLF.
>
> > > S.
> >
> > - Sam Ruby
> >
> > > > On Mon, Feb 11, 2019, 4:11 PM sebb <[email protected] wrote:
> > > >
> > > > > It looks as though the mail gem does try to convert the input to CRLF
> > > > > before parsing.
> > > > > However, it only does this if the input is ASCII-only (*).
> > > > >
> > > > > Whilst this is the best approach, I think we can ignore that check and
> > > > > just convert to CRLF.
> > > > > [It may cause a very occasional glitch, but any such should be 
> > > > > obvious.]
> > > > >
> > > > > It explains why the original email failed to parse: there are several
> > > > > non-ASCII chars in it.
> > > > > These have been converted to quoted-printable by the version you
> > > > > forwarded, thus allowing the parse to complete OK.
> > > > >
> > > > > I'll commit a fix shortly.
> > > > >
> > > > > S.
> > > > > (*) This applies to binary data, which is all that we know about the
> > > > > input at this point.
> > > > >
> > > > > On Mon, 11 Feb 2019 at 18:55, sebb <[email protected]> wrote:
> > > > > >
> > > > > > Seems to be an issue with the parser not handling certain messages
> > > > > > which have LF-only line terminators.
> > > > > >
> > > > > > I tried converting the mail file to CRLF and reparsing, and the
> > > > > > attachment appeared.
> > > > > >
> > > > > > AFAICT all messages are being stored as LF-only, and I don't know 
> > > > > > why
> > > > > > this only affects some mails.
> > > > > > The one in question is quite complicated with various quoted mails
> > > > > > before the attachment; that might have affected parsing.
> > > > > >
> > > > > > The copy of the mail sent to my GMail account appears to have CRLF
> > > > > > line terminators, yet the copies on minotaur and mbox-vm are 
> > > > > > LF-only.
> > > > > > Perhaps GMail auto-converts the mail?
> > > > > > Not sure what is happening here.
> > > > > >
> > > > > > On Mon, 11 Feb 2019 at 17:20, sebb <[email protected]> wrote:
> > > > > > >
> > > > > > > The immediate reason is that the parsed message does not have any
> > > > > > > attachments listed in the summary yml file, so does not show up 
> > > > > > > in the
> > > > > > > workbench.
> > > > > > >
> > > > > > > I will look further into why the parsing did not see the 
> > > > > > > attachment.
> > > > > > >
> > > > > > > S.
> > > > > > > On Mon, 11 Feb 2019 at 17:11, sebb <[email protected]> wrote:
> > > > > > > >
> > > > > > > > I'll take a look
> > > > > > > >
> > > > > > > > On Mon, 11 Feb 2019 at 16:27, Craig Russell 
> > > > > > > > <[email protected]>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > This seems to be happening more frequently: an ICLA shows up 
> > > > > > > > > in
> > > > > secretary mail inbox but does not appear in workbench.
> > > > > > > > >
> > > > > > > > > If I forward the message to secretary, it then appears in
> > > > > workbench.
> > > > > > > > >
> > > > > > > > > Without violating PII, I cannot forward the message directly. 
> > > > > > > > > But
> > > > > this is the message that was received this AM and never made it to
> > > > > workbench:
> > > > > > > > >
> > > > > > > > > From: Olivier Coutu
> > > > > > > > > Date: 6:10 AM PST
> > > > > > > > > Subject: Fwd: Re: Patch to reduce lint time
> > > > > > > > >
> > > > > > > > > Any ideas?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Craig
> > > > > > > > >
> > > > > > > > > Craig L Russell
> > > > > > > > > Secretary, Apache Software Foundation
> > > > > > > > > [email protected] http://db.apache.org/jdo
> > > > > > > > >
> > > > >

Reply via email to