I've created a PR

https://github.com/mikel/mail/pull/1319

On Tue, 12 Feb 2019 at 11:30, sebb <[email protected]> wrote:
>
> This is where the body parsing fails:
>
> https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L298
>
> If the \r\n is changed to \r?\n then it works for me.
>
> Note that the body is converted to CRLF earlier:
>
> https://github.com/mikel/mail/blob/7c43c84c16f017e0ff5e5c9962f6a1d842301ee3/lib/mail/body.rb#L44
>
> However this does not do anything unless the original mail is ascii-only.
>
> On Tue, 12 Feb 2019 at 08:20, sebb <[email protected]> wrote:
> >
> > On Tue, 12 Feb 2019 at 01:34, Sam Ruby <[email protected]> wrote:
> > >
> > > On Mon, Feb 11, 2019 at 5:12 PM sebb <[email protected]> wrote:
> > > >
> > > > On Mon, 11 Feb 2019 at 21:44, Sam Ruby <[email protected]> wrote:
> > > > >
> > > > > That is indeed a big in the mail gem. I contributed a patch that is a
> > > > > partial fix.
> > > > >
> > > > > What is essential is that you only convert headers not the binary
> > > > > attachment as that can corrupt images.
> > > >
> > > > AFAICT headers are already handled correctly even if they are 
> > > > LF-terminated.
> > > > I assume this is because parsing looks for CR and/or LF - there's
> > > > certainly some code that does this.
> > > >
> > > > This does not seem to be the case for body parsing.
> > > > From what little I have seen, it looks as though the code tries to
> > > > convert to CRLF before continuing to parse.
> > > >
> > > > If the body is not ascii_only then the method Mail::Utilities.to_crlf
> > > > does not convert anything.
> > > > I'm pretty sure that is what is happening with this email.
> > > >
> > > > Surely binary images have to be base64-encoded or similar before they
> > > > can be sent in an email?
> > > >
> > > > AFAIK, CR and LF are not allowed in mail data - or have I got that 
> > > > wrong?
> > >
> > > For starters, if everybody correctly created emails according to the
> > > spec, there would not be a need to provide any fixups.  This also
> > > explains why Craig resending the mail (with a proper email client that
> > > respects the relevant RFCs) works around the problem.
> > >
> > > Older versions of the mail gem used to do a global conversion of LF
> > > which was not preceded by a CR to CRLF (CRLF is required by the spec,
> > > but again, that part of the spec is routinely ignored)..  In response
> > > to a bug report where that code was corrupting attachments, they
> > > "fixed" the code to no longer do that except for the headers, but only
> > > if the headers were pure ASCII.  I provided a patch which expanded it
> > > to headers that are correctly encoded, and that patch was accepted.
> > >
> > > Clearly this is not enough.
> > >
> > > What probably should be done is to split the file at the first
> > > \r?\n\r?\n (a blank line separates the headers from the body).  Then
> > > do a global change of LFs which are not preceded by a CR to a CRLF -
> > > but only in the headers.  Then reattach the header and body and parse
> > > that.
> >
> > That is exactly what was previously done, but it did not work, because
> > the problem is in parsing the body, not the headers.
> >
> > The headers parse fine even if LF-only terminated.
> >
> > Also, whilst CRLF is required by the spec, AFAICT all the mails that
> > are processed by deliver.rb have LF-only.
> > Just have a look at some of the members/board mail archives on Whimsy
> > - these don't have any LF-processing added by tools/deliver.rb
> > Ditto for mbox files on minotaur.
> > Perhaps this is because CRLF is mandated for SMTP, but once the mail
> > has been received, it's no longer a requirement?
> >
> > What I think needs to happen is for the mail gem to treat both CRLF
> > and LF as EOL (possibly also CR only) when parsing the body.
> > It already does this for the headers (as far as I can tell).
> > There should be no need to 'fix' the input at all if it uses the
> > appropriate EOL markers.
> > However on output, of course it should always generate CRLF.
> >
> > > > S.
> > >
> > > - Sam Ruby
> > >
> > > > > On Mon, Feb 11, 2019, 4:11 PM sebb <[email protected] wrote:
> > > > >
> > > > > > It looks as though the mail gem does try to convert the input to 
> > > > > > CRLF
> > > > > > before parsing.
> > > > > > However, it only does this if the input is ASCII-only (*).
> > > > > >
> > > > > > Whilst this is the best approach, I think we can ignore that check 
> > > > > > and
> > > > > > just convert to CRLF.
> > > > > > [It may cause a very occasional glitch, but any such should be 
> > > > > > obvious.]
> > > > > >
> > > > > > It explains why the original email failed to parse: there are 
> > > > > > several
> > > > > > non-ASCII chars in it.
> > > > > > These have been converted to quoted-printable by the version you
> > > > > > forwarded, thus allowing the parse to complete OK.
> > > > > >
> > > > > > I'll commit a fix shortly.
> > > > > >
> > > > > > S.
> > > > > > (*) This applies to binary data, which is all that we know about the
> > > > > > input at this point.
> > > > > >
> > > > > > On Mon, 11 Feb 2019 at 18:55, sebb <[email protected]> wrote:
> > > > > > >
> > > > > > > Seems to be an issue with the parser not handling certain messages
> > > > > > > which have LF-only line terminators.
> > > > > > >
> > > > > > > I tried converting the mail file to CRLF and reparsing, and the
> > > > > > > attachment appeared.
> > > > > > >
> > > > > > > AFAICT all messages are being stored as LF-only, and I don't know 
> > > > > > > why
> > > > > > > this only affects some mails.
> > > > > > > The one in question is quite complicated with various quoted mails
> > > > > > > before the attachment; that might have affected parsing.
> > > > > > >
> > > > > > > The copy of the mail sent to my GMail account appears to have CRLF
> > > > > > > line terminators, yet the copies on minotaur and mbox-vm are 
> > > > > > > LF-only.
> > > > > > > Perhaps GMail auto-converts the mail?
> > > > > > > Not sure what is happening here.
> > > > > > >
> > > > > > > On Mon, 11 Feb 2019 at 17:20, sebb <[email protected]> wrote:
> > > > > > > >
> > > > > > > > The immediate reason is that the parsed message does not have 
> > > > > > > > any
> > > > > > > > attachments listed in the summary yml file, so does not show up 
> > > > > > > > in the
> > > > > > > > workbench.
> > > > > > > >
> > > > > > > > I will look further into why the parsing did not see the 
> > > > > > > > attachment.
> > > > > > > >
> > > > > > > > S.
> > > > > > > > On Mon, 11 Feb 2019 at 17:11, sebb <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > I'll take a look
> > > > > > > > >
> > > > > > > > > On Mon, 11 Feb 2019 at 16:27, Craig Russell 
> > > > > > > > > <[email protected]>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > This seems to be happening more frequently: an ICLA shows 
> > > > > > > > > > up in
> > > > > > secretary mail inbox but does not appear in workbench.
> > > > > > > > > >
> > > > > > > > > > If I forward the message to secretary, it then appears in
> > > > > > workbench.
> > > > > > > > > >
> > > > > > > > > > Without violating PII, I cannot forward the message 
> > > > > > > > > > directly. But
> > > > > > this is the message that was received this AM and never made it to
> > > > > > workbench:
> > > > > > > > > >
> > > > > > > > > > From: Olivier Coutu
> > > > > > > > > > Date: 6:10 AM PST
> > > > > > > > > > Subject: Fwd: Re: Patch to reduce lint time
> > > > > > > > > >
> > > > > > > > > > Any ideas?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Craig
> > > > > > > > > >
> > > > > > > > > > Craig L Russell
> > > > > > > > > > Secretary, Apache Software Foundation
> > > > > > > > > > [email protected] http://db.apache.org/jdo
> > > > > > > > > >
> > > > > >

Reply via email to