On 3 May 2018 at 14:47, sebb <[email protected]> wrote:
> On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote:
>> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote:
>>> I've been looking at the mail to secretary@ that caused a problem recently.
>>>
>>> I think the issue may be that the headers are not all ASCII, there are
>>> a couple of o-umlauts.
>>> If these are replaced with plain 'o's then the headers are parsed OK.
>>>
>>> This causes message.rb to crash in the rescue block
>>>
>>> @335: from = mail[:from].value.sub(/\s+<.*?>$/)
>>>
>>> because mail[:from] is nil.
>>>
>>> That could be avoided by using .to_s rather than .value, but that
>>> causes the headers to be saved as a single blob with the key of the
>>> first header.
>>>
>>> A possible solution might be to trap all errors and set some dummy
>>> values for the Yaml file
>>> This would at least allow the user to be alerted to the issue.
>>>
>>> But ideally the parser should be persuaded to handle non-ASCII header
>>> values.
>>> They are not allowed, but they seem to be quite common.
>>
>> It is slightly more complicated than that. It actually will correctly
>> parse headers with non-ascii characters if the headers are separated
>> by \r\n.
>
> The code tries to do the replace, but uses gsub rather than gsub! so
> it has no effect...
>
> I just tried fixing that and - yay! it works!
>
> However I think it would still be useful to trap errors and store the
> raw message with a suitable set of attributes in the yml file.

Done. Also allowed for missing From: header.

>> A correctly formed email message uses \r\n as a separator between
>> headers. Once upon a time, the mail gem would essentially do the
>> equivalent of s/\n\/\r\n/ on such emails but that would occasionally
>> corrupt attachments.
>>
>> In essence, that code was removed with the intention of replacing it
>> if the headers were otherwise correctly encoded. I authored a patch
>> (which was accepted, but I can't find it right now) which did that if

Did you mean this?

https://github.com/mikel/mail/pull/1168

>> the headers were pure ascii. If that patch can be found, that thread
>> included instructions on restoring the old behavior using a method
>> call with the word "unsafe" or somesuch in it. Of course, that could
>> corrupt attachments, which for our use case can be bad.
>>
>> The safest way to do this is to separate the headers from the body,
>> fix the headers, and then reattach the two before parsing:
>>
>> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318
>>
>> Perhaps this code could be put in lib/whimsy/asf someplace?
>
> Are there other places where it is needed?
>
>> - Sam Ruby
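
For reference, the "separate the headers, fix them, reattach" approach quoted above could be sketched roughly as follows. This is only an illustration, not the code in message.rb: the method name normalize_header_line_endings is made up for this example, and it assumes the raw message is available as a single String. It also shows why gsub! matters here: plain gsub returns a new string and leaves the receiver untouched unless the result is assigned.

```ruby
# Hypothetical helper (not the actual whimsy code): convert bare \n to \r\n
# in the header section only, leaving the body and any attachments as-is.
def normalize_header_line_endings(raw)
  # Work on a binary copy so non-ASCII header bytes cannot trip up
  # the regular expressions with encoding errors.
  # The headers end at the first blank line (either \r\n or bare \n style).
  headers, separator, body = raw.b.partition(/\r?\n\r?\n/)

  # gsub! modifies headers in place (and returns nil if nothing matched);
  # a bare `headers.gsub(...)` here would silently have no effect.
  headers.gsub!(/\r?\n/, "\r\n")

  # No blank line found: the whole input was treated as headers.
  return headers if separator.empty?

  # Reattach the untouched body using a well-formed separator.
  headers + "\r\n\r\n" + body
end
```

The result could then be handed to the parser in place of the raw text. Note that the returned string is binary (ASCII-8BIT) encoded, which is an assumption of this sketch rather than something the thread specifies.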
