On Thu, May 3, 2018 at 10:37 AM, sebb <[email protected]> wrote: > On 3 May 2018 at 14:47, sebb <[email protected]> wrote: >> On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote: >>> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote: >>>> I've been looking at the mail to secretary@ that caused a problem recently. >>>> >>>> I think the issue may be that the headers are not all ASCII, there are >>>> a couple of o-umlauts. >>>> If these are replaced with plain 'o's then the headers are parsed OK. >>>> >>>> This causes message.rb to crash in the rescue block >>>> >>>> @335: from = mail[:from].value.sub(/\s+<.*?>$/) >>>> >>>> because mail[:from] is nil. >>>> >>>> That could be avoided by using .to_s rather than .value, but that >>>> causes the headers to be saved as a single blob with the key of the >>>> first header. >>>> >>>> A possible solution might be to trap all errors and set some dummy >>>> values for the Yaml file >>>> This would at least allow the user to be alerted to the issue. >>>> >>>> But ideally the parser should be persuaded to handle non-ASCII header >>>> values. >>>> They are not allowed, but they seem to be quite common. >>> >>> It is slightly more complicated than that. It actually will correctly >>> parse headers with non-ascii characters if the headers are separated >>> by \r\n. >> >> The code tries to do the replace, but uses gsub rather than gsub! so >> it has no effect... >> >> I just tried fixing that and - yay! it works!
DOH! Thanks! >> However I think it would still be useful to trap errors and store the >> raw message with a suitable set of attributes in the yml file. Error reporting is certainly something that can be improved on. It isn't clear to me what role the yml file would play in this. In particular, the secretary workbench application tends to ignore emails without attachments... > Done. > > Also allowed for missing From: header. Thanks! >>> A correctly formed email message uses \r\n as a separator between >>> headers. Once upon a time, the mail gem would essentially do the >>> equivalent of s/\n\/\r\n/ on such emails but that would occasionally >>> corrupt attachments. >>> >>> In essence, that code was removed with the intention of replacing it >>> if the headers were otherwise correctly encoded. I authored a patch >>> (which was accepted, but I can't find it right now) which did that if > > Did you mean this? > > https://github.com/mikel/mail/pull/1168 Yes! The code that I now have in whimsy really belongs in the mail gem, complete with a test case. I once started down that path, and there is a place where the mail gem splits headers from the body that could be fixed, and once the headers are separated, adding \r characters if there are absolutely none is unlikely to break anything. But I never completed that work. Mostly, if I recall correctly, because the code defined named constants for the separators, and those names matched the relevant specifications. Changing those constants felt wrong, hardcoding strings felt inconsistent with the coding standards in use in that code, and where I got stuck was in deciding new names for constants. >>> the headers were pure ascii. If that patch can be found, that thread >>> included instructions on restoring the old behavior using a method >>> call with the word "unsafe" or somesuch in it. Of course, that could >>> corrupt attachments, which for our use case can be bad. >>> >>> The safest way to do this is to separate the headers from the body, >>> fix the headers, and then reattach the two before parsing: >>> >>> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318 >>> >>> Perhaps this code could be put in lib/whimsy/asf someplace? >> >> Are there other places where it is needed? Not sure. Try git grep 'Mail.new(' and git grep 'Mail.read' to see other places where it may help. >>> - Sam Ruby - Sam Ruby
