On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote:
> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote:
>> I've been looking at the mail to secretary@ that caused a problem recently.
>>
>> I think the issue may be that the headers are not all ASCII, there are
>> a couple of o-umlauts.
>> If these are replaced with plain 'o's then the headers are parsed OK.
>>
>> This causes message.rb to crash in the rescue block
>>
>> @335: from = mail[:from].value.sub(/\s+<.*?>$/)
>>
>> because mail[:from] is nil.
>>
>> That could be avoided by using .to_s rather than .value, but that
>> causes the headers to be saved as a single blob with the key of the
>> first header.
>>
>> A possible solution might be to trap all errors and set some dummy
>> values for the Yaml file
>> This would at least allow the user to be alerted to the issue.
>>
>> But ideally the parser should be persuaded to handle non-ASCII header values.
>> They are not allowed, but they seem to be quite common.
>
> It is slightly more complicated than that.  It actually will correctly
> parse headers with non-ascii characters if the headers are separated
> by \r\n.

The code tries to do the replace, but uses gsub rather than gsub! so
it has no effect...

I just tried fixing that and - yay! it works!

However I think it would still be useful to trap errors and store the
raw message with a suitable set of attributes in the yml file.

> A correctly formed email message uses \r\n as a separator between
> headers.  Once upon a time, the mail gem would essentially do the
> equivalent of s/\n/\r\n/ on such emails but that would occasionally
> corrupt attachments.
>
> In essence, that code was removed with the intention of replacing it
> if the headers were otherwise correctly encoded.  I authored a patch
> (which was accepted, but I can't find it right now) which did that if
> the headers were pure ascii.  If that patch can be found, that thread
> included instructions on restoring the old behavior using a method
> call with the word "unsafe" or somesuch in it.  Of course, that could
> corrupt attachments, which for our use case can be bad.
>
> The safest way to do this is to separate the headers from the body,
> fix the headers, and then reattach the two before parsing:
>
> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318
>

Perhaps this code could be put in lib/whimsy/asf someplace?
Are there other places where it is needed?

> - Sam Ruby
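
For reference, here is a minimal sketch of the "split, fix the headers, reattach" idea, plus the trap-errors fallback. It is not the code in message.rb: the method names normalize_header_line_endings and parse_or_placeholder, and the placeholder keys, are hypothetical and only for illustration. It uses plain Ruby strings and assumes the headers end at the first blank line.

```ruby
# Hypothetical helper (not from whimsy): convert bare \n to \r\n in the
# header block only, leaving the body (and any attachments) untouched.
def normalize_header_line_endings(raw)
  # Work on a binary copy so non-ASCII header bytes cannot raise
  # encoding errors during matching.
  raw = raw.dup.force_encoding(Encoding::BINARY)

  # Assumption: headers end at the first blank line (\n\n or \r\n\r\n).
  headers, separator, body = raw.partition(/\r?\n\r?\n/)

  # gsub! changes the string in place. Plain gsub returns a new string,
  # so calling it without using the result has no effect.
  # Note that gsub! returns nil when nothing matched, so its return
  # value is deliberately not used here.
  headers.gsub!(/\r?\n/, "\r\n")

  separator.empty? ? headers : headers + "\r\n\r\n" + body
end

# Hypothetical fallback: run the caller's parser on the normalized
# message and, if it raises, return dummy values plus the raw message
# so the problem is visible instead of crashing.
def parse_or_placeholder(raw)
  yield normalize_header_line_endings(raw)
rescue StandardError => e
  { from: '(unparseable)', subject: '(unparseable)',
    error: e.message, raw: raw }
end
```

The block passed to parse_or_placeholder would be whatever does the real parsing (for example a call into the mail gem); it is left to the caller so that the sketch makes no assumptions about that API.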
