On 3 May 2018 at 21:16, Sam Ruby <[email protected]> wrote: > On Thu, May 3, 2018 at 10:37 AM, sebb <[email protected]> wrote: >> On 3 May 2018 at 14:47, sebb <[email protected]> wrote: >>> On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote: >>>> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote: >>>>> I've been looking at the mail to secretary@ that caused a problem >>>>> recently. >>>>> >>>>> I think the issue may be that the headers are not all ASCII, there are >>>>> a couple of o-umlauts. >>>>> If these are replaced with plain 'o's then the headers are parsed OK. >>>>> >>>>> This causes message.rb to crash in the rescue block >>>>> >>>>> @335: from = mail[:from].value.sub(/\s+<.*?>$/) >>>>> >>>>> because mail[:from] is nil. >>>>> >>>>> That could be avoided by using .to_s rather than .value, but that >>>>> causes the headers to be saved as a single blob with the key of the >>>>> first header. >>>>> >>>>> A possible solution might be to trap all errors and set some dummy >>>>> values for the Yaml file >>>>> This would at least allow the user to be alerted to the issue. >>>>> >>>>> But ideally the parser should be persuaded to handle non-ASCII header >>>>> values. >>>>> They are not allowed, but they seem to be quite common. >>>> >>>> It is slightly more complicated than that. It actually will correctly >>>> parse headers with non-ascii characters if the headers are separated >>>> by \r\n. >>> >>> The code tries to do the replace, but uses gsub rather than gsub! so >>> it has no effect... >>> >>> I just tried fixing that and - yay! it works! > > DOH! Thanks! > >>> However I think it would still be useful to trap errors and store the >>> raw message with a suitable set of attributes in the yml file. > > Error reporting is certainly something that can be improved on. It > isn't clear to me what role the yml file would play in this. In > particular, the secretary workbench application tends to ignore emails > without attachments...
In which case the dummy entry could include dummy attachments as well so it gets noticed? >> Done. >> >> Also allowed for missing From: header. > > Thanks! > >>>> A correctly formed email message uses \r\n as a separator between >>>> headers. Once upon a time, the mail gem would essentially do the >>>> equivalent of s/\n\/\r\n/ on such emails but that would occasionally >>>> corrupt attachments. >>>> >>>> In essence, that code was removed with the intention of replacing it >>>> if the headers were otherwise correctly encoded. I authored a patch >>>> (which was accepted, but I can't find it right now) which did that if >> >> Did you mean this? >> >> https://github.com/mikel/mail/pull/1168 > > Yes! The code that I now have in whimsy really belongs in the mail > gem, complete with a test case. I once started down that path, and > there is a place where the mail gem splits headers from the body that > could be fixed, and once the headers are separated, adding \r > characters if there are absolutely none is unlikely to break anything. > But I never completed that work. Mostly, if I recall correctly, > because the code defined named constants for the separators, and those > names matched the relevant specifications. Changing those constants > felt wrong, hardcoding strings felt inconsistent with the coding > standards in use in that code, and where I got stuck was in deciding > new names for constants. > >>>> the headers were pure ascii. If that patch can be found, that thread >>>> included instructions on restoring the old behavior using a method >>>> call with the word "unsafe" or somesuch in it. Of course, that could >>>> corrupt attachments, which for our use case can be bad. >>>> >>>> The safest way to do this is to separate the headers from the body, >>>> fix the headers, and then reattach the two before parsing: >>>> >>>> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318 >>>> >>>> Perhaps this code could be put in lib/whimsy/asf someplace? >>> >>> Are there other places where it is needed? > > Not sure. Try git grep 'Mail.new(' and git grep 'Mail.read' to see > other places where it may help. > >>>> - Sam Ruby > > - Sam Ruby
