On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote:
> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote:
>> I've been looking at the mail to secretary@ that caused a problem recently.
>>
>> I think the issue may be that the headers are not all ASCII, there are
>> a couple of o-umlauts.
>> If these are replaced with plain 'o's then the headers are parsed OK.
>>
>> This causes message.rb to crash in the rescue block
>>
>> @335: from = mail[:from].value.sub(/\s+<.*?>$/)
>>
>> because mail[:from] is nil.
>>
>> That could be avoided by using .to_s rather than .value, but that
>> causes the headers to be saved as a single blob with the key of the
>> first header.
>>
>> A possible solution might be to trap all errors and set some dummy
>> values for the YAML file.
>> This would at least allow the user to be alerted to the issue.
>>
>> But ideally the parser should be persuaded to handle non-ASCII header values.
>> They are not allowed, but they seem to be quite common.
>
> It is slightly more complicated than that.  It actually will correctly
> parse headers with non-ascii characters if the headers are separated
> by \r\n.

The code tries to do the replace, but uses gsub rather than gsub! so
it has no effect...

I just tried fixing that and - yay! it works!
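
For anyone following along, here is a minimal sketch of the gsub vs gsub!
difference. The sample message and the regex are made up for illustration
and are not copied from message.rb:

```ruby
# Hypothetical raw message with bare-LF line endings (not the real mail).
raw = "From: J\u00F6rg <joerg@example.org>\nSubject: test\n\nbody\n"

# gsub returns a new string; if the result is discarded, raw is untouched.
raw.gsub(/\r?\n/, "\r\n")
unchanged = !raw.include?("\r\n")

# gsub! modifies the receiver in place (it returns nil if nothing matched,
# so its return value should not be chained).
fixed = raw.dup
fixed.gsub!(/\r?\n/, "\r\n")
```

Assigning the result of gsub back to the variable would work just as well
as switching to gsub!.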

However I think it would still be useful to trap errors and store the
raw message with a suitable set of attributes in the yml file.
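
A rough sketch of what I have in mind is below. The helper name and the
attribute names are placeholders I made up; they are not the fields that
message.rb currently writes:

```ruby
require 'yaml'

# Hypothetical helper: run the normal header parsing (passed as a block)
# and fall back to dummy attributes plus the raw message if it raises.
def headers_or_fallback(raw)
  yield raw
rescue StandardError => e
  {
    'from' => '(unparseable)',
    'subject' => '(unparseable message)',
    'parse_error' => "#{e.class}: #{e.message}",
    'raw' => raw
  }
end

# Simulate the failure described above: calling .value on nil.
sample = "From: J\u00F6rg\nSubject: x\n\nbody\n"
entry = headers_or_fallback(sample) { |_raw| nil.value }
puts entry.to_yaml
```

That way the entry still shows up in the workbench and the user can see
that something went wrong, instead of the message silently disappearing.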

> A correctly formed email message uses \r\n as a separator between
> headers.  Once upon a time, the mail gem would essentially do the
> equivalent of s/\n/\r\n/ on such emails but that would occasionally
> corrupt attachments.
>
> In essence, that code was removed with the intention of replacing it
> if the headers were otherwise correctly encoded.  I authored a patch
> (which was accepted, but I can't find it right now) which did that if
> the headers were pure ascii.  If that patch can be found, that thread
> included instructions on restoring the old behavior using a method
> call with the word "unsafe" or somesuch in it.  Of course, that could
> corrupt attachments, which for our use case can be bad.
>
> The safest way to do this is to separate the headers from the body,
> fix the headers, and then reattach the two before parsing:
>
> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318
>
> Perhaps this code could be put in lib/whimsy/asf someplace?
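
The split/fix/reattach approach could be sketched roughly as follows. This
is an illustration only: the method name and the regexes are my own and are
not taken from the linked message.rb code.

```ruby
# Hypothetical helper: convert line endings to CRLF in the header block
# only, leaving the body (and any attachments) byte-for-byte unchanged.
def normalize_header_eol(raw)
  raw = raw.b # binary copy, so body bytes are never re-encoded
  headers, sep, body = raw.partition(/\r?\n\r?\n/)
  return raw if sep.empty? # no blank line found: leave the message alone

  headers.gsub(/\r?\n/, "\r\n") + "\r\n\r\n" + body
end

# Made-up sample message with bare-LF endings and a non-ASCII header.
msg = "From: J\u00F6rg <j@example.org>\nSubject: hi\n\nline1\nline2\n"
fixed = normalize_header_eol(msg)
```

Only the first blank line is treated as the header/body boundary, so bare
LFs inside the body are not touched.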

Are there other places where it is needed?

> - Sam Ruby
