On 3 May 2018 at 14:47, sebb <[email protected]> wrote:
> On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote:
>> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote:
>>> I've been looking at the mail to secretary@ that caused a problem recently.
>>>
>>> I think the issue may be that the headers are not all ASCII, there are
>>> a couple of o-umlauts.
>>> If these are replaced with plain 'o's then the headers are parsed OK.
>>>
>>> This causes message.rb to crash in the rescue block
>>>
>>> @335: from = mail[:from].value.sub(/\s+<.*?>$/)
>>>
>>> because mail[:from] is nil.
>>>
>>> That could be avoided by using .to_s rather than .value, but that
>>> causes the headers to be saved as a single blob with the key of the
>>> first header.
>>>
>>> A possible solution might be to trap all errors and set some dummy
>>> values for the Yaml file
>>> This would at least allow the user to be alerted to the issue.
>>>
>>> But ideally the parser should be persuaded to handle non-ASCII header 
>>> values.
>>> They are not allowed, but they seem to be quite common.
>>
>> It is slightly more complicated than that.  It actually will correctly
>> parse headers with non-ascii characters if the headers are separated
>> by \r\n.
>
> The code tries to do the replace, but uses gsub rather than gsub! so
> it has no effect...
>
> I just tried fixing that and - yay! it works!
>
> However I think it would still be useful to trap errors and store the
> raw message with a suitable set of attributes in the yml file.

Done.

Also allowed for missing From: header.

>> A correctly formed email message uses \r\n as a separator between
>> headers.  Once upon a time, the mail gem would essentially do the
>> equivalent of s/\n\/\r\n/ on such emails but that would occasionally
>> corrupt attachments.
>>
>> In essence, that code was removed with the intention of replacing it
>> if the headers were otherwise correctly encoded.  I authored a patch
>> (which was accepted, but I can't find it right now) which did that if

Did you mean this?

https://github.com/mikel/mail/pull/1168

>> the headers were pure ascii.  If that patch can be found, that thread
>> included instructions on restoring the old behavior using a method
>> call with the word "unsafe" or somesuch in it.  Of course, that could
>> corrupt attachments, which for our use case can be bad.
>>
>> The safest way to do this is to separate the headers from the body,
>> fix the headers, and then reattach the two before parsing:
>>
>> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318
>>
>> Perhaps this code could be put in lib/whimsy/asf someplace?
>
> Are there other places where it is needed?
>
>> - Sam Ruby

Reply via email to