On 3 May 2018 at 21:16, Sam Ruby <[email protected]> wrote:
> On Thu, May 3, 2018 at 10:37 AM, sebb <[email protected]> wrote:
>> On 3 May 2018 at 14:47, sebb <[email protected]> wrote:
>>> On 3 May 2018 at 14:12, Sam Ruby <[email protected]> wrote:
>>>> On Thu, May 3, 2018 at 8:46 AM, sebb <[email protected]> wrote:
>>>>> I've been looking at the mail to secretary@ that caused a problem 
>>>>> recently.
>>>>>
>>>>> I think the issue may be that the headers are not all ASCII, there are
>>>>> a couple of o-umlauts.
>>>>> If these are replaced with plain 'o's then the headers are parsed OK.
>>>>>
>>>>> This causes message.rb to crash in the rescue block
>>>>>
>>>>> @335: from = mail[:from].value.sub(/\s+<.*?>$/)
>>>>>
>>>>> because mail[:from] is nil.
>>>>>
>>>>> That could be avoided by using .to_s rather than .value, but that
>>>>> causes the headers to be saved as a single blob with the key of the
>>>>> first header.
>>>>>
>>>>> A possible solution might be to trap all errors and set some dummy
>>>>> values for the Yaml file
>>>>> This would at least allow the user to be alerted to the issue.
>>>>>
>>>>> But ideally the parser should be persuaded to handle non-ASCII header 
>>>>> values.
>>>>> They are not allowed, but they seem to be quite common.
>>>>
>>>> It is slightly more complicated than that.  It actually will correctly
>>>> parse headers with non-ascii characters if the headers are separated
>>>> by \r\n.
>>>
>>> The code tries to do the replace, but uses gsub rather than gsub! so
>>> it has no effect...
>>>
>>> I just tried fixing that and - yay! it works!
>
> DOH!  Thanks!
>
>>> However I think it would still be useful to trap errors and store the
>>> raw message with a suitable set of attributes in the yml file.
>
> Error reporting is certainly something that can be improved on.  It
> isn't clear to me what role the yml file would play in this.  In
> particular, the secretary workbench application tends to ignore emails
> without attachments...

In which case the dummy entry could include dummy attachments as well
so it gets noticed?

>> Done.
>>
>> Also allowed for missing From: header.
>
> Thanks!
>
>>>> A correctly formed email message uses \r\n as a separator between
>>>> headers.  Once upon a time, the mail gem would essentially do the
>>>> equivalent of s/\n\/\r\n/ on such emails but that would occasionally
>>>> corrupt attachments.
>>>>
>>>> In essence, that code was removed with the intention of replacing it
>>>> if the headers were otherwise correctly encoded.  I authored a patch
>>>> (which was accepted, but I can't find it right now) which did that if
>>
>> Did you mean this?
>>
>> https://github.com/mikel/mail/pull/1168
>
> Yes! The code that I now have in whimsy really belongs in the mail
> gem, complete with a test case.  I once started down that path, and
> there is a place where the mail gem splits headers from the body that
> could be fixed, and once the headers are separated, adding \r
> characters if there are absolutely none is unlikely to break anything.
> But I never completed that work.  Mostly, if I recall correctly,
> because the code defined named constants for the separators, and those
> names matched the relevant specifications.  Changing those constants
> felt wrong, hardcoding strings felt inconsistent with the coding
> standards in use in that code, and where I got stuck was in deciding
> new names for constants.
>
>>>> the headers were pure ascii.  If that patch can be found, that thread
>>>> included instructions on restoring the old behavior using a method
>>>> call with the word "unsafe" or somesuch in it.  Of course, that could
>>>> corrupt attachments, which for our use case can be bad.
>>>>
>>>> The safest way to do this is to separate the headers from the body,
>>>> fix the headers, and then reattach the two before parsing:
>>>>
>>>> https://github.com/apache/whimsy/blob/6830b808866e140bd0f436c2cd02f9c66527fcc8/www/secretary/workbench/models/message.rb#L318
>>>>
>>>> Perhaps this code could be put in lib/whimsy/asf someplace?
>>>
>>> Are there other places where it is needed?
>
> Not sure.  Try git grep 'Mail.new(' and git grep 'Mail.read' to see
> other places where it may help.
>
>>>> - Sam Ruby
>
> - Sam Ruby

Reply via email to