Hi Albrecht,

Hmm, yeah, I think events on the GMimeParser probably make the most sense, and
perhaps, as you suggested, a structure that holds an enum scope/classification
and a gint64 stream offset would be the way to do it. I would think, though,
that we might also want it to reference the GMimeObject?
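A rough sketch of such a structure, assuming plain C99 types (int64_t in place of gint64) so that it compiles without GLib. ParserIssue and IssueScope are hypothetical names, not existing GMime API, and GMimeObject is only forward-declared here. Allowing the object pointer to be NULL is one possible way to cover issues found before the corresponding object has been constructed.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward declaration only; in GMime the full type comes from gmime-object.h. */
typedef struct _GMimeObject GMimeObject;

/* Hypothetical scope/classification enum; GMime defines nothing like it today. */
typedef enum {
    ISSUE_SCOPE_MESSAGE_HEADER,
    ISSUE_SCOPE_PART_HEADER,
    ISSUE_SCOPE_PART_BODY
} IssueScope;

/* Hypothetical issue record passed along with a parser event. */
typedef struct {
    IssueScope scope;
    int64_t offset;        /* position in the input stream (gint64 in GLib terms) */
    GMimeObject *object;   /* object the issue belongs to, or NULL if it was
                              found before that object was constructed */
} ParserIssue;
```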

The next problem is that base64 isn’t decoded by the parser; it’s handled later
by GMimeDataWrapper::write_to_stream(), and only if/when the user calls that.
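Because decoding is deferred, something like the “=” inside a base64 block case mentioned further down the thread could only be spotted by scanning the still-encoded body. A minimal standalone sketch of such a scan follows; find_premature_padding() and is_base64_char() are hypothetical helpers, not GMime code, and the sketch assumes that any non-alphabet byte (CR, LF, spaces, ...) should simply be skipped.

```c
#include <stddef.h>

/* Nonzero for characters of the base64 alphabet (RFC 2045), excluding '='. */
static int
is_base64_char (unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Looks for '=' padding that is followed by more base64 data.
 * Returns the offset of the first such '=', or -1 if padding only
 * appears at the end of the content. */
static long
find_premature_padding (const char *data, size_t len)
{
    long pad_offset = -1;   /* offset of the first '=' seen so far */

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) data[i];

        if (c == '=') {
            if (pad_offset < 0)
                pad_offset = (long) i;
        } else if (is_base64_char (c)) {
            if (pad_offset >= 0)
                return pad_offset;   /* data resumes after padding */
        }
        /* any other byte (CR, LF, spaces, ...) is skipped in this sketch */
    }
    return -1;
}
```

For example, "QUI=QUJD" yields 3 (the position of the “=”), while "QUJD\nQUI=\n" yields -1 because only a line break follows the padding.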


That said, reporting duplicate Content-Transfer-Encoding headers could easily
be done in gmime-parser.c.

If you grep for _g_mime_object_append_header, you’ll discover a few cases that 
end up adding the headers to a GMimeObject. For example:

        /* headers that don't start with "Content-" go on the message itself */
        for (i = 0; i < priv->headers->len; i++) {
                header = priv->headers->pdata[i];

                if (g_ascii_strncasecmp (header->name, "Content-", 8) != 0) {
                        _g_mime_object_append_header ((GMimeObject *) message,
                                                      header->name, header->raw_name,
                                                      header->raw_value, header->offset);
                }
        }


It would be fairly trivial to track whether or not you’ve already seen a
Content-Transfer-Encoding header and, if you find a second (or third, etc.)
one, emit an event signifying that. Each header item also carries the stream
offset where that header was found (header->offset above), so getting that
info is easy too.
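A minimal sketch of that bookkeeping, written as standalone C rather than against the real parser internals. RawHeader, ascii_casecmp() and find_duplicate_cte() are hypothetical stand-ins (the real parser walks its own header array, as in the loop above), and instead of emitting an event the sketch just collects the offsets of the duplicate headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parser's header record. */
typedef struct {
    const char *name;
    int64_t offset;   /* stream offset where the header was found */
} RawHeader;

/* ASCII-only case-insensitive compare, in the spirit of g_ascii_strcasecmp(). */
static int
ascii_casecmp (const char *a, const char *b)
{
    for (;; a++, b++) {
        unsigned char ca = (unsigned char) *a, cb = (unsigned char) *b;
        if (ca >= 'A' && ca <= 'Z') ca += 'a' - 'A';
        if (cb >= 'A' && cb <= 'Z') cb += 'a' - 'A';
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Stores the offsets of every Content-Transfer-Encoding header after the
 * first one in dup_offsets (at most max entries) and returns how many
 * duplicates were found in total. */
static size_t
find_duplicate_cte (const RawHeader *headers, size_t n_headers,
                    int64_t *dup_offsets, size_t max)
{
    int seen = 0;
    size_t n_dups = 0;

    for (size_t i = 0; i < n_headers; i++) {
        if (ascii_casecmp (headers[i].name, "Content-Transfer-Encoding") != 0)
            continue;
        if (!seen) {
            seen = 1;   /* the first occurrence is the expected one */
            continue;
        }
        /* a real parser would emit its issue event here instead */
        if (n_dups < max)
            dup_offsets[n_dups] = headers[i].offset;
        n_dups++;
    }
    return n_dups;
}
```

Given Content-Transfer-Encoding headers at offsets 30 and 65, the function returns 1 and stores 65, since only the second occurrence is reported.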


That’s the simplest case, I think.

On 10/16/17, 3:50 PM, "Albrecht Dreß" <[email protected]> wrote:

    Hi Jeff:
    
    On 16.10.17 16:01, Jeffrey Stedfast via balsa-list wrote:
    > Ah, yes, okay – now I understand better. Different software interprets things differently, so when writing software that attempts to filter out viruses and other types of attacks, it might not interpret things exactly the same as the user’s email client.
    
    Exactly.  As I mentioned earlier, it's not limited to security appliances, though.  Some of my users usually read messages on the iPhone first.  If a message looks sane there, [s]he will probably open it carelessly later with Outlook, where it might be interpreted differently.
    
    >> The NUL byte inserted by GMime when decoding base64 content with “=” inside the block (slide 32) looks like a bug to me, though.
    > 
    > It might be, I’ll have to look into it a bit more.
    
    See the attached little test code and feed the TEST.txt file into it…
    
    >> This leads to the question of whether it would be possible to extract this information from the GMime parser.  Although Balsa is probably not a target (due to the regrettably small number of installations and the robustness of Linux), it should be able to display a warning as mentioned above, just in case the user reads the messages using different MUAs (think of IMAP, plus Outlook/iOS/Android/…).  But it would be an *extremely* helpful feature for writing some kind of “security scanner”.
    > 
    > I think it would be possible; the question is mostly how to surface this information in a usable way.
    
    The information about one issue detected by the parser could be a simple set of numerical values, e.g. something like:
    - offset: position within the input stream
    - scope: e.g. as enum: message header; part header; part body; embedded message header; …
    - classification: e.g. as enum: RFC violation (like unencoded 8-bit chars in a header, or a duplicated boundary parameter, …plus all the stuff g_mime_parser_options_set_*_compliance_mode() controls); strange content (like “=” inside a base64 block, unnecessary line folding in headers, …); …
    - issue code: number
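As a sketch of how that record might look in C, using only the example values listed above. All names are hypothetical (none of these enums or functions exist in GMime), and format_issue() is just one possible way a MUA could turn an issue into a warning line.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical enums mirroring the list above. */
typedef enum {
    SCOPE_MESSAGE_HEADER,
    SCOPE_PART_HEADER,
    SCOPE_PART_BODY,
    SCOPE_EMBEDDED_MESSAGE_HEADER
} IssueScope;

typedef enum {
    CLASS_RFC_VIOLATION,    /* e.g. unencoded 8-bit chars in a header */
    CLASS_STRANGE_CONTENT   /* e.g. '=' inside a base64 block */
} IssueClass;

typedef struct {
    int64_t offset;         /* position within the input stream */
    IssueScope scope;
    IssueClass classification;
    unsigned int code;      /* issue code number */
} ParserIssue;

static const char *
scope_name (IssueScope scope)
{
    switch (scope) {
    case SCOPE_MESSAGE_HEADER:          return "message header";
    case SCOPE_PART_HEADER:             return "part header";
    case SCOPE_PART_BODY:               return "part body";
    case SCOPE_EMBEDDED_MESSAGE_HEADER: return "embedded message header";
    }
    return "unknown";
}

/* Formats one issue as a single line; returns snprintf()'s result. */
static int
format_issue (const ParserIssue *issue, char *buf, size_t size)
{
    return snprintf (buf, size, "offset %lld: %s: %s (code %u)",
                     (long long) issue->offset, scope_name (issue->scope),
                     issue->classification == CLASS_RFC_VIOLATION
                         ? "RFC violation" : "strange content",
                     issue->code);
}
```

For an issue at offset 1234 in a part header, classified as an RFC violation with code 7, format_issue() produces "offset 1234: part header: RFC violation (code 7)".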
    
    Maybe the latter two could be glued together, so as to make the approach less complex.  Sometimes it is not clear anyway how to classify an issue:  Two contradictory Content-Transfer-Encoding headers for one part are against the intentions of RFC 2045, but not explicitly forbidden, IIRC.
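One way to glue them together, sketched with hypothetical helper names and an arbitrary bit layout (classification in the upper 16 bits, issue number in the lower 16):

```c
#include <stdint.h>

/* Hypothetical classifications; the numeric values are arbitrary. */
enum {
    ISSUE_CLASS_RFC_VIOLATION   = 1,
    ISSUE_CLASS_STRANGE_CONTENT = 2
};

/* Packs a classification and an issue number into one 32-bit code. */
static inline uint32_t
issue_code_make (uint32_t classification, uint32_t number)
{
    return (classification << 16) | (number & 0xFFFFu);
}

/* Recovers the classification from a packed code. */
static inline uint32_t
issue_code_class (uint32_t code)
{
    return code >> 16;
}

/* Recovers the issue number from a packed code. */
static inline uint32_t
issue_code_number (uint32_t code)
{
    return code & 0xFFFFu;
}
```

Here issue_code_make (ISSUE_CLASS_STRANGE_CONTENT, 5) yields 0x20005. The caller only ever sees a single number, while code that does care about the rough category can still recover it with issue_code_class().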
    
    For passing the issues to the caller:  As GMimeParser is derived from GObject, the simplest way might be a signal the caller can connect to.  The callback would receive the three or four values above.  This leaves the existing API unchanged and does not introduce any extra memory requirements (like an error stack).  The performance penalty (which only applies to messages containing issues anyway) should be small.  And the approach is easily extensible: to notify about another issue, just define the enum code and emit the signal.
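In GMime itself this would be a GObject signal, registered with g_signal_new() and emitted with g_signal_emit(). To keep the sketch free of GLib, the plain-C stand-in below uses a single callback pointer plus user data instead; IssueEmitter, issue_emitter_connect(), issue_emitter_emit() and count_issues() are hypothetical names. It illustrates the property described above: with no handler connected, emitting costs one NULL check and nothing is stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler signature carrying the values listed above. */
typedef void (*IssueHandler) (int64_t offset, int scope, int code, void *user_data);

typedef struct {
    IssueHandler handler;   /* NULL when nobody is listening */
    void *user_data;
} IssueEmitter;

static void
issue_emitter_connect (IssueEmitter *em, IssueHandler handler, void *user_data)
{
    em->handler = handler;
    em->user_data = user_data;
}

/* Called by parser code only when it detects an issue; nothing is queued. */
static void
issue_emitter_emit (const IssueEmitter *em, int64_t offset, int scope, int code)
{
    if (em->handler != NULL)
        em->handler (offset, scope, code, em->user_data);
}

/* Example handler: counts issues through an int passed as user_data. */
static void
count_issues (int64_t offset, int scope, int code, void *user_data)
{
    (void) offset;
    (void) scope;
    (void) code;
    (*(int *) user_data)++;
}
```

Adding a new issue type in this scheme only means defining a new code value and calling issue_emitter_emit() at the spot where the parser detects it.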
    
    If you could give me a hint where I should start within your sources, I could try to implement a small (I hope…) example patch for one of the issues addressed by the g_mime_parser_options_set_*_compliance_mode() options to illustrate the approach.
    
    Cheers,
    Albrecht.

_______________________________________________
balsa-list mailing list
[email protected]
https://mail.gnome.org/mailman/listinfo/balsa-list
