On Mon 21/Apr/2025 21:05:03 +0200 Murray S. Kucherawy wrote:
On Mon, Apr 21, 2025 at 2:14 PM Alessandro Vesely <[email protected]> wrote:

While it is relatively easy to detect a MIME wrap, an added footer, or a similar transformation, changes in encodings, quotes, and comments are difficult or impossible to guess. Quoted-printable can encode every character except alphanumerics and break lines at a fixed 76 characters. Or it can encode only non-ASCII characters and insert soft line breaks at the 76th character. Or something in between. It might make sense to recognize some QP encoding styles, but then it would be difficult for signers to determine which style of encoding they are signing. It is much simpler to decode QP and re-encode as base64.

You could canonicalize and then verify that, so:

Content-Type: text/plain; charset=us-ascii

...is hashed as:

Content-Type: text/plain; charset="us-ascii"

...whether the quotes are there or not.
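
A minimal Python sketch of that canonicalization, for illustration only. The function name canonical_content_type is hypothetical, and the sketch assumes no RFC 822 comments and no RFC 2231 encoded parameters in the field; a real canonicalization would have to specify both cases.

```python
from email.message import Message
from email.utils import quote


def canonical_content_type(value: str) -> str:
    """Return a Content-Type value with every parameter value quoted."""
    msg = Message()
    msg["Content-Type"] = value
    # get_params() unquotes the values; the first item is the media type.
    params = msg.get_params()
    parts = [params[0][0].lower()]
    for name, val in params[1:]:
        if isinstance(val, tuple):
            # RFC 2231 encoded parameter: out of scope for this sketch.
            raise ValueError("RFC 2231 parameters are not handled")
        # quote() escapes backslashes and double quotes inside the value.
        parts.append(f'{name.lower()}="{quote(val)}"')
    return "; ".join(parts)


plain = canonical_content_type("text/plain; charset=us-ascii")
quoted = canonical_content_type('text/plain; charset="us-ascii"')
```

Both calls produce the same string, so the hash no longer depends on whether an intermediary added or removed the quotes.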


Or you can use a message parser that finds all the peculiarities in the original message and adds a header field with a blob summarizing them. For example, one bit of a 64-bit word could be set as follows:

0: the value of charset in Content-Type is a token;
1: the value of charset in Content-Type is a quoted string.

A similar parser can be run on the transformed message, and the two results can then be XORed to describe the differences. If the field has a comment, or if the difference from the possible MLM outputs is non-standard, the original field should be saved in its entirety.
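
To make the idea concrete, here is a rough Python sketch. Only bit 0 comes from the description above; the second bit (CTE_IS_QP), the function name summarize, and the regular expression are assumptions added for illustration.

```python
import re

CHARSET_QUOTED = 1 << 0  # bit 0: clear = token, set = quoted string
CTE_IS_QP = 1 << 1       # hypothetical extra bit, not from the proposal

MASK_64 = (1 << 64) - 1  # keep the summary within a 64-bit word


def summarize(content_type: str, cte: str) -> int:
    """Pack the observed peculiarities of a message into one word."""
    word = 0
    if re.search(r'charset\s*=\s*"', content_type, re.IGNORECASE):
        word |= CHARSET_QUOTED
    if cte.strip().lower() == "quoted-printable":
        word |= CTE_IS_QP
    return word & MASK_64


original = summarize("text/plain; charset=us-ascii", "quoted-printable")
transformed = summarize('text/plain; charset="us-ascii"', "quoted-printable")

# The XOR has a bit set exactly where the two messages differ.
diff = original ^ transformed

# A verifier that sees the transformed message and the diff can
# recover the summary of the original.
recovered = transformed ^ diff
```

Here only the quoting of charset changed, so diff has only bit 0 set.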


QP strings can be converted to base64 strings, or the encodings can simply be removed, and the result then hashed.

The latter looks fine, but it is a new canonicalization method.
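
As a sketch of what such a method might look like in Python: the function name decoded_body_hash is hypothetical, SHA-256 with base64 output is chosen only because it mirrors the DKIM body hash, and line-ending normalization of the decoded text is deliberately left out.

```python
import base64
import hashlib
import quopri


def decoded_body_hash(body: bytes, cte: str) -> str:
    """Hash the body after removing its Content-Transfer-Encoding."""
    cte = cte.strip().lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(body)
    elif cte == "base64":
        raw = base64.b64decode(body)
    else:
        # 7bit, 8bit, binary: nothing to decode.
        raw = body
    digest = hashlib.sha256(raw).digest()
    return base64.b64encode(digest).decode("ascii")


# The same text under three transfer encodings of the body.
h_qp = decoded_body_hash(b"caf=C3=A9", "quoted-printable")
h_qp_soft = decoded_body_hash(b"ca=\nf=C3=A9", "quoted-printable")
h_b64 = decoded_body_hash(base64.b64encode("café".encode("utf-8")), "base64")
```

All three hashes are equal, because each input decodes to the same bytes before hashing.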


And then "relaxed" can take care of space additions and wrapping.

But you can only go so far with such heuristics. At some point I think you'd be going way too far to guess at upstream changes that may or may not have happened.


The ML signer still controls the transformation itself, so it faces a limited set of possible differences. When these fall within a standardized set of transformations, they can be expressed very concisely. To describe the difference due to a MIME wrap that preserves preamble and epilogue, for example, it is not necessary to repeat the content of the added part. Saying "mime-wrap" is sufficient to recover the original.
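
For illustration, undoing such a wrap might look like the Python sketch below. The sample message, the boundary, and the function name undo_mime_wrap are made up, and the sketch assumes the original body is the first part. It works on parsed parts, so it does not show the byte-exact recovery that signature verification would need.

```python
from email import message_from_string

# Hypothetical output of a list manager that wraps the original body
# as the first part of a multipart/mixed and appends a footer part.
WRAPPED = (
    'Content-Type: multipart/mixed; boundary="b1"\n'
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello list\n"
    "--b1\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "footer added by the list\n"
    "--b1--\n"
)


def undo_mime_wrap(wrapped: str) -> tuple[str, str]:
    """Return the Content-Type and body of the first (original) part."""
    msg = message_from_string(wrapped)
    if not msg.is_multipart():
        raise ValueError("not a MIME-wrapped message")
    first = msg.get_payload(0)
    return first["Content-Type"], first.get_payload()


content_type, body = undo_mime_wrap(WRAPPED)
```

Nothing about the footer has to be transmitted: knowing that the transformation was a wrap is enough to locate the original part.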


I don't know what a QP "style" is; there's only one encoding I know of.


RFC 2045 offers several options. For example, you can encode spaces or not, or only in some cases. You can insert line breaks in the middle of a word or try to avoid doing so. I would say that each encoder has its own style, but perhaps a few libraries are the most popular, and it might be worth standardizing the corresponding styles.
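
As a small demonstration, Python's quopri module already exposes one such choice through its quotetabs flag. The sample text is arbitrary, and other libraries will differ in other ways.

```python
import quopri

text = "hello world, café".encode("utf-8")

# Style 1: leave embedded spaces alone, encode only what must be encoded.
minimal = quopri.encodestring(text)

# Style 2: also encode embedded spaces and tabs.
aggressive = quopri.encodestring(text, quotetabs=True)

# The encoded forms differ, yet both decode to the same bytes.
same_content = quopri.decodestring(minimal) == quopri.decodestring(aggressive)
```

A signature computed over the encoded form would break if an intermediary re-encoded from one style to the other, even though the content is unchanged.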


Best
Ale
--




_______________________________________________
Ietf-dkim mailing list -- [email protected]
To unsubscribe send an email to [email protected]
