On Mon 21/Apr/2025 21:05:03 +0200 Murray S. Kucherawy wrote:
On Mon, Apr 21, 2025 at 2:14 PM Alessandro Vesely <[email protected]> wrote:
While it is relatively easy to detect mime-wrap, footer or similar
transformation, changes in encodings, quotes and comments are difficult or
impossible to guess. Quoted printable can encode each and every character
except alphanumeric with a fixed 76 characters per line. Or it can encode
only non-ASCII characters and insert soft-breaks at the 76th character. Or
something in between. It might make sense to recognize some QP encoding
styles, but then it would be difficult for signers to determine which style
of encoding they are signing. It is much simpler to decode QP and re-encode
as base64.
You could canonicalize and then verify that, so:
Content-Type: text/plain; charset=us-ascii
...is hashed as:
Content-Type: text/plain; charset="us-ascii"
...whether the quotes are there or not.
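As a rough illustration of the canonicalization suggested above, here is a minimal Python sketch. The helper name canonicalize_charset is hypothetical and not taken from any DKIM specification. It only handles a plain charset parameter and ignores comments, RFC 2231 continuations, and a "charset=" substring appearing inside another parameter's value.

```python
import re

# Hypothetical helper for illustration only; not part of any DKIM spec.
# Matches charset=<token> where the value does not start with a quote.
_CHARSET_TOKEN = re.compile(r'(charset=)([^\s";]+)', re.IGNORECASE)

def canonicalize_charset(field: str) -> str:
    """Quote an unquoted charset value; leave a quoted one untouched."""
    return _CHARSET_TOKEN.sub(r'\1"\2"', field)

bare = "Content-Type: text/plain; charset=us-ascii"
quoted = 'Content-Type: text/plain; charset="us-ascii"'

# Both spellings reduce to the same string before hashing.
assert canonicalize_charset(bare) == canonicalize_charset(quoted) == quoted
```

A real canonicalization would have to define the behavior for every parameter and every legal syntax variant, which is where the difficulty lies.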
Or you can use a message parser that finds all the peculiarities in the
original message and adds a header field with a blob summarizing them. For
example, it can assign the bits of a 64-bit word as follows:
0: the value of charset in Content-Type is a token;
1: the value of charset in Content-Type is a quoted string;
A similar parser can be run on the transformed message, and its result XORed
with the original's to describe the differences. If the field has a comment
or if the difference from the possible MLM outputs is non-standard, the
original field should be saved in its entirety.
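The bit-word idea can be sketched as follows. This is only an illustration: the constant names and the charset_flags function are hypothetical, and only the two bits listed above are modeled.

```python
import re

# Bit assignments from the example above; the names are hypothetical.
CHARSET_IS_TOKEN = 1 << 0
CHARSET_IS_QUOTED = 1 << 1

def charset_flags(content_type: str) -> int:
    """Return the feature bits describing how charset is written."""
    m = re.search(r'charset=("?)', content_type, re.IGNORECASE)
    if m is None:
        return 0
    return CHARSET_IS_QUOTED if m.group(1) else CHARSET_IS_TOKEN

original = charset_flags("text/plain; charset=us-ascii")
transformed = charset_flags('text/plain; charset="us-ascii"')

# XOR leaves a bit set wherever the two messages differ.
difference = original ^ transformed
```

A zero difference means the transformation preserved every recorded peculiarity. A non-zero one tells the verifier which features to flip back before hashing.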
QP strings can be converted to base64 strings, or simply the encodings can
be removed, and then the result hashed.
The latter looks fine, but it is a new canonicalization method.
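To make "remove the encodings, then hash" concrete, here is a minimal sketch using Python's standard library. The function decoded_body_hash is a hypothetical name, SHA-256 is chosen only for illustration, and the sketch ignores multipart structure and line-ending normalization.

```python
import base64
import hashlib
import quopri

def decoded_body_hash(body: bytes, cte: str) -> str:
    """Hash the body after undoing its Content-Transfer-Encoding."""
    cte = cte.strip().lower()
    if cte == "quoted-printable":
        raw = quopri.decodestring(body)
    elif cte == "base64":
        raw = base64.b64decode(body)
    else:
        # 7bit, 8bit and binary carry the octets as they are.
        raw = body
    return hashlib.sha256(raw).hexdigest()

payload = b"caf\xc3\xa9"  # UTF-8 octets of a short non-ASCII word

as_qp = b"caf=C3=A9"
as_b64 = base64.b64encode(payload)

# The hash no longer depends on which transfer encoding was used.
same = decoded_body_hash(as_qp, "quoted-printable") == decoded_body_hash(as_b64, "base64")
```

The hash then survives a QP to base64 conversion, but a verifier that only knows "simple" and "relaxed" cannot reproduce it.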
And then "relaxed" can take care of space additions and wrapping.
But you can only go so far with such heuristics. At some point I think
you'd be going way too far to guess at upstream changes that may or may not
have happened.
The ML signer can still control the transformation itself, so it is faced with
a limited set of possible differences. When these fall within a standardized
set of transformations, they can be expressed very concisely. To describe the
difference due to a MIME wrap that preserves preamble and epilogue, for
example, it is not necessary to repeat the content of the added part. Saying
mime-wrap is sufficient to recover the original.
I don't know what a QP "style" is; there's only one encoding I know of.
RFC 2045 offers several options: for example, you can encode spaces or not,
or only in some cases. You can insert line breaks in the middle of a word or
try to avoid it. I would say that each encoder has its own style, but perhaps
there are a few libraries that are the most popular and it might be worth
standardizing the corresponding styles.
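To illustrate what I mean by "style", here is a small sketch with Python's quopri module. The first two forms come from the library's own options; the third is hand-written to mimic an encoder that escapes every octet. All three are valid QP for the same content.

```python
import quopri

text = b"hello world"

# Style 1: encode only what must be encoded; the space stays literal.
minimal = quopri.encodestring(text)

# Style 2: also encode embedded spaces and tabs.
spaces = quopri.encodestring(text, quotetabs=True)

# Style 3 (hand-written): every octet encoded as =XX.
everything = b"=68=65=6C=6C=6F=20=77=6F=72=6C=64"

styles = [minimal, spaces, everything]

# Different octets on the wire, identical content once decoded.
decoded = {quopri.decodestring(s) for s in styles}
```

A hash over the encoded octets differs for each style, while a hash over the decoded content does not. A signer would have to know which style its encoder uses in order to sign the encoded form reproducibly.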
Best
Ale