> It appears that Wei Chuang <wei...@google.com> said:
> > If the RFC2045 canonical representation at the final destination can be the
> > same as the canonical representation at the original sender, ...
> When we were working on DKIM canonicalization we had lengthy discussions about
> what to do about MIME and we decided not to even try.

A mistake IMO.

> There is no canonical
> representation of a MIME message and nobody to my knowledge has ever tried to
> describe what it would mean for two MIME messages to be equivalent, since they
> could vary in a fantastic number of ways.

First, a canonical form doesn't have to produce a 100% reliable equivalency
test in order to be useful. Second, there can be more to a hash computation
than a canonical form. This is especially true given that a MIME message is
a tree.

> Part separators can change, the
> pieces of multipart/whatever might change, line breaks in quoted-printable
> and base64 can change, spacing and capitalization of headers can change, and
> that's just what I can think of in two minutes.

If you treat the message as a Merkle tree with:

 o Separate header and body hashes
 o Decoding message bodies prior to hashing
 o Applying the already-defined unfolding/capitalization stuff from DKIM
   to part headers
 o Removing the CTE field and the boundary value from CT fields in the header

You end up with a value that:

 o Is invariant with regard to part separator changes
 o Is invariant with regard to CTE changes
 o Is invariant with regard to many/most common header changes
 o Allows for rapid computation of hashes for large numbers of large
   messages that share common content

Which I note takes care of your list.

But the question is, as always, whether or not defining such a thing is worth
the trouble. At this point I think the answer is "no".

				Ned
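For illustration only, here is a minimal sketch of the Merkle-tree hash
outlined above, using Python's standard email package. Nothing here is a
defined algorithm: SHA-256, the choice to hash only Content-* part headers,
the leaf/node prefix bytes, and the function names are all assumptions made
for the sketch. The header handling is only an approximation of DKIM's
"relaxed" canonicalization, and line-ending normalization of decoded text
parts is not attempted.

```python
import hashlib
from email import message_from_bytes
from email.message import Message
from email.utils import collapse_rfc2231_value


def canon_headers(part: Message) -> bytes:
    """Approximate DKIM 'relaxed' header canonicalization for one part.

    Assumption: only Content-* headers are covered, the CTE field is
    dropped, and the boundary parameter is removed from Content-Type.
    """
    lines = []
    for name, value in part.items():
        lname = name.lower()
        if not lname.startswith("content-"):
            continue  # transport headers (Received, etc.) are out of scope
        if lname == "content-transfer-encoding":
            continue  # the encoding may change in transit
        if lname == "content-type":
            # Rebuild the value from parsed parameters, minus the boundary.
            params = sorted(
                (k.lower(), collapse_rfc2231_value(v))
                for k, v in part.get_params(failobj=[])[1:]
                if k.lower() != "boundary"
            )
            text = ";".join(
                [part.get_content_type()] + [f"{k}={v}" for k, v in params]
            )
        else:
            # Unfold and collapse runs of whitespace to a single space.
            text = " ".join(str(value).split())
        lines.append(f"{lname}:{text}\r\n")
    return "".join(lines).encode("utf-8", "surrogateescape")


def mime_merkle_hash(part: Message) -> bytes:
    """Hash a MIME (sub)tree as H(header hash || body hash)."""
    header_hash = hashlib.sha256(canon_headers(part)).digest()
    if part.is_multipart():
        # Interior node: the body is the ordered list of child hashes, so
        # separators, preamble, and epilogue never enter the computation.
        children = b"".join(mime_merkle_hash(c) for c in part.get_payload())
        body_hash = hashlib.sha256(b"\x01" + children).digest()
    else:
        # Leaf: hash the body after undoing base64 / quoted-printable.
        data = part.get_payload(decode=True) or b""
        body_hash = hashlib.sha256(b"\x00" + data).digest()
    return hashlib.sha256(header_hash + body_hash).digest()
```

Calling mime_merkle_hash(message_from_bytes(raw)) on two copies of a message
that differ only in boundary string, transfer encoding, or header
folding/capitalization should give the same digest. Because each leaf body
hash depends only on the decoded content, it could also be cached and reused
across messages that carry the same attachment.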