> It appears that Wei Chuang  <wei...@google.com> said:
> > If the RFC2045 canonical representation at the final destination can be the
> > same as the canonical representation at the original sender, ...

> When we were working on DKIM canonicalization we had lengthy discussions about
> what to do about MIME and we decided not to even try.

A mistake IMO.

> There is no canonical
> representation of a MIME message and nobody to my knowledge has ever tried to
> describe what it would mean for two MIME messages to be equivalent, since they
> could vary in a fantastic number of ways.

First, a canonical form doesn't have to produce a 100% reliable equivalency
test in order to be useful.

Second, there can be more to a hash computation than a canonical form. This
is especially true given that a MIME message is a tree.

> Part separators can change, the
> pieces of multipart/whatever might change, line breaks in quoted-printable
> and base64 can change, spacing and capitalization of headers can change, and
> that's just what I can think of in two minutes.

If you treat the message as a Merkle tree with:

o Separate header and body hashes
o Decoding message bodies prior to hashing
o Applying the already-defined unfolding/capitalization stuff from DKIM
  to part headers.
o Removing the CTE field, and the boundary parameter of the CT field, from
  each part's header

You end up with a value that's:

o Invariant with regard to part separator changes
o Invariant with regard to CTE changes
o Invariant with regard to many/most common header changes
o Cheap to compute for large numbers of large messages that share common
  content, since identical subtrees yield identical hashes.

Which I note takes care of your list.
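For concreteness, here is a minimal Python sketch of such a tree hash. It is an illustration under stated assumptions, not anything DKIM defines: the names `node_hash`, `relaxed` and `canonical_content_type` are hypothetical, SHA-256 over length-prefixed inputs is an arbitrary choice, the DKIM "relaxed" header rules (RFC 6376, section 3.4.2) are only approximated, text bodies are normalized to CRLF after decoding, and preambles/epilogues are ignored.

```python
import hashlib
from email.message import Message
from email.utils import collapse_rfc2231_value

# Fields left out of the per-part header hash. The CTE is undone before
# body hashing; Content-Type is re-added below without its boundary.
SKIP = {"content-transfer-encoding", "content-type"}


def _sha256(*chunks: bytes) -> bytes:
    # Length-prefix each chunk so different splits cannot collide.
    h = hashlib.sha256()
    for c in chunks:
        h.update(len(c).to_bytes(8, "big"))
        h.update(c)
    return h.digest()


def relaxed(name: str, value: str) -> str:
    # Approximates DKIM "relaxed" header canonicalization: lowercase the
    # name, unfold, collapse whitespace runs to one space, trim the ends.
    return name.strip().lower() + ":" + " ".join(str(value).split()) + "\r\n"


def canonical_content_type(part: Message) -> str:
    # Rebuild Content-Type without the boundary parameter, which is free
    # to change in transit. Remaining parameters are lowercased and sorted.
    params = part.get_params(header="content-type") or []
    kept = sorted(
        (k.strip().lower(), collapse_rfc2231_value(v))
        for k, v in params[1:]  # params[0] is the media type itself
        if k.strip().lower() != "boundary"
    )
    extra = "".join(f";{k}={v}" for k, v in kept)
    return "content-type:" + part.get_content_type() + extra + "\r\n"


def node_hash(part: Message) -> bytes:
    # Header hash: relaxed headers (minus SKIP) plus the canonical CT.
    lines = [relaxed(k, v) for k, v in part.items()
             if k.strip().lower() not in SKIP]
    lines.append(canonical_content_type(part))
    header_hash = _sha256("".join(lines).encode("utf-8", "surrogateescape"))

    if part.is_multipart():
        # Interior node: hash the ordered child hashes, so separator
        # lines never enter the computation.
        body_hash = _sha256(b"multi", *(node_hash(c) for c in part.get_payload()))
    else:
        # Leaf: hash the decoded body, so QP vs base64 does not matter.
        body = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "text":
            # Assumption: normalize text line endings to CRLF.
            body = body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
        body_hash = _sha256(b"leaf", body)

    # Header and body hashes stay separate until combined per node.
    return _sha256(b"node", header_hash, body_hash)
```

Parsing two copies of a message with `email.message_from_bytes`, one using quoted-printable and boundary "AAA", the other base64, boundary "BBB" and a refolded top-level Content-Type, should give the same `node_hash`, while changing a word of the body should not. Because a node's hash depends only on its own headers and subtree, hashes of shared parts could in principle be cached across messages.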

But the question is, as always, whether or not defining such a thing is worth
the trouble. At this point I think the answer is "no".

                                Ned

_______________________________________________
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc
