Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-06 Thread Alessandro Vesely

On Sun 05/Dec/2021 20:55:30 +0100 Wei Chuang wrote:

On Wed, Dec 1, 2021 at 3:45 AM Alessandro Vesely  wrote:

On Tue 30/Nov/2021 18:30:39 +0100 John R Levine wrote:

On Tue, 30 Nov 2021, Wei Chuang wrote:

What about adding a footer to some html mime part is poorly handled when
using "l="?  Multipart bodies could be handled by other techniques.


See section 8.2 in the DKIM spec which says if you use l= you need to
be careful with your MIME boundaries so naughty people can't add another
part that overlays the real message.

Agreed there's risk in HTML hiding content and showing malicious things but
that risk has existed before.  An updated DKIM authenticator could help us
understand who did those malicious updates along some forwarding path.



ARC can do better for such kind of forensic analysis.



[...]  Hence I retract what I said in my previous message[*], that l=
works well with a wide range of mailing lists.

Could a way of dealing with "l=" be to extend the list-canon ideas
of identifying signatures by mime-parts into identifying length as well?
In other words, with list-canon each part generates a hash, and similarly
each part can have a length of the content in that part that is claimed.
It also records the content-type for each part.  I'm going to guess that
this is to help identify changes like what I believe you are concerned
with.



Those ideas have been ruminated on for a while.  See also
https://datatracker.ietf.org/doc/html/draft-crocker-dkim-doseta-00
https://datatracker.ietf.org/doc/html/draft-vesely-smooth-canon-00

In fact, at this point DKIM is what it is, for the good and the bad of it.


Anyway, I wouldn't want to authenticate a message that underwent an HTML
footer addition, because it can completely replace the original content in
the end recipient's eyes.  My draft requires footers to be plain text.


I was looking at the footers that Googlegroups puts in, and it seems to add
them to both the text/plain and text/html parts.  At least one IETF mailing
list adds a new mime-part with text/plain.  BTW has someone cataloged all
these possible mailing list changes?



Wikipedia has a list of ten software packages, dunno if it's complete:
https://en.wikipedia.org/wiki/List_of_mailing_list_software
It doesn't dig into their peculiarities.

Mailing lists existed before computers, and are not regulated by strict rules, 
so trying to harness their behavior holds little water.  For example, it is 
customary these days to have discussion fora backed by mailing lists, where 
messages arrive with a From: display name referring to a user while the address 
part points to the forum server.  In that case, the user most likely typed the 
text in a web page, so the From: is not /re/-written, it's written that way 
from the start.  Although those messages have parts that were not written by 
the user, it is not possible to catalogue the changes.  And not even ARC can 
handle such messages in a way that would result in a From: address pointing to 
the real author.



Best
Ale
--

___
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-06 Thread Murray S. Kucherawy
On Sun, Dec 5, 2021 at 12:24 PM John R Levine  wrote:

> I'm pretty sure that changing DKIM is very out of scope for this working
> group.
>

As I read the charter, I don't agree.  It says in at least two places that
this could be in scope.

Whether the chairs want the WG to engage in such work right now is another
matter.

-MSK


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-05 Thread Benny Pedersen

On 2021-12-05 21:24, John R Levine wrote:
Agreed there's risk in HTML hiding content and showing malicious things but
that risk has existed before.  An updated DKIM authenticator could help us
understand who did those malicious updates along some forwarding path.


I'm pretty sure that changing DKIM is very out of scope for this 
working group.


+1


We have a decade of experience with DKIM.  If l= were useful, someone
would have figured it out by now.


Is there any discussion about the DKIM l= tag anywhere?  Can DKIM verify that
the l= number of lines has not changed?  Will it give a special result
depending on whether it changed or not, and does DMARC understand these DKIM
tests?


If DKIM can't do this, it's not useful that the DKIM spec says it exists, IMHO.



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-05 Thread John R Levine

Agreed there's risk in HTML hiding content and showing malicious things but
that risk has existed before.  An updated DKIM authenticator could help us
understand who did those malicious updates along some forwarding path.


I'm pretty sure that changing DKIM is very out of scope for this working 
group.


We have a decade of experience with DKIM.  If l= were useful, someone 
would have figured it out by now.


R's,
John



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-02 Thread Alessandro Vesely

On Wed 01/Dec/2021 20:10:30 +0100 John Levine wrote:

It appears that Alessandro Vesely   said:

I'm not clear about the last but one paragraph of that section:

   An example of such an attack includes altering the MIME structure,
   exploiting lax HTML parsing in the MUA, and defeating duplicate
   message detection algorithms.

I'm going to file an errata about it.  Altering the MIME structure is only 
possible if the value of l= is less than the original message length. 


I wish you hadn't.  I think the original concern was for sloppy MIME that
forgot the -- after the last part.



I hope such errors are not so common as to deserve some kind of standardization.


Anyway, I wouldn't want to authenticate a message that underwent an HTML footer 
addition, because it can completely replace the original content in the end 
recipient's eyes.  My draft requires footers to be plain text.


Yet that's exactly what one of the largest discussion group services in the 
world did.
As I keep pointing out, this is like an UNCOL, it does not generalize enough to 
be useful.

On the other hand, ARC handles this just fine.



I, for one, am unable to use ARC as a receiver and authenticate messages that 
may well be spear phishing.  So even though ARC can handle everything, it is 
not usable by everyone.


In order to trust the authorship of a message from Yahoo groups you have to 
trust Yahoo, either expressing your trust in an ARC filter configuration file 
or directly whitelisting Yahoo groups in a DMARC filter.  However, not all 
mailing lists need such special settings to authenticate their posters.  There 
are mailing lists which make no changes, and ones which make revertible changes.


Your objection sounds like you find that a lisp compiler is useless because it 
doesn't compile fortran, which is one of the most ubiquitous languages in the 
world.


Two methods are better than one.


Best
Ale
--


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-01 Thread John Levine
It appears that Alessandro Vesely   said:
>I'm not clear about the last but one paragraph of that section:
>
>An example of such an attack includes altering the MIME structure,
>exploiting lax HTML parsing in the MUA, and defeating duplicate
>message detection algorithms.
>
>I'm going to file an errata about it.  Altering the MIME structure is only 
>possible if the value of l= is less than the original message length. 

I wish you hadn't.  I think the original concern was for sloppy MIME that
forgot the -- after the last part.

>Anyway, I wouldn't want to authenticate a message that underwent an HTML 
>footer addition, because it can completely replace the original content in the 
>end recipient's eyes.  My draft requires footers to be plain text.

Yet that's exactly what one of the largest discussion group services in the 
world did.
As I keep pointing out, this is like an UNCOL, it does not generalize enough to 
be useful.

On the other hand, ARC handles this just fine.

R's,
John



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-12-01 Thread Alessandro Vesely

On Tue 30/Nov/2021 18:30:39 +0100 John R Levine wrote:

On Tue, 30 Nov 2021, Wei Chuang wrote:

What about adding a footer to some html mime part is poorly handled when
using "l="?  Multipart bodies could be handled by other techniques.


See section 8.2 in the DKIM spec which says if you use l= you need to be 
careful with your MIME boundaries so naughty people can't add another part that 
overlays the real message.



I'm not clear about the last but one paragraph of that section:

   An example of such an attack includes altering the MIME structure,
   exploiting lax HTML parsing in the MUA, and defeating duplicate
   message detection algorithms.

I'm going to file an errata about it.  Altering the MIME structure is only 
possible if the value of l= is less than the original message length.  If the 
whole MIME structure forms the body hash, including the epilogue, it is not 
feasible to alter it.  If Content-Type is not signed, the attacker can change 
the top boundary and append whatever content she likes, thereby relegating the 
original, unaltered MIME structure to the role of preamble.
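
The boundary-swap attack just described can be sketched in Python.  This is
only an illustration under stated assumptions, not a DKIM implementation: it
assumes l= was set to the full original body length, that Content-Type is not
among the signed header fields, and it stands in for the DKIM body hash with a
bare SHA-256 over the first l octets (canonicalization is skipped).  The
boundary strings, message texts and the helper name body_hash are made up.

```python
import hashlib
from email import message_from_bytes


def body_hash(body: bytes, length: int) -> str:
    """Stand-in for the DKIM body hash: SHA-256 over the first `length` octets."""
    return hashlib.sha256(body[:length]).hexdigest()


# Original body as signed: a single MIME part delimited by boundary "orig".
orig_body = (
    b"--orig\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"The real message.\r\n"
    b"--orig--\r\n"
)
signed_length = len(orig_body)  # the value the signer would put in l=
signed_hash = body_hash(orig_body, signed_length)

# Attacker: rewrite the (unsigned) Content-Type and append a new part.
evil_header = b'Content-Type: multipart/mixed; boundary="evil"\r\n\r\n'
evil_body = orig_body + (
    b"--evil\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"Something else entirely.\r\n"
    b"--evil--\r\n"
)

# The first l octets are untouched, so the truncated body hash still matches.
hash_still_matches = body_hash(evil_body, signed_length) == signed_hash

# A MIME parser, however, now treats the original structure as preamble.
msg = message_from_bytes(evil_header + evil_body)
visible_parts = [part.get_payload() for part in msg.get_payload()]
preamble = msg.preamble or ""
print(hash_still_matches, visible_parts, "The real message." in preamble)
```

Signing Content-Type closes this particular hole, at the price discussed in
the next paragraph.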


Setting l= with MIME structures doesn't help a non-breaking footer addition. 
For plain text messages, it would help.  However, Section 5.4.1. suggests to 
sign Content-Type when setting l=, to avoid the above attack, and signing 
Content-Type makes for breakable DKIM signatures since MLMs overwrite that 
field.  (Yes, they set text/plain again if the type was such, but 
capitalization and comments may differ.)  Hence I retract what I said in my 
previous message[*], that l= works well with a wide range of mailing lists.


Finally, I thought duplicate message detection was rather related to Message-ID 
than to l=.



I have seen lists that edit the footer into the HTML of the body, I think at 
Yahoo groups.  Don't see how you're going to describe that in a reversible way.



Hm... when HTML parts are paired with alternative plain text parts, it is hard, 
due to HTML complexity, to make the added footers look alike.


Anyway, I wouldn't want to authenticate a message that underwent an HTML footer 
addition, because it can completely replace the original content in the end 
recipient's eyes.  My draft requires footers to be plain text.



Best
Ale
--

[*] https://mailarchive.ietf.org/arch/msg/dmarc/MKgdtkeKzNNguNWHwtrop0gvb_g


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-30 Thread Scott Kitterman



On November 30, 2021 5:30:39 PM UTC, John R Levine  wrote:
>On Tue, 30 Nov 2021, Wei Chuang wrote:
>> What about adding a footer to some html mime part is poorly handled when
>> using "l="?  Multipart bodies could be handled by other techniques.
>
>See section 8.2 in the DKIM spec which says if you use l= you need to be 
>careful with your MIME boundaries so naughty people can't add another part 
>that overlays the real message.
>
>I have seen lists that edit the footer into the HTML of the body, I think 
>at Yahoo groups.  Don't see how you're going to describe that in a 
>reversible way.

Or, we could stop trying to design a DKIM replacement and work on DMARC.

Scott K



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-30 Thread Alessandro Vesely

On Mon 29/Nov/2021 15:17:45 +0100 John R Levine wrote:

On Mon 29/Nov/2021 04:03:57 +0100 John Levine wrote:

This was part of the discussion about what sort of body modifications to
allow. We ended up with optionally ignoring white space changes, and l= to
ignore added text. My impression is that neither is useful. Very few
messages pass with relaxed canonicalization that don't also pass strict.


Using relaxed rather than strict is quite different between header and body. 
It is fairly frequent to find reflowed headers, especially with MLM handling, 
while bodies remain mostly untouched, except for CR additions and removals.


Of course, X-MIME-Autoconverted rewrite bodies beyond strict/relaxed range. 
(That's the original mistake.)


Well yeah, welcome to mail UNCOL land.



Those conversions used to afflict direct mail flows as well.


It'd be enough to add the subject tag on new messages to address the other 
changes.  Using l= works well with a wide range of mailing lists.  However, 
it only works with plain text.


I suppose if by wide range you mean lists that do not add subject tags and do 
not handle html or multipart bodies.  That may be common among nerd lists but 
take a look beyond mailman and I don't think it is.



OTOH, it'd feel cringeworthy to discuss the standardization of solutions to deal 
with indirect mail flows using a mailing list where neither of those solutions 
applies.



Best
Ale
--


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-29 Thread Murray S. Kucherawy
On Sun, Nov 28, 2021 at 3:31 PM Wei Chuang  wrote:

>
> This approach and benefit was what I was thinking could be feasible as
> well.  The cited draft-kucherawy-dkim-list-canon draft notes
> your contribution to the concept described there i.e. to perform hashing as
> a mime-tree (though that draft doesn't do content transport decoding).
>

It does, in Section 3.

-MSK


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-29 Thread Murray S. Kucherawy
On Thu, Nov 25, 2021 at 12:07 AM Wei Chuang  wrote:

> Sorry I wasn't too clear here.  It's largely the same idea as the DKIM
> body length "l=" field above, except reformulated for the Subject header
> and its mailing list mutations.  The original sender would encode a length
> of the original subject, say "s.l=".  A receiver would only hash the
> rightmost "s.l=" length string when validating a Subject hash from
> the original sender.  This assumes that mailing lists may prepend a string,
> typically for identification.


Seems to me that means I could insert anything I want before the last N
octets of Subject -- say, a URI pointing you to an ad or other unsavory
content -- and the original signature will verify.
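
A small Python sketch of this objection, under assumptions: "s.l=" is taken to
mean "hash only the rightmost N octets of the Subject", SHA-256 stands in for
whatever hash the proposal would use, and the function name, the list tag and
the ads.example URL are invented for illustration.

```python
import hashlib


def subject_suffix_hash(subject: str, length: int) -> str:
    """Hash only the rightmost `length` octets of a Subject (hypothetical "s.l=")."""
    if length <= 0:
        raise ValueError("length must be positive")
    return hashlib.sha256(subject.encode("utf-8")[-length:]).hexdigest()


original = "Reversing modifications from mailing lists"
s_l = len(original.encode("utf-8"))  # length the original sender would record
signed = subject_suffix_hash(original, s_l)

tagged = "[dmarc-ietf] " + original  # benign list tag
abused = "Cheap pills at http://ads.example/ " + original  # injected prefix

# Both variants leave the last s_l octets intact, so both compare equal.
results = {s: subject_suffix_hash(s, s_l) == signed for s in (tagged, abused)}
for subject, ok in results.items():
    print(ok, subject)
```

The check cannot tell the list tag from the injected text, since neither
touches the hashed suffix.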

-MSK



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-29 Thread John R Levine

On Mon 29/Nov/2021 04:03:57 +0100 John Levine wrote:

This was part of the discussion about what sort of body modifications to
allow. We ended up with optionally ignoring white space changes, and l= to
ignore added text. My impression is that neither is useful. Very few
messages pass with relaxed canonicalization that don't also pass strict.


Using relaxed rather than strict is quite different between header and body. 
It is fairly frequent to find reflowed headers, especially with MLM handling, 
while bodies remain mostly untouched, except for CR additions and removals.


Of course, X-MIME-Autoconverted rewrite bodies beyond strict/relaxed range. 
(That's the original mistake.)


Well yeah, welcome to mail UNCOL land.

It'd be enough to add the subject tag on new messages to address the other 
changes.  Using l= works well with a wide range of mailing lists.  However, 
it only works with plain text.


I suppose if by wide range you mean lists that do not add subject tags and 
do not handle html or multipart bodies.  That may be common among nerd 
lists but take a look beyond mailman and I don't think it is.


Regards,
John Levine, jo...@taugh.com, Taughannock Networks, Trumansburg NY
Please consider the environment before reading this e-mail. https://jl.ly



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-29 Thread Alessandro Vesely

On Mon 29/Nov/2021 04:03:57 +0100 John Levine wrote:

It appears that   said:

It appears that Wei Chuang   said:

If the RFC2045 canonical representation at the final destination can be the
same as the canonical representation at the original sender, ...



When we were working on DKIM canonicalization we had lengthy discussions about
what to do about MIME and we decided not to even try.


A mistake IMO.


This was part of the discussion about what sort of body modifications to
allow. We ended up with optionally ignoring white space changes, and l= to
ignore added text. My impression is that neither is useful. Very few
messages pass with relaxed canonicalization that don't also pass strict.


Using relaxed rather than strict is quite different between header and body. 
It is fairly frequent to find reflowed headers, especially with MLM handling, 
while bodies remain mostly untouched, except for CR additions and removals.
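
To illustrate why reflowed headers survive relaxed but not simple ("strict")
canonicalization, here is a sketch of the relaxed header steps as I read them
in RFC 6376, Section 3.4.2: lowercase the field name, unfold continuation
lines, collapse runs of white space, and drop white space around the colon and
at the end of the value.  The function name and the sample Subject lines are
illustrative only.

```python
import re


def relaxed_header(field: str) -> str:
    """Relaxed canonicalization of one header field (sketch of RFC 6376, 3.4.2)."""
    name, _, value = field.partition(":")
    value = re.sub(r"\r\n(?=[ \t])", "", value)  # unfold continuation lines
    value = re.sub(r"[ \t]+", " ", value.rstrip("\r\n")).strip(" ")
    return f"{name.strip().lower()}:{value}\r\n"


# The same Subject as the author sent it and as an MLM might reflow it.
as_sent = "Subject: Reversing modifications\r\n from mailing lists\r\n"
reflowed = "SUBJECT:Reversing   modifications from mailing lists\r\n"

simple_equal = as_sent == reflowed  # simple canonicalization compares as-is
relaxed_equal = relaxed_header(as_sent) == relaxed_header(reflowed)
print(simple_equal, relaxed_equal)
```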


Of course, X-MIME-Autoconverted rewrite bodies beyond strict/relaxed range. 
(That's the original mistake.)




The goal of l= was to allow mailing lists to add footers, but as we've seen
in this discussion, if a list adds a footer it's likely to make other
changes too.


It'd be enough to add the subject tag on new messages to address the other 
changes.  Using l= works well with a wide range of mailing lists.  However, it 
only works with plain text.
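
A minimal sketch of the plain-text case, assuming simple body
canonicalization (trailing empty lines dropped, body ends in one CRLF) and a
footer appended after the signed text.  The helper names and the sample
message are illustrative; a real signer puts the base64 of the digest in bh=,
hex is used here only for readability.

```python
import hashlib
from typing import Optional


def simple_body(body: bytes) -> bytes:
    """Simple body canonicalization: drop trailing empty lines, end with CRLF."""
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body: bytes, length: Optional[int] = None) -> str:
    """Hash the canonicalized body, truncated to `length` octets when l= is used."""
    canon = simple_body(body)
    if length is not None:
        canon = canon[:length]
    return hashlib.sha256(canon).hexdigest()


original = b"Hi all,\r\nplease see my comments inline.\r\n\r\n"
l_tag = len(simple_body(original))  # the value the signer would put in l=

footer = b"_______________________________________________\r\ndmarc mailing list\r\n"
with_footer = original + footer

without_l = body_hash(with_footer) == body_hash(original)
with_l = body_hash(with_footer, l_tag) == body_hash(original, l_tag)
print(without_l, with_l)
```

The same truncation is what lets anything at all be appended after the first
l octets, which is the abuse risk raised elsewhere in this thread.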



Best
Ale
--


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-28 Thread John Levine
It appears that   said:
>> It appears that Wei Chuang   said:
>> > If the RFC2045 canonical representation at the final destination can be the
>> > same as the canonical representation at the original sender, ...
>
>> When we were working on DKIM canonicalization we had lengthy discussions 
>> about what to do about MIME and we decided not to even try.
>
>A mistake IMO.

This was part of the discussion about what sort of body modifications
to allow. We ended up with optionally ignoring white space changes,
and l= to ignore added text. My impression is that neither is useful.
Very few messages pass with relaxed canonicalization that don't also
pass strict. The goal of l= was to allow mailing lists to add footers,
but as we've seen in this discussion, if a list adds a footer it's
likely to make other changes too.  I think the main use case for
relaxed mode was an old bug in sendmail that added an extra \r\n
on the way through, but it's long gone.

For MIME, the question wasn't just whether two versions of messages
were equivalent, but the impossible question of what other changes
keep the message "the same" and which are too different. As you note
there are lots of ways that a message could be recoded into equivalent
MIME parts, but again it is my impression that those sorts of changes
are rare without also adding or removing body parts which gets us into
the swamp of how different is too different. So we didn't try.

R's,
John



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-28 Thread Dave Crocker

On 11/28/2021 3:31 PM, Wei Chuang wrote:
What type of concern do you have?  Is it algorithmic complexity?  Or 
runtime or header size overhead?


Biggest concern, for changing anything that has a significant installed 
base of use and users, is their willingness to make the changes.


They have to see a compelling case for the time, effort and expense.

d/

--
Dave Crocker
dcroc...@gmail.com
408.329.0791

Volunteer, Silicon Valley Chapter
Information & Planning Coordinator
American Red Cross
dave.crock...@redcross.org



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-28 Thread Wei Chuang
On Sat, Nov 27, 2021 at 7:34 PM Ned Freed  wrote:

> > It appears that Wei Chuang   said:
> > > If the RFC2045 canonical representation at the final destination can
> > > be the same as the canonical representation at the original sender, ...
>
> > When we were working on DKIM canonicalization we had lengthy discussions
> > about what to do about MIME and we decided not to even try.
>
> A mistake IMO.
>
> > There is no canonical
> > representation of a MIME message and nobody to my knowledge has ever
> > tried to describe what it would mean for two MIME messages to be
> > equivalent, since they could vary in a fantastic number of ways.
>
> First, a canonical form doesn't have to produce a 100% reliable equivalency
> test in order to be useful.
>
> Second, there can be more to a hash computation than a canonical form. This
> is especially true given that a MIME message is a tree.
>
> > Part separators can change, the
> > pieces of multipart/whatever might change, line breaks in
> > quoted-printable and base64 can change, spacing and capitalization of
> > headers can change, and that's just what I can think of in two minutes.
>
> If you treat the message as a Merkle tree with:
>
> o Separate header and body hashes
> o Decoding message bodies prior to hashing
> o Applying the already-defined unfolding/capitalization stuff from DKIM
>   to part headers.
> o Removing the CTE field and boundary value from CT fields in the header
>
> You end up with a value that's:
>
> o Invariant in regards to part separator changes
> o Invariant in regards to CTE changes
> o Invariant in regards to many/most common header changes
> o Allows for rapid computation of hashes for large numbers of large
>   messages that share common content.
>
> Which I note takes care of your list.
>

This approach and benefit was what I was thinking could be feasible as
well.  The cited draft-kucherawy-dkim-list-canon draft notes
your contribution to the concept described there i.e. to perform hashing as
a mime-tree (though that draft doesn't do content transport decoding).


> But the question is, as always, whether or not defining such a thing is
> worth the trouble. At this point I think the answer is "no".
>

What type of concern do you have?  Is it algorithmic complexity?  Or
runtime or header size overhead?

-Wei


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-27 Thread ned+dmarc
> It appears that Wei Chuang   said:
> > If the RFC2045 canonical representation at the final destination can be the
> > same as the canonical representation at the original sender, ...

> When we were working on DKIM canonicalization we had lengthy discussions about
> what to do about MIME and we decided not to even try.

A mistake IMO.

> There is no canonical
> representation of a MIME message and nobody to my knowledge has ever tried to
> describe what it would mean for two MIME messages to be equivalent, since they
> could vary in a fantastic number of ways.

First, a canonical form doesn't have to produce a 100% reliable equivalency
test in order to be useful.

Second, there can be more to a hash computation than a canonical form. This
is especially true given that a MIME message is a tree.

> Part separators can change, the
> pieces of multipart/whatever might change, line breaks in quoted-printable
> and base64 can change, spacing and capitalization of headers can change, and
> that's just what I can think of in two minutes.

If you treat the message as a Merkle tree with:

o Separate header and body hashes
o Decoding message bodies prior to hashing
o Applying the already-defined unfolding/capitalization stuff from DKIM
  to part headers.
o Removing the CTE field and boundary value from CT fields in the header

You end up with a value that's:

o Invariant in regards to part separator changes
o Invariant in regards to CTE changes
o Invariant in regards to many/most common header changes 
o Allows for rapid computation of hashes for large numbers of large messages
  that share common content.

Which I note takes care of your list.
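
The tree-hash idea above can be sketched with Python's standard email package.
This is a rough illustration under assumptions, not anything specified in the
thread: the per-part "header hash" is reduced to the lowercased media type
(which drops the boundary parameter and ignores Content-Transfer-Encoding),
SHA-256 is an arbitrary choice, and tree_hash and build are made-up names.

```python
import base64
import hashlib
from email import message_from_bytes
from email.message import Message


def tree_hash(part: Message) -> bytes:
    """Hash a MIME subtree so that boundaries and transfer encodings drop out."""
    # Simplified header hash: media type only, without parameters or CTE.
    header_digest = hashlib.sha256(part.get_content_type().encode("ascii")).digest()
    if part.is_multipart():
        # Interior node: combine the children's hashes in order.
        body = b"".join(tree_hash(child) for child in part.get_payload())
    else:
        # Leaf: hash the payload after undoing base64 / quoted-printable.
        body = part.get_payload(decode=True) or b""
    body_digest = hashlib.sha256(body).digest()
    return hashlib.sha256(header_digest + body_digest).digest()


def build(boundary: str, cte: str, encoded: bytes) -> Message:
    """Build a one-part multipart/mixed message (fixture for the demo)."""
    head = (
        f'Content-Type: multipart/mixed; boundary="{boundary}"\r\n\r\n'
        f"--{boundary}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return message_from_bytes(head + encoded + tail)


text = b"Hello, list!"
as_base64 = build("aaa", "base64", base64.b64encode(text))
as_qp = build("bbb", "quoted-printable", b"Hello,=20list!")
same = tree_hash(as_base64) == tree_hash(as_qp)
print(same)
```

Because each node's hash is built from its children's hashes, an unchanged
subtree keeps its hash wherever it appears, which is what makes reuse across
messages with common content possible.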

But the question is, as always, whether or not defining such a thing is worth
the trouble. At this point I think the answer is "no".

Ned



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-26 Thread Alessandro Vesely

On Thu 25/Nov/2021 09:07:36 +0100 Wei Chuang wrote:

Thanks for the feedback and answers.



You're welcome



On Wed, Nov 24, 2021 at 3:01 AM Alessandro Vesely  wrote:

On Tue 23/Nov/2021 00:28:01 +0100 Wei Chuang wrote:

[...]

6. Subject
* Agreed that some simple heuristic as proposed in the draft is a good 
approach.  Perhaps the original subject suffix length also might work 
here too.

I don't get this, I'm afraid.  What is the subject suffix length?


Sorry I wasn't too clear here.  It's largely the same idea as the DKIM body
length "l=" field above, except reformulated for the Subject header and
its mailing list mutations.  The original sender would encode a length of
the original subject, say "s.l=".  A receiver would only hash the
rightmost "s.l=" length string when validating a Subject hash from
the original sender.  This assumes that mailing lists may prepend a string,
typically for identification.



Oh, yeah.  However, unless we store sl= as an additional DKIM tag, we'd need an 
extra header field such as Subject-Length.  In that case, Original-Subject is 
much more straightforward.


Original-Subject also covers possible AW: to Re: translations.

Finally, the MLM prefix must be short.  It is not acceptable to have an entire 
phrase followed by hundreds of white space before the original subject.
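
As a rough sketch of how a verifier might pick the subject to check: prefer
Original-Subject when the MLM supplied it, otherwise strip one short bracketed
tag.  Everything here is an assumption for illustration; the 32-character cap,
the bracket pattern and the function name are not taken from any draft.

```python
import re
from email.message import Message

MAX_TAG_LEN = 32  # assumed cap; the text above only says the prefix must be short
_TAG = re.compile(r"^\[[^\]]{1,%d}\] ?" % MAX_TAG_LEN)


def candidate_subject(msg: Message) -> str:
    """Return the subject a verifier would try against the author's signature."""
    original = msg.get("Original-Subject")
    if original is not None:
        return original  # covers rewrites such as AW: to Re: as well
    return _TAG.sub("", msg.get("Subject", ""), count=1)


tagged = Message()
tagged["Subject"] = "[dmarc-ietf] Reversing modifications"

translated = Message()
translated["Subject"] = "[liste] Re: Reversing modifications"
translated["Original-Subject"] = "AW: Reversing modifications"

print(candidate_subject(tagged))
print(candidate_subject(translated))
```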



Best
Ale
--


Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-25 Thread John Levine
It appears that Wei Chuang   said:
>If the RFC2045 canonical representation at the final destination can be the
>same as the canonical representation at the original sender, ...

When we were working on DKIM canonicalization we had lengthy discussions about
what to do about MIME and we decided not to even try.  There is no canonical
representation of a MIME message and nobody to my knowledge has ever tried to
describe what it would mean for two MIME messages to be equivalent, since they
could vary in a fantastic number of ways.  Part separators can change, the
pieces of multipart/whatever might change, line breaks in quoted-printable
and base64 can change, spacing and capitalization of headers can change, and
that's just what I can think of in two minutes.

R's,
John



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-25 Thread Wei Chuang
Thanks for the feedback and answers.

On Wed, Nov 24, 2021 at 3:01 AM Alessandro Vesely  wrote:

> Hi,
>
> On Tue 23/Nov/2021 00:28:01 +0100 Wei Chuang wrote:
> >
> > 1. Ale's draft suggests not reversing all possible transforms, and rather
> > focuses on a subset caused by mailing lists that are reversible
> >* Could ARC be suitable for those other scenarios? Could we expect that
> > forwarders that do more substantial irreversible rewriting such as
> > modifying URLs in a spam/phishing filter MTA, already have a strong
> > relationship with the receiver?  Presumably, might they be trusted by the
> > receiver and their ARC result could be used?
>
>
> Sure.  Note that if the receiver trusts the MLM, simply recognizing it
> would be enough to pass DMARC per the "mailing_list" policy override.  ARC
> additionally provides the ability to learn the authentication status of the
> message when it was received by the MLM.  That way, reputation can be
> reckoned with great precision.
>
>
> > 2. Footers must only be added as a) append on single text/plain part b)
> > mime part appended to multipart/mixed c) mime wrap where a footer is added
> > in a new multipart/mixed.
> >* It's not very clear to me how Ale's draft handles the b) and c) scenario.
> >   (There is mention of "reason="transformed"", but this still seems
> >   incomplete)  I saw that Murray has a draft draft-kucherawy-dkim-list-canon
> >   that identifies addition of new mime parts that could be helpful there.
>
>
> I tried and implemented Murray's draft, but it requires that MLMs declare
> which transformation they do.  Since they don't, you need a pre-parser that
> guesses the transformation type.  That's the difference between the two
> drafts.
>
> If there are two top-level MIME parts, the transformation must be (c),
> because no one writes a MIME structure with just one part.  Otherwise it's
> (b).
>
>
> > 3.  Footers added to text/plain must be identified with at least four "_"
> > as a separator.
> >* Would the DKIM length "l=" field be helpful?  Understood there are abuse
> > risks.
>
>
> Yes, l= could be a useful hint.
>
> The risk of l= is that an attacker could exploit a poor HTML interpreter to
> add a part that completely hides the original content when rendered.
> Requiring attachments to be plain text avoids that risk.
>
>
> > 4. "quoted-printable encoding must not be used for... single-part
> > text/plain messages, as it is impossible to guess original soft line
> > breaks after re-encoding"
> > * Are you suggesting quoted-printable encoding isn't fully reversible?
> > Actually, could the RFC2045 canonical encoding of the message be used as
> > the source for doing the DKIM content hashing?  This would bypass having
> > to worry about additional transfer re-encodings by forwarders.
>
>
> Mailman can copy MIME structures without changes, but simple text is often
> re-encoded.  Many messages on this list are converted to base64.  If the
> original text was quoted printable, its form depends on the agent.  An
> agent can choose where to break lines, whether to encode some characters or
> represent them as ASCII, whether to break lines at column 76 or, to
> increase readability, at white spaces.  That can vary too widely.
>

If the RFC2045 canonical representation at the final destination can be the
same as the canonical representation at the original sender (and assuming
there isn't some content modification like adding inline footers etc.), then
that might be a way of sidestepping some of the issues with quoted
printable encoding.  Understood that would be a departure from DKIM/ARC
hashing and verification.
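
A small sketch of that idea, with assumptions: "canonical representation" is
approximated here by simply undoing the content transfer encoding (line-ending
and charset normalization are ignored), SHA-256 is an arbitrary choice, and
the sample text and the "agent" labels are invented.

```python
import base64
import hashlib
import quopri

# UTF-8 text as the author wrote it.
raw = b"D\xc3\xa9j\xc3\xa0 vu on the list\r\n"

# Three wire forms that different agents might produce for the same text.
encodings = {
    "qp, agent A": b"D=C3=A9j=C3=A0 vu on the list\r\n",
    "qp, agent B": b"D=C3=A9j=\r\n=C3=A0=20vu on the list\r\n",  # soft break, =20
    "base64": base64.b64encode(raw),
}


def decode(label: str, data: bytes) -> bytes:
    """Undo the transfer encoding named by the label."""
    return base64.b64decode(data) if label == "base64" else quopri.decodestring(data)


# Hashing the wire form gives one value per encoding...
wire_hashes = {hashlib.sha256(d).hexdigest() for d in encodings.values()}
# ...while hashing the decoded content gives a single value.
decoded_hashes = {
    hashlib.sha256(decode(label, d)).hexdigest() for label, d in encodings.items()
}
print(len(wire_hashes), len(decoded_hashes))
```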


>
> > 5. Finding the original FROM by looking at From, Author, Original-From,
> > X-Original-From, Reply-To, and Cc.
> >* Can this be standardized to a fixed location such as Author?  (Sorry
> > I'm unfamiliar with the discussion on Author)
>
>
> That's exactly the purpose of Author.  However, no one is using it yet.
>
>
> > 6. Subject
> >* Agreed that some simple heuristic as proposed in the draft is a good
> > approach.  Perhaps the original subject suffix length also might work
> > here too.
>
>
> I don't get this, I'm afraid.  What is the subject suffix length?
>

Sorry I wasn't too clear here.  It's largely the same idea as the DKIM body
length "l=" field above, except reformulated for the Subject header and
its mailing list mutations.  The original sender would encode the length of
the original subject, say as "s.l=".  A receiver would only hash the
rightmost "s.l=" characters of the Subject when validating a Subject hash
from the original sender.  This assumes that mailing lists may prepend a
string, typically for identification.
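
A rough sketch of that check follows.  The "s.l=" tag is only the
hypothetical name used above, and counting characters rather than octets, as
well as the function name subject_suffix_hash, are assumptions for
illustration.

```python
import hashlib


def subject_suffix_hash(subject: str, s_l: int) -> str:
    """Hash only the rightmost s_l characters of the Subject value."""
    if s_l < 0 or s_l > len(subject):
        raise ValueError("s.l= exceeds the length of the received Subject")
    # Slice from an explicit start index so that s_l == 0 yields "".
    suffix = subject[len(subject) - s_l:]
    return hashlib.sha256(suffix.encode("utf-8")).hexdigest()


# The sender signs the original subject and records its length.
original_subject = "Reversing modifications from mailing lists"
s_l = len(original_subject)

# The list prepends its tag; the receiver hashes only the rightmost part.
received_subject = "[dmarc-ietf] " + original_subject
print(subject_suffix_hash(received_subject, s_l)
      == subject_suffix_hash(original_subject, s_l))
```

As with "l=", anything the list puts in front of the covered suffix is not
authenticated by this hash.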

thanks again,
-Wei


>
> Best
> Ale
> --
___
dmarc mailing list
dmarc@ietf.org

Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-24 Thread Alessandro Vesely

On Wed 24/Nov/2021 17:30:08 +0100 John Levine wrote:

It appears that Alessandro Vesely said:
Sure.  Note that if the receiver trusts the MLM, simply recognizing it would be 
enough to pass DMARC per the "mailing_list" policy override.  ARC additionally 
provides the ability to learn the authentication status of the message when it 
was received by the MLM.  That way, reputation can be reckoned with great 
precision.


If you trust the mailing list, you can just have a whitelist and
completely ignore DMARC. If only.



Including the accepted message in aggregate reports with proper indications is 
not ignoring DMARC.




Someone else from Google told me that they know perfectly well where
all the mailing lists are but they cannot do that because many lists
leak spam when spammers steal address books and send spam with
a fake From: of a subscriber. ARC specifically addresses this
situation by letting the recipient do the filtering that the list
didn't, e.g., reject unaligned input messages.



I don't understand that.  If the message was rejectable or quarantinable, why 
did the MLM pass it?  It looks as if the MLM implements ARC but not DMARC.



Best
Ale
--






Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-24 Thread John Levine
It appears that Alessandro Vesely said:
>Sure.  Note that if the receiver trusts the MLM, simply recognizing it would
>be enough to pass DMARC per the "mailing_list" policy override.  ARC
>additionally provides the ability to learn the authentication status of the
>message when it was received by the MLM.  That way, reputation can be
>reckoned with great precision.

If you trust the mailing list, you can just have a whitelist and
completely ignore DMARC. If only.

Someone else from Google told me that they know perfectly well where
all the mailing lists are but they cannot do that because many lists
leak spam when spammers steal address books and send spam with
a fake From: of a subscriber. ARC specifically addresses this
situation by letting the recipient do the filtering that the list
didn't, e.g., reject unaligned input messages.

R's,
John



Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-24 Thread Douglas Foster
If you are evaluating messages, you are free to use any strategy that seems
worth your effort.

However, if you are the sender of messages that cannot produce DMARC PASS,
you have a bigger problem.  You need an algorithm which can be trusted to
usefully distinguish between malicious and non-malicious senders, and has
an expectation of widespread implementation.   I don't think such an
algorithm exists.

For the evaluator, we need to consider diminishing returns on effort.
First, I discard any messages from senders with bad reputations, then I
discard messages with bad content, then I allow messages that pass sender
authentication.   What is left is a set of messages which have no visible
problems but which might (or might not) represent malicious impersonations
of a legitimate sender.   How much additional processing power is required
to achieve how much improved decision-making?

It might be useful to parse the RECEIVED chain to look for known-bad IP
addresses or DNS names, but it is not an easy algorithm.   It might be
possible to develop a reliable NP rule that blocks messages with domain
names that are unregistered or not used for email, but we are not there yet
and those messages are likely to be a small subset of the problem.

Attempts at additional granularity will encounter problems like "Can I
reliably identify messages that were forwarded?" and "Can I reliably
identify messages that come from a mailing list?"   Not easily.

Of course, one can choose "Always Block" as AOL has done, or choose "Always
Allow", as I suspect many others have done.   Neither approach is optimal.

At some point, the best answer comes from actually looking at the message
to answer the question, "Is this a message that I want?"   Once an answer
is obtained, I need to create a local policy rule which implements my
answer for all future messages that have the same fingerprint.   We have
never described the exception process, and as a result many products have
exception mechanisms that are inadequate and inappropriate.  I believe that
a document which describes the exception process will be the best that we
can do and will be a big step forward.   I have begun experimenting with
drafts, but am not ready to submit one yet.

Doug Foster



On Mon, Nov 22, 2021 at 6:28 PM Wei Chuang  wrote:

> Hi all,
>
> [Standard disclaimer, that the comments below are my own and don't
> represent my employer at all]
>
> I saw Ale's draft draft-vesely-dmarc-mlm-transform in the ARC list, and
> wanted to discuss some of the ideas.  Just to put my points in context:
> my understanding is we have to trust the forwarders to
> use the ARC authentication results, which we might not have because the
> forwarder is new, or has low volume.  Moreover mailing lists do much of the
> modifications that break DMARC.  This is a common enough scenario that I
> think it's useful to consider alternatives such as the ideas found in Ale's
> draft.  The numbered bullets are ideas summarized from Ale's draft, and
> asterisk (*) bullets are my comments on them.
>
> 1. Ale's draft suggests not reversing all possible transforms, and rather
> focuses on a subset caused by mailing lists that are reversible
>   * Could ARC be suitable for those other scenarios? Could we expect that
> forwarders that do more substantial irreversible rewriting such as
> modifying URLs in a spam/phishing filter MTA, already have a strong
> relationship with the receiver?  Presumably, might they be trusted by the
> receiver and their ARC result could be used?
> 2. Footers must only be added as a) an append to a single text/plain part
> b) a mime part appended to multipart/mixed c) a mime wrap where a footer
> is added in a new multipart/mixed.
>   * It's not very clear to me how Ale's draft handles the b) and c)
> scenario.   (There is mention of "reason="transformed"", but this still
> seems incomplete)  I saw that Murray has a draft,
> draft-kucherawy-dkim-list-canon, that identifies addition of new mime
> parts that could be helpful there.
> 3.  Footers added to text/plain must be identified with at least four "_"
> as a separator.
>   * Would the DKIM length "l=" field be helpful?  Understood there are
> abuse risks.
> 4. "quoted-printable encoding must not be used for... single-part
> text/plain messages, as it is impossible to guess original soft line breaks
> after re-encoding"
>   * Are you suggesting quoted-printable encoding isn't fully
> reversible?  Actually, could the RFC2045 canonical encoding of the message
> be used as the source for doing the DKIM content hashing?  This would
> bypass having to worry about additional transfer re-encodings by
> forwarders.
> 5. Finding the original FROM by looking at From, Author, Original-From,
> X-Original-From, Reply-To, and Cc.
>   * Can this be standardized to a fixed location such as Author?  

Re: [dmarc-ietf] Reversing modifications from mailing lists

2021-11-24 Thread Alessandro Vesely

Hi,

On Tue 23/Nov/2021 00:28:01 +0100 Wei Chuang wrote:


1. Ale's draft suggests not reversing all possible transforms, and rather 
focuses on a subset caused by mailing lists that are reversible
   * Could ARC be suitable for those other scenarios? Could we expect that 
forwarders that do more substantial irreversible rewriting such as modifying 
URLs in a spam/phishing filter MTA, already have a strong relationship with the 
receiver?  Presumably, might they be trusted by the receiver and their ARC 
result could be used?



Sure.  Note that if the receiver trusts the MLM, simply recognizing it would be 
enough to pass DMARC per the "mailing_list" policy override.  ARC additionally 
provides the ability to learn the authentication status of the message when it 
was received by the MLM.  That way, reputation can be reckoned with great 
precision.



2. Footers must only be added as a) an append to a single text/plain part b) 
a mime part appended to multipart/mixed c) a mime wrap where a footer is added 
in a new multipart/mixed.
   * It's not very clear to me how Ale's draft handles the b) and c) scenario.  
  (There is mention of "reason="transformed"", but this still seems incomplete) 
  I saw that Murray has a draft, draft-kucherawy-dkim-list-canon, that 
identifies addition of new mime parts that could be helpful there.



I tried and implemented Murray's draft, but it requires that MLMs declare which 
transformation they do.  Since they don't, you need a pre-parser that guesses 
the transformation type.  That's the difference between the two drafts.


If there are two top-level MIME parts, the transformation must be (c), because 
no one writes a MIME structure with just one part.  Otherwise it's (b).
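
As an illustration only, the guess described above can be sketched with
Python's standard email package.  The function name guess_footer_transform,
the "unknown" fallback and the sample message are assumptions of this sketch,
not text from either draft.

```python
from email import message_from_string


def guess_footer_transform(raw: str) -> str:
    """Guess which footer transformation (a, b or c) an MLM applied."""
    msg = message_from_string(raw)
    if not msg.is_multipart():
        # A single text/plain part can only carry an appended footer.
        return "a" if msg.get_content_type() == "text/plain" else "unknown"
    if msg.get_content_type() != "multipart/mixed":
        return "unknown"
    # Exactly two top-level parts: the original message wrapped together
    # with a footer (c).  Otherwise a footer part was appended to an
    # existing multipart/mixed (b).
    return "c" if len(msg.get_payload()) == 2 else "b"


wrapped = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="X"\n'
    "\n"
    "--X\n"
    "Content-Type: text/plain\n"
    "\n"
    "original body\n"
    "--X\n"
    "Content-Type: text/plain\n"
    "\n"
    "list footer\n"
    "--X--\n"
)
print(guess_footer_transform(wrapped))
```

This only picks the candidate transformation; whether the guess is right is
decided by whether the original signature verifies once it is reversed.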



3.  Footers added to text/plain must be identified with at least four "_" as a 
separator.
   * Would the DKIM length "l=" field be helpful?  Understood there are abuse 
risks.



Yes, l= could be a useful hint.

The risk of l= is that an attacker could exploit a poor HTML interpreter to add 
a part that completely hides the original content when rendered.  Requiring 
attachments to be plain text avoids that risk.
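
For the text/plain case, a minimal sketch of locating the footer by its
separator might look like the following.  It assumes the separator is a line
made of at least four underscores and that the last such line starts the
footer; the name split_footer is hypothetical.  Where several lines qualify,
an l= value could serve as the hint for choosing among them.

```python
import re

# A line consisting of at least four underscores, allowing trailing
# blanks and an optional CR before the line feed.
SEPARATOR = re.compile(r"^_{4,}[ \t]*\r?$", re.MULTILINE)


def split_footer(body: str) -> tuple[str, str]:
    """Split a text/plain body into (original text, footer)."""
    matches = list(SEPARATOR.finditer(body))
    if not matches:
        return body, ""
    # Assumption: the footer begins at the last separator line.
    cut = matches[-1].start()
    return body[:cut], body[cut:]


body = (
    "Hello list,\n"
    "\n"
    "see you.\n"
    "_______________________________________________\n"
    "dmarc mailing list\n"
    "dmarc@ietf.org\n"
)
original_text, footer = split_footer(body)
print(repr(original_text))
```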



4. "quoted-printable encoding must not be used for... single-part text/plain 
messages, as it is impossible to guess original soft line breaks after re-encoding"
    * Are you suggesting quoted-printable encoding isn't fully reversible?  
Actually, could the RFC2045 canonical encoding of the message be used as the 
source for doing the DKIM content hashing?  This would bypass having to worry 
about additional transfer re-encodings by forwarders.



Mailman can copy MIME structures without changes, but simple text is often 
re-encoded.  Many messages on this list are converted to base64.  If the 
original text was quoted printable, its form depends on the agent.  An agent 
can choose where to break lines, whether to encode some characters or represent 
them as ASCII, whether to break lines at column 76 or, to increase readability, 
at white spaces.  That can vary too widely.



5. Finding the original FROM by looking at From, Author, Original-From, 
X-Original-From, Reply-To, and Cc.
   * Can this be standardized to a fixed location such as Author?  (Sorry I'm 
unfamiliar with the discussion on Author)



That's exactly the purpose of Author.  However, no one is using it yet.
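
Until Author is in use, the lookup across several fields can be sketched as
below.  The order in which the fields are tried simply follows the list in
the question and is an assumption, as are the function name
candidate_authors and the sample addresses.

```python
from email import message_from_string
from email.utils import getaddresses

# Fields that may still carry the original author after a list rewrote From.
CANDIDATE_FIELDS = (
    "From", "Author", "Original-From", "X-Original-From", "Reply-To", "Cc",
)


def candidate_authors(raw: str) -> list[str]:
    """Return the distinct addresses found in the candidate fields, in order."""
    msg = message_from_string(raw)
    found: list[str] = []
    for field in CANDIDATE_FIELDS:
        values = msg.get_all(field, [])
        for _name, addr in getaddresses(values):
            addr = addr.lower()
            if addr and addr not in found:
                found.append(addr)
    return found


raw = (
    "From: Alice via dmarc <dmarc@lists.example>\n"
    "Reply-To: Alice <alice@example.com>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(candidate_authors(raw))
```

Each candidate would then be tried as the original From when re-verifying
the author's signature.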



6. Subject
   * Agreed that some simple heuristic as proposed in the draft is a good 
approach.  Perhaps the original subject suffix length also might work here too.



I don't get this, I'm afraid.  What is the subject suffix length?


Best
Ale
--





