Re: [dmarc-ietf] Yes, Aggregate Reporting meets "Internet Scale" test?

2022-12-08 Thread John Levine
It appears that Mark Alley   said:
>
>This may have been thought of before, so forgive the potentially 
>duplicate idea, I was musing earlier about feedback reporting based on a 
>percent of the overall mail per-source. I'm thinking of something 
>similar in concept to the pct= tag for published policy.

I don't understand what problem that would solve. If you're going to
go through the effort of evaluating DMARC alignment for incoming mail,
the incremental effort to save the result in a database is small, and
saving only some of the results wouldn't make it any easier. I can say
this from experience having written the code.

Once you have the info in the database, generating aggregate reports
is a straightforward data dump and format. I suppose that if you ran a
very large mail system there might be some issues if the reports got
too big to mail, but I get regular reports from giant mail systems
including Google, Yahoo, and Comcast so we know it's not a problem in
practice.
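
As a rough illustration of the "save the result, then dump and format"
flow described above, here is a minimal sketch using Python's sqlite3
module. The table name, column names, and helper functions are
hypothetical and chosen only for this example; this is not the code
referred to in the message, and the row layout is a simplification
rather than the actual aggregate report XML format.

```python
import sqlite3


def open_db():
    # In-memory database for illustration; a real receiver would persist this.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE dmarc_results ("
        "header_from TEXT, source_ip TEXT, disposition TEXT, "
        "dkim TEXT, spf TEXT)"
    )
    return db


def record_result(db, header_from, source_ip, disposition, dkim, spf):
    # One row per evaluated message: the small incremental cost of saving
    # a result that has already been computed.
    db.execute(
        "INSERT INTO dmarc_results VALUES (?, ?, ?, ?, ?)",
        (header_from, source_ip, disposition, dkim, spf),
    )


def aggregate_rows(db, header_from):
    # The "data dump": group identical outcomes for one domain and count them.
    cur = db.execute(
        "SELECT source_ip, disposition, dkim, spf, COUNT(*) "
        "FROM dmarc_results WHERE header_from = ? "
        "GROUP BY source_ip, disposition, dkim, spf "
        "ORDER BY source_ip, disposition, dkim, spf",
        (header_from,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = open_db()
    record_result(db, "example.com", "192.0.2.1", "none", "pass", "pass")
    record_result(db, "example.com", "192.0.2.1", "none", "pass", "pass")
    record_result(db, "example.com", "198.51.100.7", "none", "fail", "fail")
    for row in aggregate_rows(db, "example.com"):
        print(row)
```

Each returned tuple corresponds roughly to one record of an aggregate
report; formatting the tuples for delivery would be a separate step.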

There's a separate issue with failure reports.  Hypothetically, if
someone did a giant spam run using your address, you might get
indirectly mailbombed with failure reports.  But I can say from
experience, having collected failure reports for a decade: some of
my addresses are heavily forged, hardly anyone sends the reports,
and it's not a problem.

R's,
John

___
dmarc mailing list
dmarc@ietf.org
https://www.ietf.org/mailman/listinfo/dmarc


Re: [dmarc-ietf] Does Aggregate Reporting meet "Internet Scale" test?

2022-12-08 Thread Barry Leiba
Mike,

> You clearly don't know what you are talking about.

That's not an appropriate thing to say, and the rest of your message
stands fine without it.  Please avoid these kinds of statements.

Barry, as chair.



Re: [dmarc-ietf] Does Aggregate Reporting meet "Internet Scale" test?

2022-12-08 Thread Dotzero
On Thu, Dec 8, 2022 at 1:59 AM Douglas Foster <
dougfoster.emailstanda...@gmail.com> wrote:

>
> 1) DMARC was a successful 2-company experiment, which was turned into a
> widely implemented informational RFC.   We are now writing the
> standards-track version of that concept.  We hope that Standards Track will
> provide the basis for significantly increased adoption.  This seems the
> appropriate time to ask whether the design can be optimized for
> efficiency.  If you were designing from scratch, would this reporting
> design be the result?   What alternatives have we considered and ruled out?
>

You clearly don't know what you are talking about. There were a number of
organizations involved in the original DMARC effort. I represented one of
the participating organizations. Participating organizations included
Senders (Financials, Social Media companies, other Brands and Publishers),
Mailbox Providers and Intermediaries. There was a predecessor effort called
MOOCOW, which included a number of organizations and was organized by J.D.
Falk (Yahoo!). Another predecessor effort was organized by Ironport
(Pat Peterson) and also included a number of organizations. I am arguably
the first person to publicly ask receivers to reject mail that did not pass
either DKIM or aligned SPF. That was in 2007 (more than 5 years before
DMARC was published and before the DMARC.ORG team was organized). My public
posts (using my then corporate email account) can be found on the DKIM-OPS
list and SPF related lists. At the time there were only a couple of large
mailbox providers which could do anything with those assertions, and they
could only provide chunks of mail logs (under contractual relationship)
for us to mutually discuss and evaluate what was going on.
I'm not writing this to brag on what I was doing but to make it clear that
your assertion that DMARC "was a successful 2-company experiment" is
absolutely incorrect.

Please provide the source of your incorrect assertion.

>
> 2) The burden of reporting is not experienced equally by all report
> senders.   If I send a batch of messages from 1 source domain to:
> - 10 target domains at Google, I will get 1 report, because Google
> consolidates across target domains.
> - 10 target domains at Yahoo, I will get 10 reports, because Yahoo chooses
> to disaggregate by target domain.
> - 10 target domains to Ironport clients, I will get 20 or 30 reports.
> These are client-specific appliances, many clients have multiple appliances
> configured in parallel for load balancing, and each appliance produces its
> own report.
>
> Google presumably can dedicate servers to the reporting function, while
> the Ironport servers seem to generate reports in parallel with message
> processing.   Altogether, I conclude that Google can absorb an increase in
> workload much more easily than an appliance can.
>
> 3) The burden of reporting is not shared equally at present.
>  Substantially all of my reporting comes from the three sources just
> stated:  Google, Yahoo, and Ironport appliances.   Since these
> organizations have not been actively participating, perhaps you are right
> and they are happy with the present design.   On the other hand, perhaps
> someone with connections should ask them whether they want to see
> optimizations.
>
I'll fix this for you. Ironport appliances are sold by Cisco. I was one of
the first customers of Ironport (before they were even called Ironport they
used the codename "Godspeed") and helped them by giving feedback on
development of their "A" series (optimized for outbound email) appliances.
Cisco bought them (to fill a gap in their product offerings) and
subsequently focused on development of their "S" (security) line of
devices. Cisco also reduced Ironport support for standards development in
this space. After that a number of key Ironport employees went on to found
companies which have been very supportive of efforts in this space. Several
of note are Pat Peterson (Founded Agari) and Tim Draegen (Founded DMARCIAN).

>
> 4) As DMARC participation grows, the growth curve is not really linear.
> Currently, 40% of my mailstream is covered by DMARC reporting because more
> than 30% of my outbound mail goes to Google servers.   Altogether, the
> number of reporting domains, from all sources, is somewhere around 40.   To
> move reporting from 40% of messages to 40% of domains, the volume of
> reports will grow by orders of magnitude.
>
> 5) Which then raises the question of, "Who do we expect to do reporting?"
>   Several participants in this group have expressed the conviction that
> everyone who benefits from DMARC should also contribute to DMARC by doing
> reporting.  This seems fair, but it is probably not necessary.
>  Reporting from Google alone is probably sufficient for domain owners to
> know whether or not their servers are properly configured.  But as long
> as we want everyone to participate, we cannot assume that everyone will
> 

Re: [dmarc-ietf] Does Aggregate Reporting meet "Internet Scale" test?

2022-12-08 Thread Mark Alley
Adding clarification since I forgot to specify: this would be
per-sender per-source, not a set percentage of all mail received from a
source; that obviously would not work as intended.


On 12/8/2022 6:52 AM, Mark Alley wrote:


This may have been thought of before, so forgive the potentially 
duplicate idea, I was musing earlier about feedback reporting based on 
a percent of the overall mail per-source. I'm thinking of something 
similar in concept to the pct= tag for published policy.


This would reduce the overhead required to report from particular
sources... But as I'm typing this idea out, this seems less than
feasible due to the other considerations that come to mind. If a
receiver designed to report only on 10% of mail received from a
source was sent 100 emails from said source, and 80 of those
emails were forwards, the feedback would be overwhelmingly
biased towards forwarding data, and the sender would miss out on
reports about mail from direct senders, which are fully compliant
(and arguably more useful).


Evolving on this thought, if a receiver reported subset percentages of 
all different types of compliant/non-compliant email per-source (SPF 
fails/DKIM passes, SPF passes/DKIM fails... etc, etc.) this might 
provide the data needed while still keeping the reporting volume 
manageable for less internet-scale receivers.


Though, it goes without saying, this type of reporting would be 
woefully inadequate in terms of data availability, and only gives an 
idea of traffic types seen, not inclusive of all-encompassing 
volumetric data that could be derived normally from feedback reporters 
that process all emails.



On 12/8/2022 12:58 AM, Douglas Foster wrote:


1) DMARC was a successful 2-company experiment, which was turned into
a widely implemented informational RFC.   We are now writing the
standards-track version of that concept.  We hope that Standards 
Track will provide the basis for significantly increased adoption.  
This seems the appropriate time to ask whether the design can be 
optimized for efficiency.  If you were designing from scratch, would 
this reporting design be the result?   What alternatives have we 
considered and ruled out?


2) The burden of reporting is not experienced equally by all report 
senders.   If I send a batch of messages from 1 source domain to:
- 10 target domains at Google, I will get 1 report, because Google 
consolidates across target domains.
- 10 target domains at Yahoo, I will get 10 reports, because Yahoo 
chooses to disaggregate by target domain.
- 10 target domains to Ironport clients, I will get 20 or 30 
reports.    These are client-specific appliances, many clients have 
multiple appliances configured in parallel for load balancing, and 
each appliance produces its own report.


Google presumably can dedicate servers to the reporting function, 
while the Ironport servers seem to generate reports in parallel with 
message processing.   Altogether, I conclude that Google can absorb 
an increase in workload much more easily than an appliance can.
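
The report counts in point 2 follow from simple multiplication, which
can be sketched as below. The function and parameter names are
hypothetical, and the figure of 2 or 3 appliances per Ironport client is
only an assumption inferred from the "20 or 30 reports" estimate above,
not a measured value.

```python
def report_count(target_domains, consolidates, reporters_per_domain=1):
    # A receiver that consolidates across its hosted domains sends one report.
    if consolidates:
        return 1
    # Otherwise every reporting system behind every target domain sends its own.
    return target_domains * reporters_per_domain


if __name__ == "__main__":
    print("Google-style:", report_count(10, consolidates=True))
    print("Yahoo-style:", report_count(10, consolidates=False))
    # Assumed 2 or 3 load-balanced appliances per client.
    print("Appliance-style:",
          report_count(10, consolidates=False, reporters_per_domain=2),
          "to",
          report_count(10, consolidates=False, reporters_per_domain=3))
```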


3) The burden of reporting is not shared equally at present.  
 Substantially all of my reporting comes from the three sources just 
stated:  Google, Yahoo, and Ironport appliances.   Since these 
organizations have not been actively participating, perhaps you are 
right and they are happy with the present design.   On the other 
hand, perhaps someone with connections should ask them whether they 
want to see optimizations.


4) As DMARC participation grows, the growth curve is not really 
linear.  Currently, 40% of my mailstream is covered by DMARC 
reporting because more than 30% of my outbound mail goes to Google 
servers.   Altogether, the number of reporting domains, from all 
sources, is somewhere around 40.   To move reporting from 40% of 
messages to 40% of domains, the volume of reports will grow by orders 
of magnitude.


5) Which then raises the question of, "Who do we expect to do 
reporting?"    Several participants in this group have expressed the 
conviction that everyone who benefits from DMARC should also 
contribute to DMARC by doing reporting. This seems fair, but it is 
probably not necessary.  Reporting from Google alone is probably 
sufficient for domain owners to know whether or not their servers are 
properly configured.    But as long as we want everyone to 
participate, we cannot assume that everyone will have Google's 
resources to contribute to the reporting task.


All of which says to me that we should be looking to optimize the 
reporting function to minimize the cost of participation.


Doug Foster


On Tue, Dec 6, 2022 at 10:15 PM Seth Blank  wrote:

I'm super unclear what you're talking about.

https://dmarc.org/2022/03/dmarc-policies-up-84-for-2021/

Aggregate reporting is used by the largest volume senders on
earth, and the vast majority of mail received by mailbox
providers comes with a dmarc record and reporting address 

Re: [dmarc-ietf] Does Aggregate Reporting meet "Internet Scale" test?

2022-12-08 Thread Mark Alley
This may have been thought of before, so forgive the potentially 
duplicate idea, I was musing earlier about feedback reporting based on a 
percent of the overall mail per-source. I'm thinking of something 
similar in concept to the pct= tag for published policy.


This would reduce the overhead required to report from particular
sources... But as I'm typing this idea out, this seems less than
feasible due to the other considerations that come to mind. If a
receiver designed to report only on 10% of mail received from a source
was sent 100 emails from said source, and 80 of those emails
were forwards, the feedback would be overwhelmingly biased towards
forwarding data, and the sender would miss out on reports about mail
from direct senders, which are fully compliant (and arguably more useful).


Evolving on this thought, if a receiver reported subset percentages of 
all different types of compliant/non-compliant email per-source (SPF 
fails/DKIM passes, SPF passes/DKIM fails... etc, etc.) this might 
provide the data needed while still keeping the reporting volume 
manageable for less internet-scale receivers.
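
One way to picture the per-type subset idea is as stratified sampling:
group messages from a source by their SPF/DKIM outcome and report a
fixed percentage of each group, always keeping at least one message per
group so that no outcome type disappears from the feedback. The sketch
below is only an illustration under that assumption. The function name,
the message dictionaries, and the integer pct parameter (loosely modelled
on the pct= tag) are all hypothetical and not part of any DMARC
specification.

```python
import random
from collections import defaultdict


def stratified_sample(messages, pct, seed=0):
    # Group messages by (SPF result, DKIM result).
    strata = defaultdict(list)
    for msg in messages:
        strata[(msg["spf"], msg["dkim"])].append(msg)

    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Integer ceiling of pct% of the group, but never fewer than one.
        k = max(1, (len(group) * pct + 99) // 100)
        sample.extend(rng.sample(group, k))
    return sample


if __name__ == "__main__":
    # The scenario from the message: 100 emails, 80 of them forwards.
    forwards = [{"spf": "fail", "dkim": "pass"} for _ in range(80)]
    direct = [{"spf": "pass", "dkim": "pass"} for _ in range(20)]
    picked = stratified_sample(forwards + direct, pct=10)
    print(len(picked), "messages selected for reporting")
```

With these inputs the selection always includes some direct mail as well
as forwards, although the counts in such a report would describe the
sample rather than the full mail volume.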


Though, it goes without saying, this type of reporting would be woefully 
inadequate in terms of data availability, and only gives an idea of 
traffic types seen, not inclusive of all-encompassing volumetric data 
that could be derived normally from feedback reporters that process all 
emails.



On 12/8/2022 12:58 AM, Douglas Foster wrote:


1) DMARC was a successful 2-company experiment, which was turned into
a widely implemented informational RFC.   We are now writing the
standards-track version of that concept.  We hope that Standards Track 
will provide the basis for significantly increased adoption.  This 
seems the appropriate time to ask whether the design can be optimized 
for efficiency.  If you were designing from scratch, would this 
reporting design be the result?   What alternatives have we considered 
and ruled out?


2) The burden of reporting is not experienced equally by all report 
senders.   If I send a batch of messages from 1 source domain to:
- 10 target domains at Google, I will get 1 report, because Google 
consolidates across target domains.
- 10 target domains at Yahoo, I will get 10 reports, because Yahoo 
chooses to disaggregate by target domain.
- 10 target domains to Ironport clients, I will get 20 or 30 reports.  
  These are client-specific appliances, many clients have multiple 
appliances configured in parallel for load balancing, and each 
appliance produces its own report.


Google presumably can dedicate servers to the reporting function, 
while the Ironport servers seem to generate reports in parallel with 
message processing.   Altogether, I conclude that Google can absorb an 
increase in workload much more easily than an appliance can.


3) The burden of reporting is not shared equally at present.  
 Substantially all of my reporting comes from the three sources just 
stated:  Google, Yahoo, and Ironport appliances.   Since these 
organizations have not been actively participating, perhaps you are 
right and they are happy with the present design.   On the other hand, 
perhaps someone with connections should ask them whether they want to 
see optimizations.


4) As DMARC participation grows, the growth curve is not really 
linear.  Currently, 40% of my mailstream is covered by DMARC reporting 
because more than 30% of my outbound mail goes to Google servers.  
 Altogether, the number of reporting domains, from all sources, is 
somewhere around 40.   To move reporting from 40% of messages to 40% 
of domains, the volume of reports will grow by orders of magnitude.


5) Which then raises the question of, "Who do we expect to do 
reporting?"    Several participants in this group have expressed the 
conviction that everyone who benefits from DMARC should also 
contribute to DMARC by doing reporting.    This seems fair, but it is 
probably not necessary.   Reporting from Google alone is probably 
sufficient for domain owners to know whether or not their servers are 
properly configured.    But as long as we want everyone to 
participate, we cannot assume that everyone will have Google's 
resources to contribute to the reporting task.


All of which says to me that we should be looking to optimize the 
reporting function to minimize the cost of participation.


Doug Foster


On Tue, Dec 6, 2022 at 10:15 PM Seth Blank  wrote:

I'm super unclear what you're talking about.

https://dmarc.org/2022/03/dmarc-policies-up-84-for-2021/

Aggregate reporting is used by the largest volume senders on
earth, and the vast majority of mail received by mailbox providers
comes with a dmarc record and reporting address attached.

This is umpteen billions of messages a day that get aggregated
into reports.

What are you getting at? That seems pretty internet scale to me...

Seth

On Mon, Dec 5, 2022 at 2:01 PM Douglas Foster