Re: [dmarc-discuss] DMARC Reporting De-duplication

2018-05-05 Thread John Levine via dmarc-discuss
In article <1675430.NNnUSil6oV@kitterma-e6430> you write:
>As an example, I have been able to find four messages I sent to 
>lists.debian.org email lists on April 30th.  The volume reported for that 
>source for that day from various feedback reporters was 2,436.  This makes it 
>a little hard to consume the feedback.

My feedback goes into a database where I do occasional summary
queries.  I don't recall any particular problems doing the analysis
and it is kind of fun to extract numbers like how many NANOG
subscribers get their mail at Gmail.
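
A minimal sketch of the kind of summary query described above. The
thread does not give the actual schema, so the table `report_rows`, its
columns, and the reporter names are assumptions, and the rows are
made-up illustration data.

```python
import sqlite3

# Hypothetical schema: one row per <record> of a received aggregate report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_rows ("
    " reporter TEXT, source_ip TEXT, msg_count INTEGER, spf TEXT)"
)

# Illustrative data only, not taken from any real report.
rows = [
    ("reporter-a.example", "192.0.2.1", 12, "fail"),
    ("reporter-a.example", "192.0.2.2", 1, "pass"),
    ("reporter-b.example", "192.0.2.2", 30, "pass"),
]
conn.executemany("INSERT INTO report_rows VALUES (?, ?, ?, ?)", rows)


def volume_by_reporter(db):
    """Return {reporter: total message count}, summed over all rows."""
    cur = db.execute(
        "SELECT reporter, SUM(msg_count) FROM report_rows"
        " GROUP BY reporter ORDER BY reporter"
    )
    return dict(cur.fetchall())


summary = volume_by_reporter(conn)
print(summary)
```

Grouping by `reporter` is what would answer a question such as how many
list subscribers receive their mail at a given provider, assuming each
reporter corresponds to one provider.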

If a future DMARC 1.01 had deduped reports, some would be deduped,
some wouldn't, and it'd be if anything harder to find the signal
in the noise.

R's,
John
___
dmarc-discuss mailing list
dmarc-discuss@dmarc.org
http://www.dmarc.org/mailman/listinfo/dmarc-discuss

NOTE: Participating in this list means you agree to the DMARC Note Well terms 
(http://www.dmarc.org/note_well.html)


Re: [dmarc-discuss] DMARC Reporting De-duplication

2018-05-05 Thread Alessandro Vesely via dmarc-discuss
On Fri 04/May/2018 21:37:35 +0200 Scott Kitterman via dmarc-discuss wrote: 
> 
> Shouldn't it be possible to de-duplicate these based on message ID *before* 
> sending aggregate reports back?  Can/should this be added to DMARC the next 
> time the specification is updated?  [my emphasis]

The "before" I emphasized above suggests that the message-id won't make it into
the final report; including it would seriously inflate the report itself.
However, to envisage the effect, let's suppose the message-id is part of a
draft report at the sender, for example right after the count field in a
tabular view[*]:

Source IP   Count  Message-id    Disposition  SPF     ...
192.0.2.1   12     blob1@domain  none         ✗ Fail  ...
192.0.2.2   1      blob1@domain  none         ✓ Pass  ...
192.0.2.3   1      blob2@domain  none         ✓ Pass  ...
...

Assuming message-ids are reliable, that table shows two messages, one of which
was received 13 times.  The second message was received just once, but that
doesn't mean it had a single recipient, does it?  So, if the multi-destination
delivery of a single message results from an expansion (a.k.a. explosion)
performed by external relays, the count is going to be higher than for
expansions performed internally.  Your proposal is to substitute "12" with "1"
in the draft report, then cut the message-id and group by source IP, From:
domain (not shown), and results, counting just the rows.  Correct?

That technique wouldn't fully eliminate the inconsistency, because equivalent
copies of a message may come from different sources.  Thinking of the tricks
MTAs deploy to break long recipient lists into multiple messages with shorter
list sizes, possibly relayed by different mailouts, tells me that the count
field cannot be precise in any case.  It is a rough estimate of the results'
impact.  In that sense, "12" tells you that the SPF failures exemplified above
are more important than the two passes, in case you were thinking about
hardening your policy.  I'd keep it as is.
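
A rough sketch of the proposal as restated above, using the three rows
of the example table.  The function names are hypothetical, and the SPF
result stands in for the full set of grouping fields (the From: domain
and the other results are omitted).

```python
from collections import Counter

# Draft rows as in the table above: (source_ip, count, message_id, spf).
draft = [
    ("192.0.2.1", 12, "blob1@domain", "fail"),
    ("192.0.2.2", 1, "blob1@domain", "pass"),
    ("192.0.2.3", 1, "blob2@domain", "pass"),
]


def raw_counts(rows):
    """Counts as reported today: every received copy is counted."""
    totals = Counter()
    for source_ip, count, _msg_id, spf in rows:
        totals[(source_ip, spf)] += count
    return dict(totals)


def deduped_counts(rows):
    """Counts under the proposal: each message-id counts once per group."""
    seen = set()
    totals = Counter()
    for source_ip, _count, msg_id, spf in rows:
        key = (source_ip, spf, msg_id)
        if key not in seen:
            seen.add(key)
            totals[(source_ip, spf)] += 1
    return dict(totals)


raw = raw_counts(draft)
deduped = deduped_counts(draft)
print(raw)
print(deduped)
```

The de-duplicated rows still sum to three although the table holds only
two distinct message-ids, because blob1@domain arrives from two
different sources.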

jm2c
Ale
-- 

[*] https://en.wikipedia.org/wiki/DMARC#Aggregate_reports

Re: [dmarc-discuss] DMARC Reporting De-duplication

2018-05-04 Thread Steven M Jones via dmarc-discuss

On 05/04/2018 12:37 PM, Scott Kitterman via dmarc-discuss wrote:

> I participate in a lot of mailing lists, many of which have a large number
> of subscribers.  ...
>
> Shouldn't it be possible to de-duplicate these based on message ID before
> sending aggregate reports back?  Can/should this be added to DMARC the next
> time the specification is updated?


There may be interesting anti-abuse cases that justify storing this kind 
of information in a readily accessible form for e.g. de-duplication, 
versus a static log file.


But even if receivers / mailbox providers are already doing that, 
where's their incentive for the reporting change you describe? What 
would be the resulting improvement in the quality of mailstreams sent 
using a given domain? The reduction in customer support, or increase in 
customer satisfaction, of the kind they purportedly see when it's easier 
to detect fraudulent messages?


I can understand how the reporting change you suggest *might* be useful 
to the individual sender, where the sender and the domain operator total 
1 or 2. Can you help us understand what's in it for the other parties 
involved? And how does it help in the more typical case where there are 
between dozens and thousands of users of the domain?


--S.



Re: [dmarc-discuss] DMARC Reporting De-duplication

2018-05-04 Thread Roland Turner via dmarc-discuss

Would this really help?

You haven't explained what you mean by "a little hard to consume". On
the face of it, it's just an integer; a 1 is no easier to perform
arithmetic on than a 2,436. If what you mean is that it's difficult to
make a meaningful comparison between the number that you send and the
number reported as received, then that's true of course. But that would
remain the case, as you'd have no way to work out which of the counts in
the individual receiver reports included messages that had been
processed by mailing lists; you'd replace your 2,436 with an
indeterminate number between 1 and, say, 100.


- Roland


On 05/05/18 03:37, Scott Kitterman via dmarc-discuss wrote:

> I participate in a lot of mailing lists, many of which have a large number
> of subscribers.  As a result, when I send a single message to a mailing list,
> many copies of the same message get sent to users at large mail providers.
> These get counted as individual messages in aggregate reporting.
>
> As an example, I have been able to find four messages I sent to
> lists.debian.org email lists on April 30th.  The volume reported for that
> source for that day from various feedback reporters was 2,436.  This makes it
> a little hard to consume the feedback.
>
> Shouldn't it be possible to de-duplicate these based on message ID before
> sending aggregate reports back?  Can/should this be added to DMARC the next
> time the specification is updated?
>
> Scott K


[dmarc-discuss] DMARC Reporting De-duplication

2018-05-04 Thread Scott Kitterman via dmarc-discuss
I participate in a lot of mailing lists, many of which have a large number
of subscribers.  As a result, when I send a single message to a mailing list, 
many copies of the same message get sent to users at large mail providers.  
These get counted as individual messages in aggregate reporting.

As an example, I have been able to find four messages I sent to 
lists.debian.org email lists on April 30th.  The volume reported for that 
source for that day from various feedback reporters was 2,436.  This makes it 
a little hard to consume the feedback.
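
For illustration, a minimal sketch of how a per-source volume like the
one above can be totalled from several aggregate reports.  The element
names follow the RFC 7489 aggregate report schema; the reporter names,
addresses, and counts are made up, and real reports may carry an XML
namespace that this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two tiny made-up reports; only the fields needed for the sum are present.
REPORTS = [
    """<feedback>
         <report_metadata><org_name>reporter-a.example</org_name></report_metadata>
         <record><row><source_ip>192.0.2.1</source_ip><count>12</count></row></record>
         <record><row><source_ip>192.0.2.2</source_ip><count>1</count></row></record>
       </feedback>""",
    """<feedback>
         <report_metadata><org_name>reporter-b.example</org_name></report_metadata>
         <record><row><source_ip>192.0.2.1</source_ip><count>30</count></row></record>
       </feedback>""",
]


def volume_by_source(xml_reports):
    """Sum the <count> of every <row>, keyed by <source_ip>, across reports."""
    totals = Counter()
    for text in xml_reports:
        root = ET.fromstring(text)
        for row in root.iter("row"):
            totals[row.findtext("source_ip")] += int(row.findtext("count"))
    return dict(totals)


totals = volume_by_source(REPORTS)
print(totals)
```

Each copy a list delivers to a reporting provider adds to `count`, which
is how a handful of posts can add up to a figure in the thousands.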

Shouldn't it be possible to de-duplicate these based on message ID before 
sending aggregate reports back?  Can/should this be added to DMARC the next 
time the specification is updated?

Scott K