On Fri, 8 Jun 2018 at 13:57, David Hofstee <opentext.dhofs...@gmail.com> wrote:
> Hi Stefano,
>
> The only problem I see with Cloudmark is that they are not just a reputation 
> provider, but a spamfilter provider with access to all the data. Which has 
> been acquired by Proofpoint.

Well, it is a mix of reputation and filtering. The filtering is "ON/OFF"
and not reputation/score based: if any one fingerprint is in the
blocklist the email is spam, otherwise it is ham.
The "reputation" comes in when they weigh the reporters and when they
decide to block or unblock a given fingerprint.
So it is not so different from an IP RBL.
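
To make the ON/OFF logic concrete, here is a minimal Python sketch. The
function name and the fingerprint values are made up for illustration;
this is not Cloudmark's code, just the "any blocked fingerprint means
spam" rule as I described it.

```python
# Hypothetical sketch of an ON/OFF verdict: no scores, no weights.
def is_spam(email_fingerprints, blocklist):
    """Return True if at least one fingerprint is currently blocked."""
    return any(fp in blocklist for fp in email_fingerprints)

# Made-up fingerprint values, for illustration only.
blocklist = {"fp-bad-1:8", "fp-bad-2:20"}
spam_verdict = is_spam(["fp-ok-1:8", "fp-bad-2:20", "fp-ok-2:8"], blocklist)
ham_verdict = is_spam(["fp-ok-1:8", "fp-ok-2:8"], blocklist)
```

A single hit is enough: the other fingerprints of the same email do not
"vote" against it.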

What do you mean by "spamfilter provider with access to all the data."?
They don't see the whole email flow. They mostly get fingerprint
lists for emails marked as spam or "non-spam". An installable
piece of software (that you can add to your mailserver) does the
fingerprint extraction and checks whether any fingerprint is
currently blocked; then you can do whatever you want with this
information.

An email will result in 5, 10 or 30 fingerprints depending on the number
of "interesting" things Cloudmark identifies.

> I'm asking myself the question if the fingerprints they collect are GDPR 
> proof (although Jaren may comment on that).

We'll probably know this in a few years, as GDPR starts getting
used for real...
In the meantime, they mostly deal with irreversible cryptographic
hashes of the email "fingerprints". GDPR calls this
"pseudonymisation" because if you know the original personal data
you can find the fingerprint for it.
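
A small Python sketch of that last point. SHA-256 and the 16-character
truncation are stand-ins I picked for the example; I am not claiming
this is the hash Cloudmark uses. The function names are hypothetical.

```python
import hashlib

def pseudonymise(value: str) -> str:
    """One-way hash of a value; SHA-256 is a stand-in, not Cloudmark's algorithm."""
    return hashlib.sha256(value.lower().encode("utf-8")).hexdigest()[:16]

# The stored token does not reveal the input by itself...
stored = pseudonymise("example.com")

# ...but anyone who already knows a candidate value can recompute and compare.
def matches(candidate: str, token: str) -> bool:
    return pseudonymise(candidate) == token

known_hit = matches("example.com", stored)
known_miss = matches("example.org", stored)
```

So the hash cannot be reversed, but it can be confirmed against a guess,
which is why the data is not considered anonymous.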

For example, whenever an email contains a link or a plaintext sequence
that looks like a URL and the domain is "gravatar.com", they will add
an "AxTCswu-AAAA:8" fingerprint for that domain. This "AxTCswu-AAAA:8"
is what gets blocked if they want to start blocking every email
referencing gravatar. The "block packages" their users download via
DNS micro-updates will simply contain this fingerprint, so the
filtering can be done "offline" once you have the latest updates.
There are specific domains where the hash is computed for every
subdomain, and there are specific domains where another hash is
computed depending on the path. E.g.
gallery.mailchimp.com, googleusercontent.com, www.youtube.com,
www.dropbox.com, goo.gl, tinypic.com and many others will generate a
"something:20" fingerprint that is specific to a part of the
path. For URL shorteners it is the full URL; for others, like
gallery.mailchimp.com, it includes the "user specific" part of the URL
so that they can identify a single Mailchimp sender with a specific
fingerprint.
This "special" domain list is part of the "data", not of the
code/algorithm... When you receive the micro-updates with the
blacklisted fingerprints you may also receive updates to the
patterns that generate the :20 fingerprints.
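
Roughly, the extraction could be sketched like this in Python. Everything
here is an assumption made for illustration: the hash (SHA-256), the
token format, the rule table and its depths, and the simplification of
hashing the hostname exactly as it appears. Only the idea of a
domain-level ":8" token plus a data-driven, path-level ":20" token comes
from what I observed.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Hypothetical "special domain" data: how many leading path segments to hash.
# The entries and depths are invented; the real list arrives as data.
PATH_RULES = {"gallery.mailchimp.com": 1, "goo.gl": 1}

URL_RE = re.compile(r"https?://[^\s<>]+")

def _token(text: str, kind: int) -> str:
    # SHA-256 is a stand-in; the real hash and encoding are unknown to me.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12] + f":{kind}"

def url_fingerprints(body: str) -> set:
    """Return domain-level (:8) and path-level (:20) tokens for URLs in body."""
    found = set()
    for url in URL_RE.findall(body):
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        if not host:
            continue
        found.add(_token(host, 8))
        depth = PATH_RULES.get(host)
        if depth:
            segments = [s for s in parts.path.split("/") if s][:depth]
            if segments:
                found.add(_token(host + "/" + "/".join(segments), 20))
    return found
```

Because PATH_RULES is plain data, a new "special" domain can be added by
an update without touching the extraction code.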

This is just a basic case of the most common patterns from Cloudmark
Authority, but there is a lot more and their system is really
interesting: it is the most advanced of its kind. I think it is much
better than using a bunch of IP/URI BLs at the SMTP level, but it is
far from perfect. E.g. since they don't have "volume" information,
they very easily block "new" fingerprints because of a couple of "mark
as spam" actions, even if there were only 2 "mark as spam" reports for
1 million emails delivered... Then you start being junked, they soon
get "non-spam" reports and they remove the fingerprint from the
blocklist... I see a lot of new senders blocked at least once during
their first send and then unblocked after a couple of "junked" emails.
Of course this only works if the receiver delivers blocked email to the
spam folder, so that the "not-spam" reports can come in.
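
The arithmetic behind that complaint, as a small Python sketch. The
threshold of 2 is just the number from my example, not a known Cloudmark
setting, and both function names are hypothetical.

```python
def blocks_on_count(spam_reports: int, threshold: int = 2) -> bool:
    """Volume-blind rule: block once enough reports arrive (threshold is assumed)."""
    return spam_reports >= threshold

def complaint_rate(spam_reports: int, delivered: int) -> float:
    """Reports per delivered message; needs volume data the filter lacks."""
    return spam_reports / delivered if delivered else 0.0

reports, delivered = 2, 1_000_000
blocked = blocks_on_count(reports)
rate = complaint_rate(reports, delivered)
```

With 2 reports on 1 million deliveries the count-based rule blocks, while
the complaint rate is 0.0002%, far below what anybody would call spammy.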

IMHO this is GDPR compliant and not so different from an IP RBL: those
deal with clear-text IP addresses, and IP addresses are not even
pseudonymised.
That said, GDPR in the end doesn't make much difference between
personal data and pseudonymised personal data... they both require
the same "legal" treatment, but if you use pseudonymisation you
can say you care more about privacy leaks.

Stefano

>
> Yours,
>
>
> David
>
> On 8 June 2018 at 12:35, Stefano Bagnara <mai...@bago.org> wrote:
>>
>> On Fri, 8 Jun 2018 at 11:53, David Hofstee <opentext.dhofs...@gmail.com> 
>> wrote:
>> > [...]
>> > I also think that there is space for a reputation provider which can:
>> > - Identify more than just IP addresses and domains from an email.
>>
>> This is what Cloudmark Authority does, but you open up a new
>> set of issues that have only just been fixed in the IP/domain world
>> thanks to DMARC (I wrote an answer to the SDLU sister-thread).
>> IIRC Vipul's Razor used the same fingerprinting concepts and ended up
>> using a DNSBL of "fingerprints". Vipul founded Cloudmark and I don't
>> know what the current status of the Razor project is.
>>
>> > - Is able to process feedback from domain owners and recipients in an 
>> > automated, quick, effective and anonymous enough way (with the GDPR et 
>> > al). Feedback is key.
>>
>> I strongly agree with "Feedback is key", both for "spam reports" and
>> "non-spam reports" (and considering that "non-spam" only flows if you
>> didn't block at the SMTP level).
>> Unfortunately, once you adopt SMTP rejects based on a blacklist you
>> accept that you will stop getting false-positive reports about that
>> traffic, so you really stop monitoring the effectiveness of that block.
>>
>> The issue with this is that you have to start building a reputation
>> not only for IPs/domains/other email content fingerprints (sender
>> stuff), but you also need to build a reputation for feedback providers
>> (recipient stuff) and maybe you also need reputation management for
>> people asking for delisting (consultants, ESPs...)
>>
>> A lot of data to mix... so you start building some machine learning to
>> deal with that automatically and then you end up with SmartScreen, and
>> no one, including its creator, will know why some messages have been
>> blocked or not ;-)   (no offense to SmartScreen, I know how hard it is
>> to deal with this stuff, but I receive Office365 invoices in spam, in
>> my Office365 inbox.).
>>
>> An FBL system "à la Google PT" (so only aggregated data, not enabling
>> list washing) but open to multiple receivers could help make ESPs more
>> accountable for doing their own part of the filtering (mainly with B2B
>> emails, as with B2C they already have Microsoft/Yahoo/Google data).
>>
>> Stefano
>>
>> _______________________________________________
>> mailop mailing list
>> mailop@mailop.org
>> https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
>
>
>
>
> --
> --
> My opinion is mine.

