On Fri, 8 Jun 2018 at 13:57, David Hofstee <opentext.dhofs...@gmail.com> wrote:
> Hi Stefano,
>
> The only problem I see with Cloudmark is that they are not just a reputation
> provider, but a spamfilter provider with access to all the data. Which has
> been acquired by Proofpoint.

Well, it is a mix of reputation and filter. The filtering is "ON/OFF", not reputation/score based: if any one fingerprint is in the blocklist the email is spam, otherwise it is ham. The "reputation" comes in when they weigh the reporters and decide whether to block or unblock a given fingerprint. So it is not so different from an IP RBL.

What do you mean by "spamfilter provider with access to all the data"? They don't see the whole email flow. They mostly get lists of fingerprints for emails being marked as spam or "non-spam". An installable piece of software (that you can add to your mailserver) does the fingerprint extraction and can check whether any fingerprint is currently blocked; then you can do whatever you want with this information. An email will result in 5, 10 or 30 fingerprints, depending on the number of "interesting" things Cloudmark identifies.

> I'm asking myself the question if the fingerprints they collect are GDPR
> proof (although Jaren may comment on that).

We'll probably know this in a few years, as GDPR starts being used for real... In the meantime they mostly deal with irreversible cryptographic hashes of the email "fingerprints". GDPR calls this "pseudonymisation" (pseudo-anonymization), because if you know the original personal data you can find the fingerprint for it.

For example, whenever an email contains a link, or a plaintext sequence that looks like a URL, and the domain is "gravatar.com", they will add an "AxTCswu-AAAA:8" fingerprint for that domain. This "AxTCswu-AAAA:8" is the one that gets blocked if they want to start blocking every email referencing gravatar. The "block packages" their users download via DNS micro-updates simply contain this fingerprint, so the filtering can be done "offline" once you have the latest updates.
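
To make the ON/OFF idea concrete, here is a minimal sketch in Python. It is not Cloudmark's algorithm, which is not public: the SHA-256 hash, the truncation to 16 hex characters, the URL regex and the function names are all assumptions made for illustration, and the ":8" suffix only mimics the type tag seen in the example fingerprint above.

```python
from __future__ import annotations

import hashlib
import re

# Assumption: a simple regex stands in for the real URL/host detection.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def domain_fingerprint(domain: str) -> str:
    """Return an opaque, one-way fingerprint for a domain (hypothetical format)."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).hexdigest()
    # ":8" imitates the type suffix of the example fingerprint; the real
    # encoding is unknown.
    return digest[:16] + ":8"


def extract_fingerprints(body: str) -> set[str]:
    """Collect one fingerprint per host referenced in the message body."""
    hosts = (h.rstrip(".") for h in URL_HOST.findall(body))
    return {domain_fingerprint(h) for h in hosts if h}


def is_spam(body: str, blocklist: set[str]) -> bool:
    """ON/OFF verdict: spam if any fingerprint is blocked, ham otherwise."""
    return not extract_fingerprints(body).isdisjoint(blocklist)


if __name__ == "__main__":
    # The blocklist holds only fingerprints, never the clear-text domain.
    blocklist = {domain_fingerprint("gravatar.com")}
    print(is_spam("Avatar: http://gravatar.com/avatar/abc", blocklist))
    print(is_spam("Docs: https://example.org/readme", blocklist))
```

The lookup needs only the locally cached blocklist, which is why the check can run "offline". It also shows why the hash is only pseudonymous: anyone who already knows the domain can recompute its fingerprint and test it against the list.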

There are specific domains where the hash is computed for every subdomain, and specific domains where another hash is computed depending on the path. E.g. gallery.mailchimp.com, googleusercontent.com, www.youtube.com, www.dropbox.com, goo.gl, tinypic.com and many others will generate a "something:20" fingerprint that is specific to a part of the path. For URL shorteners it is the full URL; for others, like gallery.mailchimp.com, it includes the "user specific" part of the URL, so that they can identify a single Mailchimp sender with a specific fingerprint. This "special" domain list is part of the "data", not of the code/algorithm: when you receive the micro-updates with the blacklisted fingerprints you may also receive updates to the patterns that generate the :20 fingerprints.

This is just a basic case covering the most common patterns from Cloudmark Authority, but there is a lot more and their system is really interesting: it is the most advanced of its kind.

I think it is much better than using a bunch of IP/URI BLs at the SMTP level, but it is far from perfect. E.g. since they don't have "volume" information, they very easily block "new" fingerprints because of a couple of "mark as spam" actions, even if that was 2 "mark as spam" out of 1 million emails delivered. Then you start being junked, they quickly get "non-spam" reports, and they remove the fingerprint from the block. I see a lot of new senders blocked at least once on their first send and then unblocked after a couple of "junked" emails. Of course this only works if you deliver blocked email to the spam folder and you get the "not-spam" reports.

IMHO this is GDPR compliant and not so different from an IP RBL: those deal with clear-text IP addresses, and IP addresses are not even pseudo-anonymized. That said, GDPR in the end doesn't make much difference between personal data and pseudo-anonymized personal data...
they both require the same "legal" treatment, but if you use pseudo-anonymization you can say you care more about privacy leaks.

Stefano

>
> Yours,
>
>
> David
>
> On 8 June 2018 at 12:35, Stefano Bagnara <mai...@bago.org> wrote:
>>
>> On Fri, 8 Jun 2018 at 11:53, David Hofstee <opentext.dhofs...@gmail.com>
>> wrote:
>> > [...]
>> > I also think that there is space for a reputation provider which can:
>> > - Identify more than just IP addresses and domains from an email.
>>
>> This is what CloudMark Authority does about this, but you enable a new
>> set of issues that have been just fixed in the IP/domain world thanks
>> to DMARC (I wrote an answer to the SDLU sister-thread).
>> IIRC Vipul's Razor used the same fingerprinting concepts and ended up
>> using a DNSBL of "fingerprints". Vipul founded Cloudmark and I don't
>> know what is the current status of the Razor project.
>>
>> > - Is able to process feedback from domain owners and recipients in an
>> > automated, quick, effective and anonymous enough way (with the GDPR et
>> > al). Feedback is key.
>>
>> I strongly agree with "Feedback is key" both for "spam reports" and
>> "non-spam reports" (and considering that "non-spam" only flows if you
>> didn't block at the SMTP level).
>> Unfortunately once you adopt SMTP reject based on a blacklist then you
>> accept to stop getting false positives about that traffic, so you
>> really stop monitoring the effectiveness of that block.
>>
>> The issue with this is that you have to start building a reputation
>> not only for IP/domains/other email content fingerprints (sender
>> stuff), but also need to build a reputation for feedback providers
>> (recipient stuff) and maybe you also need a reputation management for
>> people asking delisting (consultant, ESP...)
>>
>> A lot of data to mix..
>> so you start building some machine learning to
>> deal with that automatically and then you end up with SmartScreen and
>> no one, included your creator, will know why some message have been
>> blocked or not ;-) (no offense to SmartScreen, I know how hard is to
>> deal with this stuff, but I receive Office365 invoices in spam, in my
>> Office365 inbox.).
>>
>> An FBL system "ala Google PT" (so only aggregated data not enabling
>> list washing) but open to multiple receivers could help adding more
>> accountability to ESP to do their own filtering part (mainly with B2B
>> emails, as with B2C they already have microsoft/yahoo/google data).
>>
>> Stefano
>>
>> _______________________________________________
>> mailop mailing list
>> mailop@mailop.org
>> https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
>
>
>
>
> --
> --
> My opinion is mine.
_______________________________________________
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop