Please folks, don't take my comments as aggressive, even though they may sometimes come across as cynical. My top priority is to somehow help curb the flood of spam, not to accuse people of not doing enough or not doing the right thing.
On 05.02.21 04:23, Brandon Long via mailop wrote:

> If you received say... a million ab...@gmail.com emails a day, how would
> you handle that?

When you have the technology to handle some billion searches per day, I would suggest you have in-house experts who could come up with ideas :-)

> Now, automating abuse@ requests is more likely at that scale... trying to
> find common issues and problems, even across days, or maybe learning
> certain reporters as more useful than others... and the reported issues
> from that are still a drop in the bucket compared to the known spammy
> accounts issues you get from other sources.
>
> Which isn't to say it's not useful, it is, it does find this weird low
> level of abuse that tends to be continuous but otherwise below the major
> campaigns you're already catching and working on... on the other hand,
> ignoring it for too long and you can get a large amount of ground level
> abuse noise that is made up of individually small actors. Or maybe you're
> completely missing some new type of spam which is evading your other
> feedback mechanisms.

Automation is of course the key at that scale. It is absolutely clear that it's impossible to handle abuse reports for that many users manually, even if you staff your abuse desk quite generously. And automation isn't easy: I've dabbled a bit with TensorFlow to identify likely spam sources, and getting false positives low enough while still adapting to changing spammer patterns proved to be beyond my reach as a spare-time mail admin.
But we're not talking about some arbitrary spare-time mailbox provider; we're talking about a company that spends millions (actually, billions: https://www.quora.com/Which-big-tech-company-spends-the-most-R-D-on-AI-machine-learning) on AI algorithms that let their users find images of kittens on the internet, or that select ads showing me offers for the bass guitar I bought some years ago just because I recently watched a YouTube review of that bass (true story).

A lot can be achieved by automatically categorizing spam reports along several dimensions:

* Reliability: reliable spam reporters vs. people who hit the Junk button accidentally.
* Own resources used to spam: sender and Reply-To mailbox addresses, IP addresses, message IDs, cloud facilities, etc. can be extracted from mail headers and (with somewhat less accuracy) from mail bodies.
* Content: 4-1-9 scams, link-shortener URLs stuffed with spammer prose, promises of financial or sexual performance enhancement, possible malware, etc.

Combined, these could sort reports into a few hundred "bins" labeled with report reliability, actionability, and severity. Acting on the top one or two dozen actionable bins each day would go a long way:

* If you ignore the huge "nonsense" bin, you might miss a few real abuse reports but save a lot of work.
* The 4-1-9 bin is probably best handled automatically by disabling sender and Reply-To addresses reported by reliable sources. If possible, accounts created to spam should be distinguished from compromised accounts (based on account creation and usage data) to determine the best action.
* Accounts sending malware and URL-shortener stuff are almost always compromised. You need a policy for handling such accounts (restrict their sending ability until they are cleaned up), but if you have such a policy and a mechanism to mark accounts as compromised, that should be doable with a few clicks.
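Purely as an illustration, the extraction-and-binning idea could be sketched roughly like this in Python. The bin names, keyword lists, and the reporter-reliability flag are made-up placeholders for the sake of the example, not anything any real provider is known to use:

```python
# Hedged sketch of the binning idea: extract the spammer-controlled
# resources from a reported message, then drop the report into a coarse bin.
import re
from email import message_from_string
from email.utils import parseaddr

SHORTENERS = ("bit.ly", "tinyurl.com", "goo.gl")             # illustrative list
SCAM_HINTS = ("million", "beneficiary", "next of kin")       # crude 4-1-9 markers

def extract_resources(raw: str) -> dict:
    """Pull sender-controlled identifiers out of a reported message."""
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "reply_to": parseaddr(msg.get("Reply-To", ""))[1],
        "message_id": msg.get("Message-ID", ""),
        # IPs from Received: headers; mail bodies would need fuzzier parsing
        "ips": re.findall(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]",
                          "\n".join(msg.get_all("Received", []))),
    }

def bin_report(raw: str, reporter_reliable: bool) -> str:
    """Assign a report to one coarse, hand-labeled bin."""
    if not reporter_reliable:
        return "nonsense"            # the big bin you can mostly ignore
    body = message_from_string(raw).get_payload()
    text = (body if isinstance(body, str) else "").lower()
    if any(hint in text for hint in SCAM_HINTS):
        return "419/actionable"      # disable From/Reply-To addresses
    if any(s in text for s in SHORTENERS):
        return "shortener/compromised"
    return "review"                  # everything else goes to a human
```

With a 4-1-9-looking report from a trusted reporter, `bin_report` lands it in the "419/actionable" bin, from which the extracted From and Reply-To addresses could be fed straight into a disable queue; the real work is of course in keeping the classification accurate, which is exactly what needs the AI budget.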
The remaining categories need more human judgment:

* Abuse performed using cloud services such as the Google Forms facility probably requires policy changes; that is beyond the scope of the abuse desk, but should be triggered by them.
* Bulk-mail customers who generate significant numbers of complaints need to be scrutinized to determine whether you can make them change their behavior or whether termination is warranted. This is probably the most time-consuming activity, and it can't be automated, but decision-making in this area can be supported by good data.

That's probably something a single abuse desk worker, or a small team with appropriate tools and the authority to disable accounts, should be able to manage.

Of course these are just brain farts, not a well-thought-out plan. Maybe you already do something comparable, I don't know. But some externally visible signs make me question it:

* Reporting channel: apparently e-mailed abuse reports are no longer accepted (that's at least what Spamcop says; I have since stopped reporting spam to Google via SC). I wasn't able to find a spam reporting option where I could submit a complete e-mail header. The whole story is too long to fit into this bullet point, but it looks like there is no way for me to get reliable spam data into the abuse system, and without such data I don't see how any of this could work.
* Ongoing spam from reported sources: as Jesper pointed out earlier in this thread, spam from specific sources goes on for months even though it is properly reported. I've had the same experience with the Google Forms based spam (trix.bounces.google.com), which is still ongoing.

Cheers,
Hans-Martin
_______________________________________________
mailop mailing list
mailop@mailop.org
https://list.mailop.org/listinfo/mailop