Re: [mailop] New method of blocking spam
On Fri, Jan 22, 2016 at 3:23 AM, Michelle Sullivan wrote:
> If you're doing it just on the subject, ok I'll go with that..

There's an MSc thesis by Chris Kopsidas (then a student at the University of Athens, back in 2012) where we worked explicitly on the subject lines of spam that got past SpamAssassin, RBLs and a few other filters. I thought at the time that since a Subject line is considerably smaller than most message bodies, trying to infer spam or ham based on the subject would be faster than checking the whole message.

I never really got it to production since I had more pressing problems to deal with, but if anyone is interested, I can put you in contact with both the guy who implemented the idea and his (then) supervisor.

--
"If technology is your thing plan to die reading manuals" --Gene Woolsey
___
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
Re: [mailop] New method of blocking spam
On 25/01/16 08:57, Dave Warren wrote:
> Bayes is good at categorizing mail, but I don't think "Trying to sell
> something" is necessarily even a spam-sign, lots of legitimate and
> desired mail is trying to sell me something too. At the same time,
> everything I've read about this new method seems to be a slightly
> modified bayes approach (with the twist of taking word pairs or triplets
> into account) and I doubt it will be a real game changer, although it
> may result in some new ways to tune bayes to increase effectiveness.

There's nothing new about the twist - they're called hapax legomena, and support for them has been built into SpamAssassin for a while - the earliest quick reference I can see is from 2007. It's enabled by default. DSPAM also includes this ability. Token combinations (2-3 word hapax) are also an option for some program out there, but the instance eludes me at present.

This is probably why no one is jumping up and down with joy at this FUSSP - we're all already using it.

http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html

> bayes_use_hapaxes (default: 1)
> Should the Bayesian classifier use hapaxes (words/tokens that occur only
> once) when classifying? This produces significantly better hit-rates.
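To make the quoted bayes_use_hapaxes description concrete, here is a minimal Python sketch of what such a switch might control. This is not SpamAssassin's code; the function names (`train`, `usable_tokens`) and the whitespace tokenizer are assumptions made up for illustration. A token counts as a hapax here when it was seen exactly once across the whole training set.

```python
from collections import Counter

def train(messages):
    """messages: iterable of (text, is_spam). Returns (spam, ham) token counts."""
    spam, ham = Counter(), Counter()
    for text, is_spam in messages:
        tokens = set(text.lower().split())  # count each token once per message
        (spam if is_spam else ham).update(tokens)
    return spam, ham

def usable_tokens(text, spam, ham, use_hapaxes=True):
    """Return the tokens of `text` that the classifier is allowed to consult."""
    out = []
    for tok in set(text.lower().split()):
        seen = spam[tok] + ham[tok]
        if seen == 0:
            continue  # never seen in training: no evidence either way
        if seen == 1 and not use_hapaxes:
            continue  # hapax: seen exactly once, skipped on request
        out.append(tok)
    return sorted(out)

# Hypothetical toy corpus, for illustration only.
corpus = [("cheap pills now", True),
          ("cheap meeting notes", False),
          ("pills pills", True)]
spam, ham = train(corpus)
print(usable_tokens("cheap now unknown", spam, ham, use_hapaxes=True))
print(usable_tokens("cheap now unknown", spam, ham, use_hapaxes=False))
```

With hapaxes enabled, a rare token such as "now" still contributes evidence; with them disabled, only tokens seen more than once are consulted.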
Re: [mailop] New method of blocking spam
>While all of that is true, IF his claims were true (an idea could
>magically detect any spam trying to sell you something) would you walk
>away from a magic pill that completely and perfectly identified one
>particular type of spam and didn't hit any ham?

Yeah, because the next day the spammers would figure out how to circumvent it.

>modified bayes approach (with the twist of taking word pairs or triplets
>into account) and I doubt it will be a real game changer, although it
>may result in some new ways to tune bayes to increase effectiveness.

There's nothing new about looking at multiple words. Check out my Twitter feed at https://twitter.com/svictest which estimates probabilities of four-word phrases in a bunch of RSS feeds I follow and uses them to come up with, ah, oracular statements.

R's,
John
Re: [mailop] ping DNSBLs
On 2016-01-23 07:35, John Levine wrote:
> RFC 5782 says that a live DNSxL does list 127.0.0.2 to show that it's
> alive, and does not list 127.0.0.1 to show that it's not wildcarded. We
> published that in 2010 but it was in draft form for quite a while
> before that.
>
> For IPv6 BLs, you list ::FFFF:127.0.0.2 and don't list ::FFFF:127.0.0.1.
> For name BLs, you list TEST and don't list INVALID.
>
> You can't make everyone follow the rules, but I have to say that it's
> been a while since I've seen a BL that I care about that doesn't.

And conversely, if a DNSBL can't be bothered to follow simple standards or doesn't have the technical competence to avoid listing 127.0.0.1, is it worth caring about?

If a DNSBL lists an IP in a forest and nobody ever queries it, does anyone but NANAE care?

--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren
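For anyone wanting to script the liveness check described above, a minimal Python sketch for the IPv4 case follows. The zone name `bl.example` and the function names are hypothetical, and the resolver is passed in as a parameter so the logic can be exercised without touching the network. Note the assumption that any lookup failure means "not listed": a timeout or SERVFAIL would look the same as NXDOMAIN here, so a real monitor would want to tell those apart.

```python
import socket

def is_listed(name, resolve=socket.gethostbyname):
    """True if `name` resolves to an address, False if the lookup fails."""
    try:
        resolve(name)
        return True
    except socket.gaierror:
        return False

def check_dnsbl(zone, resolve=socket.gethostbyname):
    """Classify an IPv4 DNSBL zone using the 127.0.0.2 / 127.0.0.1 test entries."""
    # DNSBL queries reverse the octets: 127.0.0.2 -> 2.0.0.127.<zone>
    test_listed = is_listed("2.0.0.127." + zone, resolve)
    wildcarded = is_listed("1.0.0.127." + zone, resolve)
    if not test_listed:
        return "dead or non-compliant"  # the mandatory test entry is missing
    if wildcarded:
        return "wildcarded"             # 127.0.0.1 must never be listed
    return "ok"

def fake_resolver(listed_names):
    """Stand-in resolver for offline use: resolves only names in `listed_names`."""
    def resolve(name):
        if name in listed_names:
            return "127.0.0.2"
        raise socket.gaierror("name not found")
    return resolve

if __name__ == "__main__":
    healthy = fake_resolver({"2.0.0.127.bl.example"})
    print(check_dnsbl("bl.example", healthy))
```

Calling `check_dnsbl(zone)` without a second argument uses the system resolver instead of the fake one.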
Re: [mailop] New method of blocking spam
On 2016-01-22 19:24, John R Levine wrote:
>>> What gets spammers caught is that eventually they have to sell you something
>
> Gee, did we drop through a wormhole into 1998 or something?
>
>> He's missing a few somethings. Spammers might not be trying to sell you something.
>
> No kidding. The classic example is pump and dump, where they're trying to get you to call your own stockbroker to buy the stock they're touting, with no direct contact at all with the spammer.
>
>> Even with stuff like drug spam, the number of throwaway domains and redirections between the spam and the payload site is likely to be somewhat higher than someone might expect.
>
> A *lot* higher.

While all of that is true, IF his claims were true (an idea could magically detect any spam trying to sell you something) would you walk away from a magic pill that completely and perfectly identified one particular type of spam and didn't hit any ham?

I don't think that this solution is that, but spam filtering has always been about multiple layers and approaches, some of which will excel for different types of spam, and combining the results of multiple filters and rulesets has, in my experience, always worked better than any one single approach.

Bayes is good at categorizing mail, but I don't think "Trying to sell something" is necessarily even a spam-sign, lots of legitimate and desired mail is trying to sell me something too. At the same time, everything I've read about this new method seems to be a slightly modified bayes approach (with the twist of taking word pairs or triplets into account) and I doubt it will be a real game changer, although it may result in some new ways to tune bayes to increase effectiveness.

--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren
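For readers unfamiliar with the "word pairs or triplets" twist, a rough Python sketch of a Bayes-style scorer over 1- to 3-word tokens follows. It is a simplified illustration, not the method under discussion nor any particular filter's implementation: the function names, the add-one smoothing, and the choice to score only tokens present in the message are all assumptions.

```python
import math
from collections import Counter

def ngram_tokens(text, max_n=3):
    """Single words plus adjacent word pairs/triplets, up to max_n words."""
    words = text.lower().split()
    tokens = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            tokens.append(" ".join(words[i:i + n]))
    return tokens

def train(messages, max_n=3):
    """messages: list of (text, is_spam). Counts messages containing each token."""
    spam, ham = Counter(), Counter()
    n_spam = n_ham = 0
    for text, is_spam in messages:
        toks = set(ngram_tokens(text, max_n))
        if is_spam:
            spam.update(toks)
            n_spam += 1
        else:
            ham.update(toks)
            n_ham += 1
    return spam, ham, n_spam, n_ham

def spam_log_odds(text, model, max_n=3):
    """Sum of per-token log-likelihood ratios; > 0 leans spam, < 0 leans ham."""
    spam, ham, n_spam, n_ham = model
    score = 0.0
    for tok in set(ngram_tokens(text, max_n)):
        p_spam = (spam[tok] + 1) / (n_spam + 2)  # add-one smoothing
        p_ham = (ham[tok] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score

# Hypothetical two-message corpus, for illustration only.
model = train([("buy cheap pills", True),
               ("cheap flights to athens", False)])
print(spam_log_odds("buy cheap pills", model))
print(spam_log_odds("flights to athens", model))
```

The word "cheap" appears in both classes and contributes nothing on its own, while the pair "cheap pills" only appears in spam; that is the kind of extra signal multi-word tokens can add to a plain bag-of-words Bayes filter.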