Re: [mailop] New method of blocking spam

2016-01-24 Thread Yiorgos Adamopoulos
On Fri, Jan 22, 2016 at 3:23 AM, Michelle Sullivan  wrote:
> If you're doing it just on the subject, ok I'll go with that..

There's an MSc thesis by Chris Kopsidas (a student at the University
of Athens at the time, back in 2012) in which we worked specifically
on the subject lines of spam that got past SpamAssassin, RBLs and a
few other filters. My thinking at the time was that since a Subject
line is considerably shorter than most message bodies, inferring spam
or ham from the subject alone would be faster than checking the whole
message.
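
To make the idea concrete, here is a minimal sketch of a subject-only
naive Bayes scorer. It is not the thesis implementation; the class name,
the tokenizer and the add-one smoothing are all assumptions chosen for
illustration.

```python
import math
import re
from collections import Counter


def tokenize(subject):
    # Hypothetical tokenizer: lowercase runs of letters, digits and a few
    # symbols that tend to show up in spammy subjects.
    return re.findall(r"[a-z0-9$!']+", subject.lower())


class SubjectBayes:
    """Toy naive Bayes classifier that only ever sees Subject lines."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, subject, label):
        self.docs[label] += 1
        self.counts[label].update(tokenize(subject))

    def spam_log_odds(self, subject):
        # Vocabulary size plus one slot for unseen tokens, so the
        # smoothed denominators are never zero.
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        totals = {k: sum(c.values()) for k, c in self.counts.items()}
        # Smoothed prior, then one smoothed likelihood ratio per token.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in tokenize(subject):
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score  # > 0 leans spam, < 0 leans ham


clf = SubjectBayes()
clf.train("Cheap pills online now", "spam")
clf.train("Buy cheap watches now", "spam")
clf.train("Meeting agenda for Monday", "ham")
clf.train("Re: lunch on Monday", "ham")
```

Scoring a subject touches only a handful of tokens, which is the speed
argument above; whether the accuracy holds up on real traffic is a
separate question.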

I never got it into production since I had more pressing problems
to deal with, but if anyone is interested, I can put you in contact
with both the person who implemented the idea and his (then) supervisor.

-- 
"If technology is your thing plan to die reading manuals" --Gene Woolsey

___
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop


Re: [mailop] New method of blocking spam

2016-01-24 Thread Ted Cooper
On 25/01/16 08:57, Dave Warren wrote:
> Bayes is good at categorizing mail, but I don't think "Trying to sell
> something" is necessarily even a spam-sign, lots of legitimate and
> desired mail is trying to sell me something too. At the same time,
> everything I've read about this new method seems to be a slightly
> modified bayes approach (with the twist of taking word pairs or triplets
> into account) and I doubt it will be a real game changer, although it
> may result in some new ways to tune bayes to increase effectiveness.

There's nothing new about the twist - they're called hapax legomena,
and they've been built into SpamAssassin for a while - the earliest
quick reference I can find is from 2007. It's enabled by default. DSPAM
also includes this ability. Token combinations (2-3 word hapaxes) are
also an option in some program out there, but which one eludes me at
present. This is probably why no one is jumping up and down with joy at
this FUSSP - we're all already using it.

http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html
> bayes_use_hapaxes (default: 1)
> Should the Bayesian classifier use hapaxes (words/tokens that occur only 
> once) when classifying? This produces significantly better hit-rates.
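
For anyone who wants to see the two ideas side by side, here is a rough
sketch: a tokenizer that emits single words plus adjacent word pairs,
and a helper that picks out the tokens seen exactly once in a corpus.
This is not SpamAssassin's or DSPAM's tokenizer; both function names are
made up for illustration.

```python
from collections import Counter


def tokens_with_pairs(text):
    # Single words followed by adjacent two-word combinations.
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def hapaxes(messages):
    # Tokens (words or pairs) that occur exactly once across the corpus.
    counts = Counter()
    for msg in messages:
        counts.update(tokens_with_pairs(msg))
    return {tok for tok, n in counts.items() if n == 1}


corpus = ["buy cheap pills", "buy cheap watches"]
once_only = hapaxes(corpus)
```

In this toy corpus "buy", "cheap" and "buy cheap" each occur twice, so
only the pills and watches tokens come out as hapaxes.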





Re: [mailop] New method of blocking spam

2016-01-24 Thread John Levine
>While all of that is true, IF his claims were true (an idea could 
>magically detect any spam trying to sell you something) would you walk 
>away from a magic pill that completely and perfectly identified one 
>particular type of spam and didn't hit any ham?

Yeah, because the next day the spammers would figure out how to
circumvent it.

>modified bayes approach (with the twist of taking word pairs or triplets 
>into account) and I doubt it will be a real game changer, although it 
>may result in some new ways to tune bayes to increase effectiveness.

There's nothing new about looking at multiple words.  Check out my
Twitter feed at https://twitter.com/svictest which estimates
probabilities of four-word phrases in a bunch of RSS feeds I follow
and uses them to come up with, ah, oracular statements.

R's,
John



Re: [mailop] ping DNSBLs

2016-01-24 Thread Dave Warren

On 2016-01-23 07:35, John Levine wrote:

RFC 5782 says that a live DNSxL does list 127.0.0.2 to show that it's
alive, and does not list 127.0.0.1 to show that it's not wildcarded.
We published that in 2010 but it was in draft form for quite a while
before that.  For IPv6 BLs, you list ::FFFF:127.0.0.2 and don't list
::FFFF:127.0.0.1.  For name BLs, you list TEST and don't list INVALID.
You can't make everyone follow the rules, but I have to say that it's 
been a while since I've seen a BL that I care about that doesn't.
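
As a rough sketch of how a client could use those test entries (IPv4
and name-based lists only), the check below asks whether the entry that
must exist is there and whether the entry that must not exist is absent.
The function names and the "ok"/"dead"/"wildcarded" labels are
assumptions for illustration, and the lookup is passed in as a callable
so the logic can be tried without touching the network.

```python
import ipaddress
import socket


def reverse_ipv4(ip):
    # 127.0.0.2 -> "2.0.0.127", the label order used in DNSBL queries.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets))


def check_dnsbl(zone, is_listed, name_based=False):
    # is_listed(name) -> bool is supplied by the caller.
    if name_based:
        present, absent = "test", "invalid"
    else:
        present = reverse_ipv4("127.0.0.2")
        absent = reverse_ipv4("127.0.0.1")
    if is_listed(f"{absent}.{zone}"):
        return "wildcarded"  # lists something it must never list
    if is_listed(f"{present}.{zone}"):
        return "ok"          # test entry present, so the list is alive
    return "dead"            # test entry missing


def dns_is_listed(name):
    # Simplistic live lookup: any resolver failure (including timeouts)
    # is treated as "not listed", which a real client should not do.
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False
```

A live check would be something like check_dnsbl("bl.example.org",
dns_is_listed), with a real zone name substituted for the placeholder.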


And conversely, if a DNSBL can't be bothered to follow simple standards 
or doesn't have the technical competence to avoid listing 127.0.0.1, is 
it worth caring about?


If a DNSBL lists an IP in a forest and nobody ever queries it, does 
anyone but NANAE care?


--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren





Re: [mailop] New method of blocking spam

2016-01-24 Thread Dave Warren

On 2016-01-22 19:24, John R Levine wrote:

What gets spammers caught is that eventually they
have to sell you something


Gee, did we drop through a wormhole into 1998 or something?


He's missing a few somethings.
Spammers might not be trying to sell you something.


No kidding.  The classic example is pump and dump, where they're 
trying to get you to call your own stockbroker to buy the stock 
they're touting, with no direct contact at all with the spammer.


Even with stuff like drug spam, the number of throwaway domains and 
redirections between the spam and the payload site is likely to be 
somewhat higher than someone might expect.  A *lot* higher.


While all of that is true, IF his claims were true (an idea could 
magically detect any spam trying to sell you something) would you walk 
away from a magic pill that completely and perfectly identified one 
particular type of spam and didn't hit any ham?


I don't think that this solution is that, but spam filtering has always 
been about multiple layers and approaches, some of which will excel for 
different types of spam, and combining the results of multiple filters 
and rulesets has, in my experience, always worked better than any one 
single approach.


Bayes is good at categorizing mail, but I don't think "Trying to sell 
something" is necessarily even a spam-sign, lots of legitimate and 
desired mail is trying to sell me something too. At the same time, 
everything I've read about this new method seems to be a slightly 
modified bayes approach (with the twist of taking word pairs or triplets 
into account) and I doubt it will be a real game changer, although it 
may result in some new ways to tune bayes to increase effectiveness.



--
Dave Warren
http://www.hireahit.com/
http://ca.linkedin.com/in/davejwarren


