On 25/01/16 08:57, Dave Warren wrote:
> Bayes is good at categorizing mail, but I don't think "Trying to sell
> something" is necessarily even a spam-sign, lots of legitimate and
> desired mail is trying to sell me something too. At the same time,
> everything I've read about this new method seems to be a slightly
> modified bayes approach (with the twist of taking word pairs or triplets
> into account) and I doubt it will be a real game changer, although it
> may result in some new ways to tune bayes to increase effectiveness.

There's nothing new about the twist. They're called hapax legomena
(hapaxes), and support has been built into SpamAssassin for a while; the
earliest quick reference I can find is from 2007. It's enabled by default.
DSPAM also includes this ability. Token combinations (2-3 word hapaxes)
are also an option in some program out there, but which one eludes me at
present. This is probably why no one is jumping up and down with joy at
this FUSSP (Final Ultimate Solution to the Spam Problem): we're all
already using it.

http://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html
> bayes_use_hapaxes     (default: 1)
> Should the Bayesian classifier use hapaxes (words/tokens that occur only 
> once) when classifying? This produces significantly better hit-rates.
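
For anyone who wants to see the two ideas side by side, here is a rough
Python sketch: tokenizing on single words plus adjacent word pairs, then
picking out the hapaxes (tokens seen exactly once in the training mail).
This is only an illustration, not how SpamAssassin or DSPAM actually
tokenize; the function names and the sample messages are made up for the
example.

```python
from collections import Counter


def tokens_with_pairs(text):
    """Return single words plus adjacent word pairs (illustrative tokenizer)."""
    words = text.lower().split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def hapaxes(corpus):
    """Return the set of tokens seen exactly once across the whole corpus."""
    counts = Counter(tok for msg in corpus for tok in tokens_with_pairs(msg))
    return {tok for tok, n in counts.items() if n == 1}


# Hypothetical training messages, purely for demonstration.
corpus = ["buy cheap pills now", "buy cheap watches"]
rare = hapaxes(corpus)
```

Here "buy" and "buy cheap" appear in both messages, so they are not
hapaxes, while "pills" and "cheap watches" each appear once and are. A
classifier with hapaxes enabled would let those once-seen tokens count
toward the score instead of discarding them.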



_______________________________________________
mailop mailing list
mailop@mailop.org
https://chilli.nosignal.org/cgi-bin/mailman/listinfo/mailop
