Re: random low contrast text with bayes [Solved]

2014-10-04 Thread Eric Shubert

On 09/03/2014 01:26 AM, Matus UHLAR - fantomas wrote:

On Sun, 31 Aug 2014, Eric Shubert wrote:

I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.



On 08/31/2014 10:26 PM, John Hardin wrote:

Learn them as spam. That will tend to eliminate that effect.


On 31.08.14 22:54, Eric Shubert wrote:

Been doing that (learning them) for quite a while. I've had that
mechanism set up for several years now, and it's working fairly well
(after I adjusted the scoring upwards for bayes rules).

It appears to me that the hidden text is being randomly generated.
Even saw a random function of some sort in there. I presume it's been
designed to 'poison' bayes by virtue of the random text (and a sizable
amount of it).


Note that even the code for low-contrast HTML may be caught as spam...

Bayes poisoning has been considered a myth. With good training, and using
hapaxes (tokens seen only once; enabled by default), it can even help
detect the spam.



John Hardin was instrumental in helping me identify the problem. The 
rule for low contrast text wasn't firing with SA v3.3.4. I upgraded to 
3.4.0, which appears to have fixed the problem.


Many thanks John!

P.S. I did have to apply a patch to 3.4.0 in order for spamd to function 
properly. Sorry I neglected to note the bug number (searching closed 
bugs throws an error at this time). The patch can be found here:

https://github.com/QMailToaster/spamassassin/blob/master/v340-util.patch

--
-Eric 'shubes'



Re: random low contrast text with bayes

2014-08-31 Thread Eric Shubert

On 08/31/2014 10:26 PM, John Hardin wrote:

On Sun, 31 Aug 2014, Eric Shubert wrote:


I've seen an uptick of spam lately with random low contrast (hidden)
text. This appears to be lowering bayes probabilities.


Learn them as spam. That will tend to eliminate that effect.



Been doing that (learning them) for quite a while. I've had that 
mechanism set up for several years now, and it's working fairly well 
(after I adjusted the scoring upwards for bayes rules).


It appears to me that the hidden text is being randomly generated. Even 
saw a random function of some sort in there. I presume it's been 
designed to 'poison' bayes by virtue of the random text (and a sizable 
amount of it).


Thanks.
--
-Eric 'shubes'



random low contrast text with bayes

2014-08-31 Thread Eric Shubert
I've seen an uptick of spam lately with random low contrast (hidden) 
text. This appears to be lowering bayes probabilities.


I'd like to strip low contrast text from messages before they're learned 
by sa-learn in order to combat this.


1) does anyone have some guidance for building such a filter?

2) Is there perhaps a better way of dealing with this type of spam?
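
For question 1, a rough sketch of what such a filter might look like, using
only the Python standard library. This is an illustration under stated
assumptions, not anything SpamAssassin or sa-learn provides: the names
LowContrastStripper and strip_low_contrast and the threshold of 48 are
hypothetical, and it only understands inline hex colors (style="color:#...",
<font color="#...">, bgcolor). Named colors, rgb(), CSS classes, and tricks
like tiny font sizes are not handled.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

HEX = r"(#(?:[0-9a-f]{6}|[0-9a-f]{3}))\b"
# The lookbehind keeps "background-color" from matching as a text color.
COLOR_RE = re.compile(r"(?<![-\w])color\s*:\s*" + HEX, re.I)
BG_RE = re.compile(r"background(?:-color)?\s*:\s*" + HEX, re.I)


def parse_hex(value):
    """Return (r, g, b) for '#rgb' or '#rrggbb'; None for anything else."""
    if not value:
        return None
    m = re.fullmatch(r"#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})", value.strip())
    if not m:
        return None
    digits = m.group(1)
    if len(digits) == 3:
        digits = "".join(c * 2 for c in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def find_color(regex, style):
    m = regex.search(style)
    return parse_hex(m.group(1)) if m else None


class LowContrastStripper(HTMLParser):
    """Collects only text whose color differs enough from its background."""

    def __init__(self, threshold=48):
        super().__init__(convert_charrefs=True)
        self.threshold = threshold  # assumed cutoff, tune against real mail
        # Stack of (tag, foreground, background); assume black on white.
        self.stack = [("", (0, 0, 0), (255, 255, 255))]
        self.visible = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        _, fg, bg = self.stack[-1]  # inherit colors from the parent element
        attrs = dict(attrs)
        style = attrs.get("style") or ""
        fg = find_color(COLOR_RE, style) or parse_hex(attrs.get("color")) or fg
        bg = find_color(BG_RE, style) or parse_hex(attrs.get("bgcolor")) or bg
        self.stack.append((tag, fg, bg))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        _, fg, bg = self.stack[-1]
        # Sum of per-channel differences as a crude contrast measure.
        if sum(abs(a - b) for a, b in zip(fg, bg)) >= self.threshold:
            self.visible.append(data)


def strip_low_contrast(html, threshold=48):
    parser = LowContrastStripper(threshold)
    parser.feed(html)
    parser.close()
    return "".join(parser.visible)
```

This returns only the visible text of one HTML part. Extracting the
text/html part from a message and putting the result back into something
sa-learn will accept is left out of the sketch.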

Thanks.

--
-Eric 'shubes'