Re: Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Chip M.
I just checked the last six months of my most diverse corpus,
and found:  two Ham, zero spam.

Both ham were sent via different ESPs, each of mediocre 
quality though with multiple legitimate (albeit Pakled-y)
customers.

One was from "Marriott Rewards" with terse SA report:
score=0.9 required=5.1 tests=DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, 
FH_RANDOM_SURE, FROM_EXCESS_BASE64, HTML_IMAGE_RATIO_08, HTML_MESSAGE, 
URIBL_BLOCKED

One was from "MapMyRun" with terse SA report:
score=6.3 required=5.1 tests=DIET_1, DKIM_SIGNED, DKIM_VALID, 
DKIM_VALID_AU, FH_RANDOM_SURE, HTML_MESSAGE, MIME_HTML_ONLY, 
MIME_HTML_ONLY_MULTI, MPART_ALT_DIFF, RCVD_IN_DNSWL_NONE, SUBJECT_DIET

That's using SA 3.3.2 with auto-updates (at a shared webhost).

Upon request, off list I can send the Message-IDs to any 
SA dev(s).  If the corpse(s) would be helpful, I can ask 
the domain admin for them.

I'm planning some data-mining this weekend, and would be happy
to check more data  (mild brag: I finally added flagging to my
data-mining tools, so it will auto-log, even if I forget to
explicitly check).  :)
- "Chip"



Re: Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Kevin A. McGrail

On 1/16/2014 6:20 PM, Axb wrote:
> latest 72_scores.cf
>
> score FH_RANDOM_SURE 1.999 2.920 1.999 2.920
>
> I'd say 0.5 pushes it very low.  - can we agree on 1.5?

Is it hitting on anything in your corpora?
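
For readers unfamiliar with the four-value form quoted from 72_scores.cf:
SpamAssassin's score option takes either one value or four, and the four
values correspond to score sets 0-3 (Bayes off/net off, Bayes off/net on,
Bayes on/net off, Bayes on/net on). The sketch below only illustrates that
lookup. The function names are made up for this example, and the parser is
deliberately simplified (it ignores relative scores in parentheses).

```python
def parse_score_line(line):
    """Parse 'score NAME v' or 'score NAME v0 v1 v2 v3' into (name, [4 floats])."""
    parts = line.split()
    if len(parts) not in (3, 6) or parts[0] != "score":
        raise ValueError(f"not a one- or four-value score line: {line!r}")
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        values = values * 4  # a single score is used for all four score sets
    return parts[1], values


def active_score(values, use_bayes, use_net):
    """Pick the score set: +2 if Bayes is enabled, +1 if network tests are."""
    return values[(2 if use_bayes else 0) + (1 if use_net else 0)]


name, scores = parse_score_line("score FH_RANDOM_SURE 1.999 2.920 1.999 2.920")
print(name, active_score(scores, use_bayes=True, use_net=True))

# The local.cf override suggested in this thread applies to every score set.
_, override = parse_score_line("score FH_RANDOM_SURE 0.1")
print(override)
```

So a site running with network tests enabled would see the 2.920 value,
which is why a single false hit contributes so much to the total.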


Re: Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Axb

On 01/17/2014 12:16 AM, Kevin A. McGrail wrote:
> On 1/16/2014 5:20 PM, Axb wrote:
> > On 01/16/2014 11:03 PM, Brian Bebeau wrote:
> > > We're having a problem with the FH_RANDOM_SURE rule causing false
> > > positives.
> > > It has a subrule __ALL_RANDOM, which is:
> > >
> > > header   __ALL_RANDOM   ALL =~
> > > /(?:[%\#\[\$]R?A?NDO?M?|\%(?:CUSTOM|FROM|PROXY|X?MESSA|MAKE_TXT|FROM_USER))/i
> > >
> > > We have a user "ndrier", so legitimate email sometimes has a header
> > > that starts like:
> > >
> > > References:
> >
> > yes, the score can be lowered:
> >
> > add:
> > score FH_RANDOM_SURE 0.1
> >
> > to your local.cf.
> > That will fix your problem.
> >
> > h2h
>
> I don't show a single hit on this rule on my server in 10 days.
>
> I'm setting a score ceiling of 0.5 on the rule and it should go out in a
> day or so.

latest 72_scores.cf

score FH_RANDOM_SURE 1.999 2.920 1.999 2.920

I'd say 0.5 pushes it very low.  - can we agree on 1.5?



Re: Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Kevin A. McGrail

On 1/16/2014 5:20 PM, Axb wrote:
> On 01/16/2014 11:03 PM, Brian Bebeau wrote:
> > We're having a problem with the FH_RANDOM_SURE rule causing false
> > positives.
> >
> > It has a subrule __ALL_RANDOM, which is:
> >
> > header   __ALL_RANDOM   ALL =~
> > /(?:[%\#\[\$]R?A?NDO?M?|\%(?:CUSTOM|FROM|PROXY|X?MESSA|MAKE_TXT|FROM_USER))/i
> >
> > We have a user "ndrier", so legitimate email sometimes has a header
> > that starts like:
> >
> > References:
> >
> > which matches the rule, since it contains "%nd". It looks like it's
> > trying to find "%random", but only "nd" is required to be there.
> > Could the score be way lowered or the rule made more restrictive?
>
> yes, the score can be lowered:
>
> add:
> score FH_RANDOM_SURE 0.1
>
> to your local.cf.
> That will fix your problem.
>
> h2h

I don't show a single hit on this rule on my server in 10 days.

I'm setting a score ceiling of 0.5 on the rule and it should go out in a
day or so.


Regards,
KAM
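
The false positive reported in this thread can be reproduced outside
SpamAssassin. The sketch below transcribes the quoted __ALL_RANDOM pattern
for Python's re module and runs it against a made-up header. The Message-ID
shown is hypothetical (the real one is not preserved in the thread), and
STRICT_RANDOM is only an illustration of "more restrictive", not the fix
the project adopted.

```python
import re

# The subrule pattern quoted in the thread, transcribed for Python's re module.
ALL_RANDOM = re.compile(
    r"(?:[%\#\[\$]R?A?NDO?M?|\%(?:CUSTOM|FROM|PROXY|X?MESSA|MAKE_TXT|FROM_USER))",
    re.IGNORECASE,
)

# Hypothetical stricter variant: require the whole word RANDOM after the sigil.
# It would no longer match abbreviated placeholders such as %RND.
STRICT_RANDOM = re.compile(r"[%\#\[\$]RANDOM", re.IGNORECASE)

# Hypothetical header with a local part containing "%ndrier".
header = "References: <20140116.abc%ndrier@example.com>"

m = ALL_RANDOM.search(header)
print(m.group(0) if m else None)               # the substring that triggers the rule
print(bool(STRICT_RANDOM.search(header)))      # the stricter variant on the same header
print(bool(STRICT_RANDOM.search("Subject: cheap meds %RANDOM_WORD")))
```

Because R, A, O and M are all optional, the original pattern is satisfied by
a sigil followed by just "nd", which is exactly what "%ndrier" provides.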


Re: Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Axb

On 01/16/2014 11:03 PM, Brian Bebeau wrote:
> We're having a problem with the FH_RANDOM_SURE rule causing false positives.
> It has a subrule __ALL_RANDOM, which is:
>
> header   __ALL_RANDOM   ALL =~
> /(?:[%\#\[\$]R?A?NDO?M?|\%(?:CUSTOM|FROM|PROXY|X?MESSA|MAKE_TXT|FROM_USER))/i
>
> We have a user "ndrier", so legitimate email sometimes has a header that
> starts like:
>
> References:

yes, the score can be lowered:

add:
score FH_RANDOM_SURE 0.1

to your local.cf.
That will fix your problem.

h2h


Rule FH_RANDOM_SURE causing FPs

2014-01-16 Thread Brian Bebeau
We're having a problem with the FH_RANDOM_SURE rule causing false positives.
It has a subrule __ALL_RANDOM, which is:

header   __ALL_RANDOM   ALL =~ 
/(?:[%\#\[\$]R?A?NDO?M?|\%(?:CUSTOM|FROM|PROXY|X?MESSA|MAKE_TXT|FROM_USER))/i

We have a user "ndrier", so legitimate email sometimes has a header that starts 
like:

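References:

which matches the rule, since it contains "%nd". It looks like it's trying to
find "%random", but only "nd" is required to be there.  Could the score be
way lowered or the rule made more restrictive?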






Re: SA 3.4.0rc5 Redis DB Help

2014-01-16 Thread Mark Martinec
me writes:
> Note that bayes_token_ttl and bayes_seen_ttl have no effect
> on entries loaded from a backup dump, they are all given
> a 'current' timestamp (with some random offset so that they
> will not expire at exactly the same time).  But for a steady-state,
> with these *_ttl settings you can control how many items are
> kept in a database on the average.

I should rephrase the above, which is not accurate.

Loaded tokens are given a current timestamp by a redis server itself,
and we give them expiration time as bayes_token_ttl with some
random variation:

  # by introducing some randomness (ttl times a factor of 0.7 .. 1.7),
  # we avoid auto-expiration of many tokens all at once,
  # introducing an unnecessary load spike on a redis server
  $r->b_call('EXPIRE', $key, int($token_ttl * (rand()+0.7)));


Mark
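
To make the effect of that EXPIRE call concrete, here is a small sketch of
the same arithmetic. It assumes, as in Perl, that the random value lies in
[0, 1), so the factor runs from 0.7 up to (but not including) 1.7. The
function name and the 21d example value are chosen for illustration only.

```python
import random


def jittered_ttl(token_ttl, rng=random.random):
    """Mirror int($token_ttl * (rand()+0.7)), with rng() returning [0, 1)."""
    return int(token_ttl * (rng() + 0.7))


token_ttl = 21 * 86400  # a bayes_token_ttl of 21d, expressed in seconds
samples = [jittered_ttl(token_ttl) for _ in range(10000)]

# For a 21-day TTL the expirations spread between about 14.7 and 35.7 days.
print(min(samples) / 86400, max(samples) / 86400)
```

Note that the mean factor is 1.2, so on average a token lives somewhat
longer than the configured TTL.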


Re: SA 3.4.0rc5 Redis DB Help

2014-01-16 Thread Mark Martinec
Andy Jezierski writes:
> Are there any instructions in setting up the Bayes DB using a Redis 
> server?

Yes, in release notes (currently also in build/announcements/PROPOSED-3.4.0.txt
in svn). Pretty much exactly as you already have it.
 
> I've installed the server, took the sample config options and added them 
> to local.cf
> 
> bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
> bayes_store_module_additional Mail::SpamAssassin::Util::TinyRedis
> bayes_sql_dsn   server=127.0.0.1:6379;password=spamd;database=2
> bayes_token_ttl 21d
> bayes_seen_ttl   8d
> bayes_auto_expire 1
> use_bayes   1
> bayes_auto_learn 1
> 
> Performed a redis-cli -n 2 FLUSHDB
> 
> Did a backup of one of my mysql bayes databases and am attempting to do a 
> restore to the new system.

Good.

> Looks like the redis server keeps chewing up swap space until it runs out, 
> then the redis server terminates.
> 
> Running on FreeBSD 9.2   perl 5.18-5.18.2   redis server 2.8.4
> Any ideas?

Depends very much on the number of tokens you have in your SQL database.

Mine (approx. 1000 users) keeps hovering at about 1 M tokens (and just keeps
a few very recent 'seen' entries), resulting in the redis server using under
300 MB of memory.

$ redis-cli -n 2 keys 'w:*' | wc -l
 1091475

$ redis-cli -n 2 keys 's:*' | wc -l
1324

It may be worthwhile to purge old tokens from SQL first, before
creating a backup.  Also, it is safe to ditch the entire 'seen'
set of records; it's not worth transferring them to a new database.

If this still gives an unreasonable number of tokens, it may be worth
decimating a set - just preserving a random subset of tokens.
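
A rough sketch of that thinning step, applied to a backup dump before
restoring it. It assumes the dump is tab-separated text in which the first
field is "v" for header values, "t" for tokens and "s" for 'seen' records;
check a few lines of your own backup file before relying on that. The
function name, the sample lines and the keep_fraction parameter are all
hypothetical.

```python
import random


def thin_backup(lines, keep_fraction, seed=None):
    """Drop 'seen' records and keep a random subset of token records.

    Lines whose first tab-separated field is neither 't' nor 's'
    (for example 'v' header lines) are passed through unchanged.
    """
    rng = random.Random(seed)
    kept = []
    for line in lines:
        kind = line.split("\t", 1)[0]
        if kind == "s":
            continue  # 'seen' entries are not worth transferring
        if kind == "t" and rng.random() >= keep_fraction:
            continue  # token not selected for the random subset
        kept.append(line)
    return kept


# Made-up sample lines in the assumed layout.
dump = [
    "v\t3\tdb_version",
    "v\t10\tnum_spam",
    "v\t12\tnum_nonspam",
    "t\t1\t0\t1389823736\tabcdef0123",
    "t\t0\t2\t1389823000\t0123abcdef",
    "s\ts\tmsgid-one@example.com",
]

print(len(thin_backup(dump, keep_fraction=0.5, seed=1)))
```

In practice the input would be read line by line from the file produced by
the backup and the result written to a new file for sa-learn --restore.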

Another option is to just start from an empty database. With a reasonable
set of other rules, network tests and autolearning on, the required
200 samples of ham and 200 of spam can be quickly reached on
a busy server. During initial learning consider decreasing score for
BAYES_00 and BAYES_99 rules.

Note that bayes_token_ttl and bayes_seen_ttl have no effect
on entries loaded from a backup dump, they are all given
a 'current' timestamp (with some random offset so that they
will not expire at exactly the same time).  But for a steady-state,
with these *_ttl settings you can control how many items are
kept in a database on the average.


Axb writes:
> what does sa-learn --dump magic say (when using mysql)

Good idea to check this first.

> my Redis
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5728 root      20   0 5355m 5.1g 1020 S  1.3 37.1 711:19.45 redis-server
> 
> sa-learn --dump magic
> 0.000          0          3          0  non-token data: bayes db version
> 0.000          0   16481050          0  non-token data: nspam
> 0.000          0    5690858          0  non-token data: nham

> bayes_token_ttl   864000
> bayes_seen_ttl  2d

A biggie!

Btw, with redis db the number of tokens actually in a database
may not be directly related to the number of learned and reported
tokens because of the automatic expiration performed by the
redis server (according to bayes_token_ttl) - unlike other bayes
back-ends where purging is done explicitly by SpamAssassin.

  Mark
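
The TTL settings in this thread are written both as bare seconds (864000)
and with a unit suffix (21d, 8d, 2d), which makes them hard to compare at a
glance. The helper below converts them to a common unit. It assumes the
suffixes s, m, h, d and w mean seconds, minutes, hours, days and weeks, and
that a bare number is seconds; the function name is made up for this example.

```python
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def ttl_seconds(value):
    """Convert a duration such as '21d' or '864000' to seconds."""
    value = value.strip()
    suffix = value[-1].lower()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)  # no suffix: assumed to be seconds already


for setting in ("21d", "8d", "864000", "2d"):
    secs = ttl_seconds(setting)
    print(f"{setting:>7} = {secs:>8} s = {secs / 86400:g} days")
```

So the bayes_token_ttl of 864000 quoted above is 10 days, less than half of
the 21d used in the configuration that ran out of swap.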


Re: SA 3.4.0rc5 Redis DB Help

2014-01-16 Thread Andy Jezierski
Axb  wrote on 01/16/2014 11:16:32 AM:

> From: Axb 
> To: users@spamassassin.apache.org, 
> Date: 01/16/2014 11:17 AM
> Subject: Re: SA 3.4.0rc5 Redis DB Help
> 
> You'll need quite  a lot of memory to run Bayes/Redis
> 

Sounds like I might stick with mysql then.

> Did you expire old tokens before the backup?
>

Yes.
 
> what does sa-learn --dump magic say (when using mysql)
> 
> 

viper# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0    3737717          0  non-token data: nspam
0.000          0    2976364          0  non-token data: nham
0.000          0     962882          0  non-token data: ntokens
0.000          0 1378765687          0  non-token data: oldest atime
0.000          0 1389823736          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1389886856          0  non-token data: last expiry atime
0.000          0   11059200          0  non-token data: last expire atime delta
0.000          0     918938          0  non-token data: last expire reduction count
viper#



Re: SA 3.4.0rc5 Redis DB Help

2014-01-16 Thread Axb

You'll need quite a lot of memory to run Bayes/Redis.

Did you expire old tokens before the backup?

What does sa-learn --dump magic say (when using mysql)?


my Redis

top - 18:11:53 up 39 days,  8:01,  1 user,  load average: 0.80, 0.29, 0.16
Tasks: 128 total,   1 running, 127 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.5%us,  0.0%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  14262656k total,  7572280k used,  6690376k free,   171460k buffers
Swap:  2046968k total,        0k used,  2046968k free,  1765856k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5728 root      20   0 5355m 5.1g 1020 S  1.3 37.1 711:19.45 redis-server

sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0   16481050          0  non-token data: nspam
0.000          0    5690858          0  non-token data: nham


bayes_token_ttl 864000
bayes_seen_ttl  2d








SA 3.4.0rc5 Redis DB Help

2014-01-16 Thread Andy Jezierski
Are there any instructions in setting up the Bayes DB using a Redis 
server?

I've installed the server, took the sample config options and added them 
to local.cf

bayes_store_module  Mail::SpamAssassin::BayesStore::Redis
bayes_store_module_additional Mail::SpamAssassin::Util::TinyRedis
bayes_sql_dsn   server=127.0.0.1:6379;password=spamd;database=2
bayes_token_ttl 21d
bayes_seen_ttl   8d
bayes_auto_expire 1
use_bayes   1
bayes_auto_learn 1

Performed a redis-cli -n 2 FLUSHDB

Did a backup of one of my mysql bayes databases and am attempting to do a 
restore to the new system.
Looks like the redis server keeps chewing up swap space until it runs out, 
then the redis server terminates.

Running on FreeBSD 9.2   perl 5.18-5.18.2   redis server 2.8.4

Any ideas?

root@spam2:~/.spamassassin # sa-learn --restore backup.txt
bayes: note: assuming the database is empty; to manually clear a database: 
redis-cli -n  FLUSHDB
Error reading from Redis server:  at 
/usr/local/lib/perl5/site_perl/5.18/Mail/SpamAssassin/Util/TinyRedis.pm 
line 105,  chunk 5249066.
root@spam2:~/.spamassassin #

From messages.log

Jan 16 10:41:05 spam2 kernel: swap_pager: out of swap space
Jan 16 10:41:05 spam2 kernel: swap_pager_getswapspace(1): failed
Jan 16 10:41:06 spam2 kernel: pid 2879 (redis-server), uid 535, was killed: out of swap space

last pid:  2929;  load averages:  0.57,  0.57,  0.40   up 0+16:58:52  10:38:33
31 processes:  1 running, 30 sleeping
CPU:  0.0% user,  0.0% nice,  4.6% system,  5.4% interrupt, 90.0% idle
Mem: 722M Active, 66M Inact, 168M Wired, 25M Cache, 94M Buf, 3616K Free
Swap: 819M Total, 715M Used, 104M Free, 87% Inuse, 1820K In, 2176K Out

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 2903 root          1  20    0 42252K  2208K sbwait  1   1:39  0.00% perl
 2879 redis         3  52    0   814M   431M uwait   3   1:02  0.00% redis-server
 1533 root          1  20    0 54580K  2044K select  1   0:06  0.00% perl
 2915 redis         1  20    0   810M   401M swread  2   0:04  0.00% redis-server
  677 root          1  20    0 11256K   520K select  3   0:02  0.00% ntpd
  831 ajezierski    1  20    0 15852K   512K select  1   0:02  0.00% sshd
  763 ajezierski    1  20    0 15852K   432K select  3   0:02  0.00% sshd
 1534 spamd         1  20    0 58676K  1608K select  3   0:01  0.00% perl
  602 root          1  20    0  9544K   372K select  1   0:01  0.00% syslogd

Re: Do you want to buy this domain name spam

2014-01-16 Thread Kevin A. McGrail

On 1/16/2014 1:51 AM, Marc Perkel wrote:
> I'm seeing a lot of "Do you want to buy this domain name" spam lately.
> Is it just me or is anyone else seeing this?


I saw a lot a few weeks ago but have been using rules and RBL stuff to 
battle with very good success.  Have you ever looked at the KAM.cf rules 
I write?


regards,
KAM