Re: I'm doing it wrong.

2014-05-23 Thread Kai Meyer

On 05/22/2014 10:36 PM, Kai Meyer wrote:

On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:

I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin
(user prefs via mysql) server that I've been running for a few years


The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.

now. It's just a few of my private domains, not a lot of traffic. In the
last 6 months, the amount of spam getting through has gone from one or
two a week to 30 a day. I had sa-learn setup on imap folders called SPAM
and HAM running as root, so I just started tossing emails in there. It


Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.

Ya, that was what I was worried about. Just to clarify, postfix runs 
as the regular "postfix" user. I'm configured very similar to this:

http://www.akadia.com/services/postfix_spamassassin.html
Notice the spamchk script. My process list has this entry:

postfix  10477 12953  0 22:20 ?        00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}

My spamchk is functionally identical to the one in the link above.
(I'm using the sideline option, rather than just dumping the email, or
sending it to another mailbox). My spamd service runs as the user spamd:

root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd     6190  6188  0 15:56 ?        00:01:27 spamd child

So when I run spamassassin manually, I'm using sudo to switch to that
user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u k...@gnukai.com > test.mail.right)
So if I turn sa-learn back on, I should make sure that I run it as the
spamd user.
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold
to dump to my JUNK folder is 3, and I have spamchk sideline things above
7). I still get legitimate email in the 2-3 range, but I haven't had
legitimate email above 3 in a long time. After a bit, the 2s became 3s
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did
this habitually for more than a month, and the progress seemed to stop.
I googled around a bit and realized that I didn't do a very good job
setting up rules, so I added pyzor and razor2, and they seem functional.
Spam got better, and it's down to maybe 10 a day, but they still range
all the way up to 5.


Mixing in Razor or Pyzor sure can help. But that "setting up rules" you
just considered your job is a bit weird. Local rules of course also can
help, but are  (a) an advanced topic, and  (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.


I think by "setting up rules" I meant "adding configurations for pyzor 
and razor2" and the likes. Are they called plugins?



What really gets me is that if I take an email that scores -2, strip
the X-Spam* headers, and run it through spamc by hand (even as the spamd
user) just like the spamchk script does, it scores around a 4. I have


It is not necessary to strip X-Spam headers. SA ignores these, if
present.

You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.

What is that "spamchk script" you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.

In the link above, it describes my process pretty closely. I deviate 
by having a sql.cf:

# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password spampass
user_scores_sql_username spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC
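A small illustration of why the custom query above ends in ORDER BY username ASC. This is only a sketch under stated assumptions: it assumes SpamAssassin applies the returned rows in order, so that a later row overrides an earlier one for the same preference, and it approximates the database sort with Python's code-point ordering ('$' sorts before '%', which sorts before letters and digits; the actual MySQL collation may behave differently). Only the $GLOBAL required_score of 4.5 comes from the userpref table in this thread; the domain and user rows, and the function name effective_prefs, are hypothetical.

```python
# Hypothetical result rows for one recipient: (username, preference, value).
# Only the $GLOBAL row mirrors data shown in this thread.
rows = [
    ("user@example.com", "required_score", "3.0"),  # hypothetical per-user row
    ("$GLOBAL", "required_score", "4.5"),           # from the userpref table
    ("%example.com", "required_score", "4.0"),      # hypothetical per-domain row
]


def effective_prefs(rows):
    """Apply rows in ascending username order; later rows win.

    Assumption: this mimics 'ORDER BY username ASC' followed by
    last-value-wins handling of each preference.
    """
    prefs = {}
    for username, preference, value in sorted(rows, key=lambda r: r[0]):
        prefs[preference] = value
    return prefs


if __name__ == "__main__":
    print(effective_prefs(rows))
```

Under these assumptions the per-user row is applied last, so it takes precedence over the domain-wide and global values.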


Here's some of the db:
mysql> select * from userpref where username='$GLOBAL';
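+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
| id | username | preference     | value | descript | added               | added_by | modified            |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
|  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn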

Re: I'm doing it wrong.

2014-05-22 Thread Kai Meyer

On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin
(user prefs via mysql) server that I've been running for a few years


The configuration you pasted below does not show any user_* options.
Unless there are more cf files you omitted, you do not use user_prefs
via SQL.

now. It's just a few of my private domains, not a lot of traffic. In the
last 6 months, the amount of spam getting through has gone from one or
two a week to 30 a day. I had sa-learn setup on imap folders called SPAM
and HAM running as root, so I just started tossing emails in there. It


Training as root rather than the system user receiving the mail (and
calling SA) is only possible with site-wide Bayes setup. The pasted
configuration doesn't show that, either, so you would need to train as
the mail receiving / scanning user.

Ya, that was what I was worried about. Just to clarify, postfix runs as 
the regular "postfix" user. I'm configured very similar to this:

http://www.akadia.com/services/postfix_spamassassin.html
Notice the spamchk script. My process list has this entry:

postfix  10477 12953  0 22:20 ?        00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}

My spamchk is functionally identical to the one in the link above. (I'm
using the sideline option, rather than just dumping the email, or
sending it to another mailbox). My spamd service runs as the user spamd:

root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
spamd     6190  6188  0 15:56 ?        00:01:27 spamd child

So when I run spamassassin manually, I'm using sudo to switch to that
user (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u k...@gnukai.com > test.mail.right)
So if I turn sa-learn back on, I should make sure that I run it as the
spamd user.
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold
to dump to my JUNK folder is 3, and I have spamchk sideline things above
7). I still get legitimate email in the 2-3 range, but I haven't had
legitimate email above 3 in a long time. After a bit, the 2s became 3s
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did
this habitually for more than a month, and the progress seemed to stop.
I googled around a bit and realized that I didn't do a very good job
setting up rules, so I added pyzor and razor2, and they seem functional.
Spam got better, and it's down to maybe 10 a day, but they still range
all the way up to 5.


Mixing in Razor or Pyzor sure can help. But that "setting up rules" you
just considered your job is a bit weird. Local rules of course also can
help, but are  (a) an advanced topic, and  (b) not the task of a regular
SA instance. You didn't mention any of that in your configuration
either, so it's unclear what you're about here.


I think by "setting up rules" I meant "adding configurations for pyzor 
and razor2" and the likes. Are they called plugins?



What really gets me is that if I take an email that scores -2, strip
the X-Spam* headers, and run it through spamc by hand (even as the spamd
user) just like the spamchk script does, it scores around a 4. I have


It is not necessary to strip X-Spam headers. SA ignores these, if
present.

You just mixed in a third user, spamd -- in addition to root and the
real mail receiving user. Without site-wide Bayes you are comparing
apples to oranges, and now peaches. All yummy, though not the same.

What is that "spamchk script" you just mentioned, and how does it fit
into your setup? You should review your entire mail-processing chain.
Describing it in detail might help here, too.

In the link above, it describes my process pretty closely. I deviate by 
having a sql.cf:

# cat /etc/mail/spamassassin/sql.cf
user_scores_dsn  DBI:mysql:spamassassin:localhost:3306
user_scores_sql_password spampass
user_scores_sql_username spamd
user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC


Here's some of the db:
mysql> select * from userpref where username='$GLOBAL';
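+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
| id | username | preference     | value | descript | added               | added_by | modified            |
+----+----------+----------------+-------+----------+---------------------+----------+---------------------+
|  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
| 28 | $GLOBAL  | auto_learn     | 0     | NULL     |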

I'm doing it wrong.

2014-05-22 Thread Kai Meyer
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin 
(user prefs via mysql) server that I've been running for a few years 
now. It's just a few of my private domains, not a lot of traffic. In the 
last 6 months, the amount of spam getting through has gone from one or 
two a week to 30 a day. I had sa-learn setup on imap folders called SPAM 
and HAM running as root, so I just started tossing emails in there. It 
seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold 
to dump to my JUNK folder is 3, and I have spamchk sideline things above 
7). I still get legitimate email in the 2-3 range, but I haven't had 
legitimate email above 3 in a long time. After a bit, the 2s became 3s 
and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did 
this habitually for more than a month, and the progress seemed to stop. 
I googled around a bit and realized that I didn't do a very good job 
setting up rules, so I added pyzor and razor2, and they seem functional. 
Spam got better, and it's down to maybe 10 a day, but they still range 
all the way up to 5.


What really gets me is that if I take an email that scores -2, strip 
the X-Spam* headers, and run it through spamc by hand (even as the spamd 
user) just like the spamchk script does, it scores around a 4. I have 
one here that scores a 4.1 if it comes through the mail, and a 6.6 if I 
run it manually. What can I do to reconcile these scores? I would like 
the scores I'm getting from the commandline over the ones I'm getting 
through postfix, but I don't know the system well enough to know what is 
causing the difference.


== Via postfix
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,
    HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no
    version=3.3.1
...
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS



== Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u  > postsa.mail)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
    INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
    version=3.3.1
...
Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS
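To make the comparison between the two reports concrete, here is a minimal sketch that diffs the rule hits and re-adds the scores. The rule names and points are copied from the two reports above; the dictionary and function names are made up for this illustration and are not part of SpamAssassin.

```python
# Rule hits and points copied from the two reports in this message.
via_postfix = {
    "INVALID_DATE": 1.1,
    "SPF_PASS": -0.0,
    "HTML_IMAGE_RATIO_08": 0.0,
    "BAYES_60": 1.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "RDNS_NONE": 0.8,
}
via_commandline = {
    "INVALID_DATE": 1.1,
    "SPF_PASS": -0.0,
    "URIBL_DBL_SPAM": 2.5,
    "BAYES_60": 1.5,
    "HTML_MESSAGE": 0.0,
    "MIME_HTML_ONLY": 0.7,
    "RDNS_NONE": 0.8,
}


def only_in(first, second):
    """Return the rules (with points) that hit in `first` but not in `second`."""
    return {rule: pts for rule, pts in first.items() if rule not in second}


def total(report):
    """Sum the points of a report, rounded to one decimal like the headers."""
    return round(sum(report.values()), 1)


if __name__ == "__main__":
    print("only via postfix:    ", only_in(via_postfix, via_commandline))
    print("only via commandline:", only_in(via_commandline, via_postfix))
    print("totals:", total(via_postfix), total(via_commandline))
```

The totals match the reported 4.1 and 6.6, and the whole 2.5-point gap comes from URIBL_DBL_SPAM, which hit only on the command-line run. That rule depends on a DNS blocklist lookup, so its result can differ between two scans of the same message.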



== /etc/mail/spamassassin.cf (I added the last 4 lines in a desperate
attempt to see something change, but to no effect)

/etc/mail/spamassassin/local.cf
# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)

# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.

required_hits 5.0
report_safe 1
rewrite_header Subject [***SPAM***]
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

trusted_networks 69.160.84.222
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
pyzor_options --homedir /etc/mail/spamassassin
auto_learn 0
use_razor2
use_dcc
use_pyzor