Re: I'm doing it wrong.
On 05/22/2014 10:36 PM, Kai Meyer wrote:
> On Fri, 23 May 2014 05:33:31 +0200, Karsten Bräckelmann wrote:
>> On Thu, 2014-05-22 at 20:14 -0600, Kai Meyer wrote:
>>> I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years
>>
>> The configuration you pasted below does not show any user_* options. Unless there are more cf files you omitted, you do not use user_prefs via SQL.
>>
>>> now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day. I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It
>>
>> Training as root rather than the system user receiving the mail (and calling SA) is only possible with site-wide Bayes setup. The pasted configuration doesn't show that, either, so you would need to train as the mail receiving / scanning user.
>
> Ya, that was what I was worried about. Just to clarify, postfix runs as the regular "postfix" user. I'm configured very similar to this:
> http://www.akadia.com/services/postfix_spamassassin.html
>
> Notice the spamchk script. My process list has this entry:
>
> postfix  10477 12953  0 22:20 ?        00:00:00 pipe -n spamchk -t unix flags=Rq user=spamd argv=/usr/local/bin/spamchk -f ${sender} -- ${recipient}
>
> My spamchk is functionally identical to the one in the link above. (I'm using the sideline option, rather than just dumping the email, or sending it to another mailbox).
>
> My spamd service runs as the user spamd:
>
> root      6188     1  0 15:56 ?        00:00:08 /usr/bin/spamd -d -m10 -q -x -u spamd -r /var/run/spamd.pid
> spamd     6190  6188  0 15:56 ?        00:01:27 spamd child
>
> So when I run spamassassin manually, I'm using sudo to switch to that user
> (cat test.mail.left | sudo -u spamd /usr/bin/spamc -u k...@gnukai.com > test.mail.right)
>
> So if I turn sa-learn back on, I should make sure that I run it as the spamd user.
>
>>> seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop. I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5.
>>
>> Mixing in Razor or Pyzor sure can help. But that "setting up rules" you just considered your job is a bit weird. Local rules of course also can help, but are (a) an advanced topic, and (b) not the task of a regular SA instance. You didn't mention any of that in your configuration either, so it's unclear what you're about here.
>
> I think by "setting up rules" I meant "adding configurations for pyzor and razor2" and the likes. Are they called plugins?
>
>>> What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have
>>
>> It is not necessary to strip X-Spam headers. SA ignores these, if present.
>>
>> You just mixed in a third user, spamd -- in addition to root and the real mail receiving user. Without site-wide Bayes you are comparing apples to oranges, and now peaches. All yummy, though not the same.
>>
>> What is that "spamchk script" you just mentioned, and how does it fit into your setup? You should review your entire mail-processing chain. Describing it in detail might help here, too.
>
> In the link above, it describes my process pretty closely.
>
> I deviate by having a sql.cf:
>
> # cat /etc/mail/spamassassin/sql.cf
> user_scores_dsn DBI:mysql:spamassassin:localhost:3306
> user_scores_sql_password spampass
> user_scores_sql_username spamd
> user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '$GLOBAL' OR username = CONCAT('%',_DOMAIN_) ORDER BY username ASC
>
> Here's some of the db:
>
> mysql> select * from userpref where username='$GLOBAL';
> +----+----------+----------------+-------+----------+---------------------+----------+---------------------+
> | id | username | preference     | value | descript | added               | added_by | modified            |
> +----+----------+----------------+-------+----------+---------------------+----------+---------------------+
> |  1 | $GLOBAL  | required_score | 4.5   | NULL     | 2003-01-01 00:00:00 |          | 2010-08-23 10:23:26 |
> | 28 | $GLOBAL  | auto_learn
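
As an aside on the custom query in sql.cf: the ORDER BY username ASC clause matters because '$' sorts before '%', which sorts before letters and digits, so the rows come back as global, then domain, then user. The sketch below mimics that with SQLite standing in for MySQL (so CONCAT becomes ||). It assumes that SpamAssassin applies the rows in the order returned, with a later row replacing an earlier one. The function name and all rows except the $GLOBAL required_score of 4.5 are hypothetical.

```python
import sqlite3

# Hypothetical sample rows. Only the $GLOBAL required_score of 4.5 comes from
# the thread; the domain and user rows are invented for illustration.
ROWS = [
    ("$GLOBAL", "required_score", "4.5"),
    ("%example.org", "required_score", "4.0"),
    ("alice@example.org", "required_score", "3.0"),
]

# Same shape as user_scores_sql_custom_query, rewritten for SQLite:
# CONCAT('%', _DOMAIN_) becomes '%' || ?.
QUERY = (
    "SELECT preference, value FROM userpref "
    "WHERE username = ? OR username = '$GLOBAL' OR username = '%' || ? "
    "ORDER BY username ASC"
)


def effective_prefs(conn, username):
    """Return the preferences that win for one recipient address."""
    domain = username.split("@", 1)[1]
    prefs = {}
    # '$' (0x24) < '%' (0x25) < digits and letters, so the rows arrive as
    # global first, then domain, then user.
    for preference, value in conn.execute(QUERY, (username, domain)):
        prefs[preference] = value  # assumption: a later row overrides
    return prefs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE userpref (username TEXT, preference TEXT, value TEXT)")
conn.executemany("INSERT INTO userpref VALUES (?, ?, ?)", ROWS)

for address in ("alice@example.org", "bob@example.org", "carol@example.net"):
    print(address, effective_prefs(conn, address))
```

If that assumption holds, a per-user or per-domain required_score row would replace the $GLOBAL value for that recipient, which is one place to look when the "required=" figure in a header differs from what the global row suggests.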
I'm doing it wrong.
I have a CentOS 6 postfix + dovecot + mysql (for vmail) + spamassassin (user prefs via mysql) server that I've been running for a few years now. It's just a few of my private domains, not a lot of traffic. In the last 6 months, the amount of spam getting through has gone from one or two a week to 30 a day.

I had sa-learn setup on imap folders called SPAM and HAM running as root, so I just started tossing emails in there. It seemed like I had groups of emails around 2, 0, -1, and -2 (my threshold to dump to my JUNK folder is 3, and I have spamchk sideline things above 7). I still get legitimate email in the 2-3 range, but I haven't had legitimate email above 3 in a long time. After a bit, the 2s became 3s and the 0s became 1s, but the -1 and -2 spam emails stayed put. I did this habitually for more than a month, and the progress seemed to stop.

I googled around a bit and realized that I didn't do a very good job setting up rules, so I added pyzor and razor2, and they seem functional. Spam got better, and it's down to maybe 10 a day, but they still range all the way up to 5.

What really gets me is that if I take an email that scores -2, strip the X-Spam* headers, and run it through spamc by hand (even as the spamd user) just like the spamchk script does, it scores around a 4. I have one here that scores a 4.1 if it comes through the mail, and a 6.6 if I run it manually.

What can I do to reconcile these scores? I would like the scores I'm getting from the commandline over the ones I'm getting through postfix, but I don't know the system well enough to know what is causing the difference.

Via postfix

X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level:
X-Spam-Status: Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,
    HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS autolearn=no
    version=3.3.1
...
Content analysis details:   (4.1 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 HTML_IMAGE_RATIO_08    BODY: HTML has a low ratio of text to image area
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6298]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

Via commandline (cat test.mail | sudo -u spamd /usr/bin/spamc -u > postsa.mail)

X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on kai2.gnukai.com
X-Spam-Flag: YES
X-Spam-Level: **
X-Spam-Status: Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,
    INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM autolearn=no
    version=3.3.1
...
Content analysis details:   (6.6 points, 3.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.1 INVALID_DATE           Invalid Date: header (not RFC 2822)
-0.0 SPF_PASS               SPF: sender matches SPF record
 2.5 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: fellage.me]
 1.5 BAYES_60               BODY: Bayes spam probability is 60 to 80%
                            [score: 0.6299]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS

/etc/mail/spamassassin.cf (I added the last 4 lines in a desperate attempt to see something change, but to no effect)

/etc/mail/spamassassin/local.cf

# These values can be overridden by editing ~/.spamassassin/user_prefs.cf
# (see spamassassin(1) for details)

# These should be safe assumptions and allow for simple visual sifting
# without risking lost emails.

required_hits 5.0
report_safe 1
rewrite_header Subject [***SPAM***]
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
trusted_networks 69.160.84.222
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
pyzor_options --homedir /etc/mail/spamassassin
auto_learn 0
use_razor2
use_dcc
use_pyzor
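
The two X-Spam-Status headers above can also be compared mechanically rather than by eye. The following is only a sketch: the helper names are made up, and it assumes the folded header has already been joined into one line with the continuation whitespace removed. It lists the rules that fired on one path but not the other, plus the score difference.

```python
import re

# The two header values from this message, unfolded into single lines.
VIA_POSTFIX = (
    "Yes, score=4.1 required=3.0 tests=BAYES_60,HTML_IMAGE_RATIO_08,"
    "HTML_MESSAGE,INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS "
    "autolearn=no version=3.3.1"
)
VIA_COMMANDLINE = (
    "Yes, score=6.6 required=3.0 tests=BAYES_60,HTML_MESSAGE,"
    "INVALID_DATE,MIME_HTML_ONLY,RDNS_NONE,SPF_PASS,URIBL_DBL_SPAM "
    "autolearn=no version=3.3.1"
)


def parse_status(value):
    """Return (score, set of rule names) from an X-Spam-Status value."""
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", value).group(1))
    tests = re.search(r"tests=(\S*)", value).group(1)
    return score, {name for name in tests.split(",") if name}


def compare(first, second):
    """Report which rules differ between two scans of the same message."""
    score_a, tests_a = parse_status(first)
    score_b, tests_b = parse_status(second)
    return {
        "only_first": sorted(tests_a - tests_b),
        "only_second": sorted(tests_b - tests_a),
        "score_delta": round(score_b - score_a, 1),
    }


result = compare(VIA_POSTFIX, VIA_COMMANDLINE)
print(result)
```

For this message the only differences are HTML_IMAGE_RATIO_08 (0.0 points, postfix only) and URIBL_DBL_SPAM (2.5 points, commandline only), and 4.1 + 2.5 = 6.6. The Bayes scores are nearly identical (0.6298 vs 0.6299), so in this example the gap comes from the DBL blocklist lookup and not from Bayes.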