config: failed to parse line
Occasionally I see the following log lines. They don't seem to be fatal, but I'd like to know what they are so I can decide whether I need to fix something:

Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-train
Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-days 7

I can't find these config variables set anywhere in /etc/spamassassin/*. This line also comes along at the same time:

Sep 21 07:24:07 spamd2 spamd[7749]: config: SpamAssassin failed to parse line, no value provided for "use_bayes", skipping: use_bayes

That's an odd one, because my bayes is working (autolearning and classifying fine) and my 'use_bayes' line has a '1' after it:

local.cf:use_bayes 1
local.cf:bayes_auto_learn 1
local.cf:bayes_ignore_header Message-Id
local.cf:bayes_ignore_header Delivered-To
local.cf:bayes_ignore_header User-Agent
local.cf:bayes_ignore_header In-Reply-To
local.cf:bayes_ignore_header ReSent-Date
local.cf:bayes_ignore_header ReSent-From
local.cf:bayes_ignore_header ReSent-Message-ID
local.cf:bayes_ignore_header ReSent-Subject
local.cf:bayes_ignore_header ReSent-To
local.cf:bayes_ignore_header Resent-Date
local.cf:bayes_ignore_header Resent-From
local.cf:bayes_ignore_header Resent-Message-ID
local.cf:bayes_ignore_header Resent-Subject
local.cf:bayes_ignore_header Resent-To
local.cf:bayes_ignore_header X-Bogosity
local.cf:bayes_ignore_header X-CRM114
local.cf:bayes_ignore_header X-Enigmail-Version
local.cf:bayes_ignore_header X-Mailer
local.cf:bayes_ignore_header X-MailScanner
local.cf:bayes_ignore_header X-MailScanner-Information
local.cf:bayes_ignore_header X-MailScanner-SpamCheck
local.cf:bayes_ignore_header X-Mozilla-Status
local.cf:bayes_ignore_header X-Mozilla-Status2
local.cf:bayes_ignore_header X-no-archive
local.cf:bayes_ignore_header X-Original-To
local.cf:bayes_ignore_header X-PerlMX-Spam
local.cf:bayes_ignore_header X-Received-From-IP
local.cf:bayes_ignore_header X-Sanitizer
local.cf:bayes_ignore_header X-SA-Exim
local.cf:bayes_ignore_header X-Scanned-By
local.cf:bayes_ignore_header X-Sender
local.cf:bayes_ignore_header X-Sequence
local.cf:bayes_ignore_header X-Spam-Flags
local.cf:bayes_ignore_header X-Spam-Level
local.cf:bayes_ignore_header X-Spam-Score
local.cf:bayes_ignore_header X-Spam-Status
local.cf:bayes_ignore_header X-s.logic-spamassas-bar
local.cf:bayes_ignore_header X-s.logic-spamassas
local.cf:bayes_ignore_header X-Virus-Scanned
local.cf:bayes_ignore_header X-Virus-Status
local.cf:bayes_ignore_header X-Warning
local.cf:bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
local.cf:bayes_sql_dsn DBI:mysql:bayes:dbw-pn
local.cf:bayes_sql_username spamass
local.cf:bayes_sql_password assmanspam
local.cf:bayes_sql_override_username @GLOBAL
local.cf:bayes_expiry_max_db_size 100
local.cf:bayes_learn_to_journal 0

Thanks,
micah
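You can pull the skipped directives out of the mail log and tally them to see which ones recur. This is a minimal Python sketch that assumes the two message formats quoted above are the only ones of interest. `skipped_directives` is a hypothetical helper. It shows which lines are being dropped, not where they come from.

```python
import re
from collections import Counter

# The two warning shapes quoted above; any other spamd output is ignored.
SKIP_RE = re.compile(r'config: failed to parse line, skipping, in "([^"]*)": (.+)$')
NO_VALUE_RE = re.compile(
    r'config: SpamAssassin failed to parse line, no value provided for "([^"]+)"')


def skipped_directives(log_lines):
    """Yield (source, directive, value) for each config line spamd reports skipping."""
    for line in log_lines:
        line = line.rstrip("\n")
        m = SKIP_RE.search(line)
        if m:
            source, text = m.group(1), m.group(2).strip()
            name, _, value = text.partition(" ")
            yield source, name, value
            continue
        m = NO_VALUE_RE.search(line)
        if m:
            # This message does not say which file the line came from.
            yield None, m.group(1), ""


sample = [
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-train',
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-days 7',
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: SpamAssassin failed to parse line, '
    'no value provided for "use_bayes", skipping: use_bayes',
]
counts = Counter(name for _, name, _ in skipped_directives(sample))
print(counts)
```

To run it against a real log, pass an open file instead of `sample`, for example `skipped_directives(open("/var/log/mail.log"))`. The log path is an assumption and varies with the syslog setup.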
Bayes innodb problems
I was having scalability problems with my bayes DB, so I read up on the mailing list and found that switching to the InnoDB storage engine was recommended because of its row-level locking (versus the table-level locking that comes with MyISAM). Sounds great. So I switched, and everything was fine for several days. Then today the load on the DB server shot up to 11-13 and spam processing slowed to a crawl. I'm now seeing some incredibly long queries in my slow-query log, such as:

# Time: 070926 17:10:53
# User@Host: spamass[spamass] @ [10.0.2.4]
# Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
SELECT count(*)
  FROM bayes_token
 WHERE id = '4'
   AND ('1190846660' - atime) > '345600';

This seems really wrong. Then there are queries such as the following, taking at least 30 seconds:

# Time: 070926 17:13:24
# User@Host: spamass[spamass] @ [10.0.2.4]
# Query_time: 30  Lock_time: 0  Rows_sent: 88  Rows_examined: 88
SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
  FROM bayes_token
 WHERE id = '4'
   AND token IN (' ')

In my spamd logs I'm seeing the following:

Sep 26 17:17:52 spamd2 spamd[5479]: bayes: expire_old_tokens: child processing timeout at /usr/sbin/spamd line 1246.
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: child states: BB
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: server reached --max-children setting, consider raising it

I've got --max-children set to 50, and I'm hitting it because the DB is not responding fast enough. Did I hit some sort of tipping point with the tokens in my database? Do I have too many, or ... what is going on here? I've had to turn off bayes because it's too slow, which is sad because it adds a lot to the results.
This is what I have configured:

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:bayes:dbw-pn
bayes_sql_username spamassassin
bayes_sql_password notthepasswd
bayes_sql_override_username @GLOBAL
# keep the database from getting too big:
bayes_expiry_max_db_size 100
# no effect
bayes_learn_to_journal 0

MySQL settings related to InnoDB:

# * InnoDB
innodb_data_file_path = ibdata1:10M:autoextend
#
# Set buffer pool size to 50-80% of your computer's memory
set-variable = innodb_buffer_pool_size=1250M
set-variable = innodb_additional_mem_pool_size=20M
#
# Set the log file size to about 25% of the buffer pool size
set-variable = innodb_log_file_size=313M
set-variable = innodb_log_buffer_size=8M
#
innodb_flush_log_at_trx_commit=1

I'm using SpamAssassin 3.2.3 and MySQL 5.0.45.

Thanks,
Micah
Re: Bayes innodb problems
* Alex Woick <[EMAIL PROTECTED]> [070927 02:14]:
> Micah Anderson wrote on 27.09.2007 02:20:
>
>> processing has ground down to really slow. I'm seeing some incredibly
>> long queries now in my slow-query log, such as:
>
> Try an "optimize table " for each of the SA tables. You just
> filled the database from scratch, so perhaps the counters/statistics do not
> reflect the actual value distribution yet.

Actually, this bayes DB has been around for a few months and has been built up over time. It does make me wonder what regular maintenance should be done on the bayes DB. Some people seem to let the code auto-expire, while others run cron jobs to expire data. What are the benefits of each? Should I be running an optimize table every so often?

>> # Time: 070926 17:10:53
>> # User@Host: spamass[spamass] @ [10.0.2.4]
>> # Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
>> SELECT count(*)
>>   FROM bayes_token
>>  WHERE id = '4'
>>    AND ('1190846660' - atime) > '345600';
>
> More than 10 minutes for counting 2 million rows is a bit long. You can try to
> look at what MySQL is doing all the time. Execute a "show full processlist"
> from a mysql command line while the above query is running and look at the
> "State" column. If an SA-initiated query is waiting for a lock and actually
> doing nothing, you should see it there. You also see all the other queries
> that are currently running at this point and may be hogging the database
> server.

Since I adjusted the SQL query to use the index, I haven't seen this problem, so I can't check the State column to see what is going on. This DB server doesn't do anything else and hosts no other databases, so nothing else could have been hogging its resources.

> The database design and query design of SpamAssassin is OK, even the
> apparently non-indexable term "('1190846660' - atime) > '345600'", since
> MySQL would not use the index on an optimized term anyway.
> Try an EXPLAIN
> of this statement - MySQL will always use only the first half (4
> bytes) of the index for lookup, which covers only the id part.

That is if I am optimizing...

mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND ('1190846660' - atime) > '345600';
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys            | key              | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 | bayes_token_idx2 | 2       | const | 229946 | Using where; Using index |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+

>> innodb_flush_log_at_trx_commit=1
>
> Use value 0 for more performance and a small sacrifice of safety. See the
> comment in the default *.ini file:

Mine doesn't have a comment... but after reading http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html I want to change this, since I don't care about transaction-level ACID compliance for the bayes database. If I have issues with that DB, I can always restore the backup from the day before.

Micah
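The index problem discussed above comes from doing arithmetic on the column. `(NOW - atime) > MAX_AGE` is equivalent to `atime < NOW - MAX_AGE`, but only the second form leaves `atime` bare, which lets a composite `(id, atime)` index range-scan it. Here 345600 s is 4 days, so the cutoff is 1190846660 - 345600 = 1190501060.

The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL. This is an assumption made for portability: the planner, schema and index names all differ from the real bayes tables. It checks that both predicates count the same rows and prints each query plan. It illustrates the general technique and is not necessarily what the author's patch does.

```python
import sqlite3

NOW = 1190846660   # timestamp SpamAssassin passed in the logged query
MAX_AGE = 345600   # 4 days, in seconds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_token (id INTEGER, token TEXT, atime INTEGER, "
             "PRIMARY KEY (id, token))")
# Hypothetical composite index; the real schema's index names may differ.
conn.execute("CREATE INDEX bayes_token_id_atime ON bayes_token (id, atime)")
rows = [(4, f"t{i}", NOW - i * 3600) for i in range(200)]  # one token per hour of age
conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?)", rows)

# Original form: arithmetic wraps the column, so only id = ? can use the index.
original = "SELECT count(*) FROM bayes_token WHERE id = ? AND (? - atime) > ?"
# Rewritten form: the column stands alone, so (id, atime) can be range-scanned.
rewritten = "SELECT count(*) FROM bayes_token WHERE id = ? AND atime < ?"

old_count = conn.execute(original, (4, NOW, MAX_AGE)).fetchone()[0]
new_count = conn.execute(rewritten, (4, NOW - MAX_AGE)).fetchone()[0]
print(old_count, new_count)  # same rows either way

for sql, params in ((original, (4, NOW, MAX_AGE)), (rewritten, (4, NOW - MAX_AGE))):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    print(plan[0][-1])  # the plan's detail text
```

On MySQL, running EXPLAIN on both forms should show whether `key_len` grows to cover `atime` as well as `id`. The exact output depends on the schema and server version.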
reaching incoming connections queued max, what happens?
I wanted to find out what would happen if spamd was totally overloaded. I set --max-children=1 and --max-conn-per-child=1, then hit spamd with spamc using various -t timeout values. Each connection (all generated simultaneously) took roughly one second longer than the previous one, and once the -t value was reached the message was returned unscanned. I believe this is what should be expected.

According to the spamd man page:

  -m number, --max-children=number
    This option specifies the maximum number of children to spawn.
    Incoming connections can still occur if all of the children are busy,
    however those connections will be queued waiting for a free child.
    Please note that there is a OS specific maximum of connections that
    can be queued (Try "perl -MSocket -e'print SOMAXCONN'" to find this
    maximum).

This made me wonder what would happen if incoming messages hit my SOMAXCONN: would they not be queued up? The SOMAXCONN on my Linux box appears to be 128. To test this, I ran the following on my spamd server and then restarted spamd:

echo "5" > /proc/sys/net/core/somaxconn

I then issued 15 simultaneous connections with the -t value set to 15. As before, each connection took about one second longer than the previous one, and the connections that went over the -t value were returned unscanned.

What puzzles me is that all of the connections behaved exactly as they did when SOMAXCONN was set to 128. Since every connection arrives at the same moment, I would expect spamd to accept the first one and queue 5 more. The 6th and later connections should then not be queued, and something different should happen to them. But it doesn't: it's the same as before... odd.

These are the times for each individual spamc connection:

 1. real 0m1.267s
 2. real 0m2.567s
 3. real 0m3.986s
 4. real 0m5.178s
 5. real 0m6.461s
 6. real 0m7.914s
 7. real 0m9.090s
 8. real 0m10.361s
 9. real 0m11.738s
10. real 0m13.377s
11. real 0m15.033s -- returned unscanned
12. real 0m15.026s -- returned unscanned

Micah
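The serial behaviour described above can be modelled as a single child working through a queue. This is a minimal Python sketch with two assumptions. First, one message is scanned at a time at a fixed cost of about 1.4 s, a rough figure read from the timings above. Second, spamc passes a message through unscanned once its -t timeout expires. `simulate_queue` is hypothetical.

```python
def simulate_queue(n_clients, scan_seconds, timeout):
    """Model n simultaneous spamc clients served one at a time by a single child.

    Client i (1-based) waits for the i-1 scans ahead of it, so its scan would
    finish at i * scan_seconds. If that exceeds the -t timeout, spamc gives up
    at the timeout and the message goes through unscanned.
    Returns a list of (client, seconds, scanned) tuples.
    """
    results = []
    for i in range(1, n_clients + 1):
        finish = i * scan_seconds
        if finish > timeout:
            results.append((i, float(timeout), False))
        else:
            results.append((i, finish, True))
    return results


for client, seconds, scanned in simulate_queue(15, 1.4, 15):
    status = "scanned" if scanned else "returned unscanned"
    print(f"{client:2d}. {seconds:6.2f}s {status}")
```

Under these assumptions the model marks clients 11 onward as unscanned, which matches the timings above. It deliberately leaves out the kernel listen backlog, and the backlog is exactly what the SOMAXCONN experiment was probing.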
Re: SSO's RHSBL
* Giampaolo Tomassoni <[EMAIL PROTECTED]> [071008 08:47]:
> > -Original Message-
> > From: ram [mailto:[EMAIL PROTECTED]
> > Sent: Monday, October 08, 2007 5:30 PM
> >
> > On Mon, 2007-10-08 at 14:40 +0200, Giampaolo Tomassoni wrote:
> > > I'm getting this stuff from named in my log files during message
> > > scanning.
> > >
> > > Oct 8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > > Oct 8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > > Oct 8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > >
> > > Is there any problem with securitysage.com?
> > >
> >
> > the rhsbl has been down for months now
>
> Well, it may be, but I believe it is not more than a week I'm getting these
> log entries.

That's right. These errors only started showing up last week, in the logcheck logs of a system that was still set up to use that RHSBL. Does anyone have a legitimate reference confirming that it has been shut down?

Micah
Re: Disabling URIDNSBL plugin
* Daryl C. W. O'Shea <[EMAIL PROTECTED]> [071019 14:59]:
> Justin Kim wrote:
>> I don't know what is causing my postfix server to defer messages couple of
>> times daily.
>
>> By looking at the logs, I can only tell there is something that keeps one
>> spam checking process running for 5~10 mins.
>
> Likely bayes auto expiry. Disable bayes_auto_expire and do the expiries
> via a cron job instead.

Do you think running a bayes expire from a cron job is necessary if you are running an InnoDB-based bayes DB (with this patch[1])?

Also, if you postpone the bayes expire and run it from cron instead, aren't you just letting the expiration work pile up? That would delay this condition until later (when the cron job runs) and make it last longer (because the expiration hasn't run in a while).

Micah

1. http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5661
Re: spamd hangs at 100% cpu
I too have experienced strange hangs with the spamc/spamd combination on my postfix box running maildrop/mailfilter. At first I was convinced the cause was my bayes DB, because it used MyISAM tables, which are slow, and I handle a lot of mail. So I switched to InnoDB. Then I was convinced the problem was table locking during SA auto-expire periods, so I dug deep into the SA SQL and submitted a bug to improve the query so it can use indexes[1].

Even after all this, I was getting reports from people whose messages to my server bounced because the default maildrop timeout (300 seconds) had been reached. Maildrop treated the timeout as if the user were over quota and bounced the message back to the original sender.

We run spamc with -t 100. We expected this to mean that if spamd hadn't returned the message after 100 seconds, we would simply accept the message without any spam scanning. However, things seemed to last far longer than 100 seconds (3x as long, enough to hit the maildrop timeout), so our theory was that -t wasn't working properly.

These incorrect bounces meant we were not delivering legitimate email, so we turned off SpamAssassin and began digging deeper to find the cause. I have spent hours devising and running tests, and so far I cannot replicate the problem in a test environment. If you'd like to see my tests and have suggestions for other tests that could help find the cause, I am *very* interested. Please see my test page:

https://we.riseup.net/riseup+mail/spam-timeout-tests

Micah

1. http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5661

* Martin.Hepworth <[EMAIL PROTECTED]> [071019 02:03]:
> Peter
>
> Get the latest ruleset for SA using sa-update, this works around an issue
> with whois lookups.
>
> Only run a few RBL's - you're running them all and this will take some time.
>
> Running a local caching nameserver on the box will help as well.
>
>
> --
> Martin Hepworth
> Snr Systems Administrator
> Solid State Logic
> Tel: +44 (0)1865 842300
>
> > -Original Message-
> > From: Peter Fastré [mailto:[EMAIL PROTECTED]
> > Sent: 19 October 2007 09:58
> > To: users@spamassassin.apache.org
> > Subject: spamd hangs at 100% cpu
> >
> > Hello
> >
> > I have a severe problem with one of my mailservers. I'm using spamassassin
> > 3.2.3 in combination with exim 4.66, and experience hanging spamd
> > processes which consume all my server resources.
> > I've searched these mailing lists, searched google, searched
> > documentation, ... I found very much old posts of people experiencing the
> > same problems, so I think it's a very common one. I tried different
> > solutions: tracing the process (process doesn't do anything when it hangs
> > - no trace output), clearing the bayes database (doesn't help), ...
> > The problem is really urgent, because exim receives timeouts from spamd,
> > and rejects the mails.
> > I reduced the number of mails each spamd processes, to reduce the risk of
> > hanging. Usually it hangs after having processed 2 or 3 mails.
> > Now I've even reduced it to 1, the hangups are less often, but still
> > there! I hope someone has a solution, or a clue to what I can do!
> > I checked the log files and debug output, which is very consistent. The
> > last thing all hanging processes do, is this:
> > Oct 19 09:42:09 mail01 spamd[6072]: rules: ran uri rule __DOS_HAS_ANY_URI
> > ==> got hit: "k"
> > After this line in the log, the process hangs.
> >
> > For your reference: the full log file is here:
> > http://peter.lunatis.be/temp/spamd.txt
> >
> > Regards
> >
> > Peter
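The concern above is that spamc's -t did not bound the total delay. One option is to enforce an independent wall-clock limit in whatever calls spamc, kept below maildrop's 300-second timeout. This is a minimal Python sketch built on two assumptions: spamc is invoked without -c or -E, so exit status 0 means the processed message came back on stdout, and the 120-second deadline is only an example value. `filter_with_deadline` is a hypothetical wrapper, not part of SpamAssassin.

```python
import subprocess


def filter_with_deadline(message, cmd=("spamc", "-t", "100"), deadline=120):
    """Pipe `message` (bytes) through `cmd`, failing open.

    `deadline` is an outer wall-clock limit in seconds, enforced here regardless
    of spamc's own -t handling; subprocess.run() kills the child if it expires.
    On timeout, a missing binary, a non-zero exit or empty output, the original
    message is returned unchanged (i.e. delivered unscanned).
    """
    try:
        result = subprocess.run(list(cmd), input=message, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL, timeout=deadline)
    except (subprocess.TimeoutExpired, OSError):
        return message
    if result.returncode != 0 or not result.stdout:
        return message
    return result.stdout
```

Failing open (returning the message unchanged) matches the intent described above: accept mail unscanned rather than bounce it.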
Re: Disabling URIDNSBL plugin
* mouss <[EMAIL PROTECTED]> [071020 09:38]:
> Micah Anderson wrote:
> > Do you think running a bayes expire via cronjob is necessary if you are
> > running a INNOdb based bayes DB (with this patch[1])?
> >
> > Also, if you postpone the bayes expire to instead run it via cron aren't
> > you just making the expiration stack up and instead are delaying this
> > condition until later (when the cron job runs) and for longer (because
> > the expiration hasn't been run in a while)?
>
> doing it once a day at 3 AM is not like doing it when delivering mail.

Unless you deliver mail 24 hours a day for people all over the world. Then 3 AM in one place is noon in another.
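The time-zone point is simple arithmetic: the same instant falls at different local hours depending on the UTC offset. This small Python sketch uses fixed offsets, with the server at UTC-7 as an assumed example. Real zones also shift with daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def hour_elsewhere(hour, from_utc_offset, to_utc_offset):
    """Local hour at to_utc_offset when the clock reads `hour`:00 at from_utc_offset."""
    src = timezone(timedelta(hours=from_utc_offset))
    dst = timezone(timedelta(hours=to_utc_offset))
    # The date is arbitrary; fixed offsets have no DST transitions.
    moment = datetime(2007, 10, 20, hour, 0, tzinfo=src)
    return moment.astimezone(dst).hour


# A 3 AM cron job on a server at UTC-7, as seen by users at a few other offsets.
for offset in (-7, 0, 2, 9):
    print(f"UTC{offset:+d}: {hour_elsewhere(3, -7, offset):02d}:00")
```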
Re: posting thru gmane to this list and not getting bombarded
* [EMAIL PROTECTED] <[EMAIL PROTECTED]> [071119 10:01]:
> N> PS: I post to this list using gmane. Is it possible to stop delivery
> N> on my email address so that I can post but I do not receive the list
> N> messages?
>
> http://www.google.com/[EMAIL PROTECTED]

Can this information be added to http://wiki.apache.org/spamassassin/MailingLists ?

Micah
Re: posting thru gmane to this list and not getting bombarded
* Justin Mason <[EMAIL PROTECTED]> [071119 14:13]:
>
> Micah Anderson writes:
> > * [EMAIL PROTECTED] <[EMAIL PROTECTED]> [071119 10:01]:
> > > N> PS: I post to this list using gmane. Is it possible to stop delivery
> > > N> on my email address so that I can post but I do not receive the list
> > > N> messages?
> > >
> > > http://www.google.com/[EMAIL PROTECTED]
> >
> > Can this information be added to
> > http://wiki.apache.org/spamassassin/MailingLists ?
>
> go for it! it's a wiki ;)

I'd like to, but I haven't been able to get '[EMAIL PROTECTED]' to work, so it seems wrong to add it while it isn't working.

micah
Re: posting thru gmane to this list and not getting bombarded
* François Rousseau <[EMAIL PROTECTED]> [071121 10:21]:
> Maybe I'm weird but... what is the point to posting to a mailing list
> if you don't read it?

You *do* read it; you just read it via GMANE instead of a mail reader. Some of us don't like having our inboxes bombarded by mailing lists. Others prefer not to filter lists into separate mailboxes and would rather keep list reading in a more comfortable medium that still lets them reply occasionally.

> > The second search result is the relevant one:
> >
> > You can do the equivalent (to turn off delivery) by un-subscribing
> > from the user's list and subscribing to
> > [EMAIL PROTECTED] .

You need to send an email to [EMAIL PROTECTED] to get on this list.

micah
Low scores
I feel like a lot of pretty obvious spam is getting through my system with appallingly low scores, and I'm starting to wonder if something is wrong with my setup. Looking at which spam tests did fire, I'm often surprised that more rules didn't fire: obvious lotto scams and Nigerian inheritance scams seem to slip right by. The scores are also surprisingly low. I'd expect satisfyingly high scores for some of these, but I'm not seeing them.

I'm looking for people to look over these spams and suggest possible areas for improvement (score adjustments, configuration tweaks, plugins I should try, etc.). The spams can be pulled from here: http://micah.riseup.net/spams

Thanks for any ideas,
micah
Re: Low scores
On Sat, 23 Feb 2008 18:52:01 -0800, Loren Wilton wrote:
>> I'm looking for people to have a look over these spams and give me some
>> ideas of some possible areas for improvement (either score adjustments,
>> configuration tweaks, plugins that I should try, etc.).
>>
>> The spams can be pulled from here: http://micah.riseup.net/spams
>
> It appears to me you have just posted the body text for these spams.
> Much of the spam catching is done off of the header information, so
> knowing that would help.

Check again: I posted the entire raw maildir message, which includes the headers.

> Also, knowing which tests did and didn't hit on your system would give
> us an idea what you might be missing.

You can see which tests hit in the headers of these emails.

> That said, do you use the SARE rules? There are a number of rules there
> that help catch 419's.

Yes, I am using the openprotect channel.

micah
Re: Low scores
On Sun, 24 Feb 2008 02:15:24 +0100, Matthias Leisi wrote:
> Micah Anderson wrote:
>
> | [surprisingly low scores]
> | The spams can be pulled from here: http://micah.riseup.net/spams
>
> Most (all?) of the samples are forwarded through some debian.org
> mechanism. In order for blacklists to take full effect, you should
> configure your trust path (trusted_networks etc) accordingly.

My trusted_networks is set to:

trusted_networks 202.12.162.
trusted_networks 10.0.
trusted_networks 10.8.0.

The first trusts everything in that IP space, which we control. The second and third are private networks. Am I specifying those incorrectly, perhaps?

I'm also short-circuiting on trusted-relay chained messages, using the following:

meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

But I log the short-circuit status of every message in its headers, with the following, and you won't see any short-circuiting in the examples I posted:

status add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ shortcircuit=_SCTYPE_ autolearn=_AUTOLEARN_ version=_VERSION_"

Do I have something misconfigured in my trust path? I do have a forward from a debian.org email address that occasionally sends me legitimate email (although a lot of spam does seem to get through there), but I don't believe I have that domain in a whitelist anywhere.

thanks
micah
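Whether the entries above mean what is intended depends on how the partial-address notation is read. This minimal Python sketch assumes SpamAssassin treats a trailing-dot address as a prefix of whole octets, so `10.0.` means 10.0.0.0/16. The trusted_networks section of the Mail::SpamAssassin::Conf documentation is the place to confirm this. The helper names are hypothetical.

```python
import ipaddress


def sa_partial_to_network(entry):
    """Turn a trusted_networks-style entry into an ip_network.

    Assumption: a trailing dot marks a prefix of whole octets, so '10.0.'
    becomes 10.0.0.0/16. Other entries are passed to ipaddress unchanged.
    """
    if not entry.endswith("."):
        return ipaddress.ip_network(entry, strict=False)
    octets = entry.rstrip(".").split(".")
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + "/" + str(8 * len(octets)))


def is_trusted(ip, networks):
    """True if `ip` falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)


trusted = [sa_partial_to_network(e) for e in ("202.12.162.", "10.0.", "10.8.0.")]
for net in trusted:
    print(net)  # prints 202.12.162.0/24, then 10.0.0.0/16, then 10.8.0.0/24
```

Under that reading, `10.0.` and `10.8.0.` are separate ranges: 10.0.0.0/16 does not cover 10.8.0.x, which is why both lines are needed.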
Re: Low scores
* Michael Scheidell <[EMAIL PROTECTED]> [080223 13:46]:
> > I feel like a lot of pretty obvious spams are getting through my system
> > with appallingly low scores. I'm starting to wonder if something may be
> > wrong with my setup. Looking at what spam tests did fire, I'm frequently
> > surprised that more rules didn't fire (obvious lotto scams and nigerian
> > inheritance scams seem to slip right by) and that the score are
> > surprisingly low... I'd expect satisfyingly high scores for some of
> > these, but I'm not seeing them.
>
> You using any SARES' rules? If you have the cpu cycles, try that. Also make
> sure you have latest SpamAssassin and are also running sa-update. If you
> use sa-compile, make sure you run it every time you update rules.

I'm running version 3.2.3-0.volatile1 on Debian etch (it supposedly has a number of backported fixes from 3.2.4). I run sa-update every night on two channels: saupdates.openprotect.com (which contains the recommended SARE rules) and updates.spamassassin.org. If there is an update, I run sa-compile and then restart SpamAssassin.

Micah
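The nightly routine described above (update, then recompile and restart only when something changed) could be scripted roughly as follows. This minimal Python sketch assumes sa-update's documented exit codes: 0 means updates were installed, 1 means none were available, and anything else is treated as a failure here. It also assumes the Debian init script path. The channels and GPG key come from the sa-update command quoted elsewhere in this document. `nightly_update` is hypothetical.

```python
import subprocess

# Channels and key as in the sa-update command quoted in this document.
SA_UPDATE = ["sa-update", "--gpgkey", "D1C035168C1EBC08464946DA258CDB3ABDE9DC10",
             "--channel", "saupdates.openprotect.com",
             "--channel", "updates.spamassassin.org"]
SA_COMPILE = ["sa-compile"]
RESTART = ["/etc/init.d/spamassassin", "restart"]  # Debian init script (assumed path)


def nightly_update(run=subprocess.call):
    """Run sa-update; recompile and restart only if new rules were installed."""
    rc = run(SA_UPDATE)
    if rc == 1:
        return "no updates"
    if rc != 0:
        return f"sa-update failed with exit code {rc}"
    if run(SA_COMPILE) != 0:
        return "sa-compile failed; not restarting"
    run(RESTART)
    return "updated, compiled, restarted"
```

From cron, call `nightly_update()` and log the returned string. Skipping the restart when sa-compile fails keeps the running daemon on its previous rules instead of restarting it into a half-built set.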
Segfaulting when using compiled rules
My spamd segfaults when I start it up. I tried to strace the process. All I could see was that it opened this file, did some memory mappings and then segfaulted:

open("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so", O_RDONLY) = 8

Since this is a compiled rule, I removed everything under /var/lib/spamassassin/compiled and re-ran sa-compile (after doing an sa-update). That succeeded, but as soon as I started SpamAssassin it still segfaulted. So I have turned off rule compilation for now and it starts fine, but I'm wondering what I can do to fix this.

I'm running 3.2.3 from volatile, with these channels:

sa-update --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 --channel saupdates.openprotect.com --channel updates.spamassassin.org

Thanks for any ideas,
Micah
Re: two databases
Michael Grant writes:

> I did not realize one could store the bayes scores in sql.
>
> So I'd store the bayes scores on a third server and let both mxes use
> the same database.

I did this: I put my bayes in MySQL and pointed two different spamd machines at it, but I had severe problems that I could not resolve. I posted to the list[0] about them.

The basic problem was that as soon as I fired up the second server, it immediately started blocking on the bayes work. Average scan times went from 1-2 seconds up to 35+. The max children got used up by processes blocked on the bayes work, to the point where it was pointless. Disabling the bayes_sql stuff on one of the machines dropped the scan times back to their expected average of 1-2 seconds (but then, of course, none of the BAYES tests fire and autolearning fails).

My MySQL server is its own machine. It was local to the first spamd (local LAN) and remote to the second (over the net). I eliminated any hostname lookup problems. I obviously couldn't eliminate network latency, but that shouldn't have caused such a severe result. I'm running with InnoDB tables, so I should be getting row-level rather than table-level locking... In any case, my MySQL database may have needed optimizing, but I was not able to work out how. Now I just run one of the spamds without bayes. That's not too bad, because my bayes database seems to be totally worthless at the moment. :P

micah

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673
Bayes learning trusted networks mailing list email
I get a significant amount of spam through mailing lists that I am legitimately subscribed to. Some of it is list administration email asking me whether I want to approve the "email"; the rest is messages that make it through the list. These messages either hit ALL_TRUSTED, because they come from mailing lists on my networks, or are tagged with a clear untrusted-relays list.

In other words, I've set up trusted_networks so that SA knows which networks I trust to send legitimate email (they are not spam originators). Spam still gets through, of course, but it comes from hops before these networks. If I understand things properly, because these networks are in my trusted_networks, those earlier hops will be checked in RBLs, which makes the spam easier to detect. For example, the Debian servers do send some spam to me, but the Received: headers in those emails are correct. So if the server's address is in trusted_networks, SA will look up the address Debian got the email from in RBLs.

What I am unsure of is whether I am poisoning my bayes by reporting these messages as spam when they make it through. Should I just delete them instead? The legitimate tokens that will end up as collateral damage are the list footers, the list administration messages, and possibly other pieces.

I'm hoping to figure out why my bayes database is so bad (it thinks everything is BAYES_00 now). If this is the reason, I will want to change my training behavior.

thanks,
micah
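The trust-path behaviour described above can be sketched as a walk over the Received hops. Relays inside trusted_networks are skipped, and the first address outside them is the one whose reputation matters for RBL lookups. This minimal Python sketch illustrates that idea and is not SpamAssassin's actual implementation. The addresses are documentation examples, and 192.0.2.0/24 is a hypothetical stand-in for a mailing-list server's network.

```python
import ipaddress


def first_untrusted_relay(relay_ips, trusted_cidrs):
    """Return the first relay IP (newest hop first) outside every trusted network.

    relay_ips lists relay addresses in the order their Received headers appear,
    newest first. Returns None when every hop is trusted (roughly the
    ALL_TRUSTED situation).
    """
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    for ip in relay_ips:
        address = ipaddress.ip_address(ip)
        if not any(address in net for net in networks):
            return ip
    return None


# Hypothetical path: our MX (10.0.1.13), a list server (192.0.2.10), then the spam source.
hops = ["10.0.1.13", "192.0.2.10", "198.51.100.7"]
print(first_untrusted_relay(hops, ["10.0.0.0/16", "192.0.2.0/24"]))  # list server trusted
print(first_untrusted_relay(hops, ["10.0.0.0/16"]))                  # list server not trusted
```

If the list server's network is left out of the trusted list, the walk stops at the list server itself, which is unlikely to be listed in an RBL.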
FreeMail.bl installation instructions
The FreeMail.pm installation instructions are a little thin:

### Install:
#
# Please add loadplugin to init.pre (so it's loaded before cf files!):
#
# loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm

My understanding, and please correct me if I am wrong, is that you actually need to do this:

# 1. Install FreeMail.pm in /etc/spamassassin
#
# 2. Add the following loadplugin to init.pre:
#
#    loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm
#
# 3. Download http://sa.hege.li/FreeMail.cf to /etc/spamassassin
#
# 4. Download http://sa.hege.li/freemail_domains.cf to /etc/spamassassin

I knew about FreeMail.cf because I've used SA plugins before, but I had no idea about the domain list. It might be good to make these instructions a little more explicit so that others benefit too.

Micah
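You can check the corrected steps after installation. This minimal Python sketch assumes everything lives in /etc/spamassassin as described above. `freemail_install_problems` is a hypothetical helper.

```python
from pathlib import Path

REQUIRED_FILES = ("FreeMail.pm", "FreeMail.cf", "freemail_domains.cf")
LOADPLUGIN = "loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm"


def freemail_install_problems(conf_dir="/etc/spamassassin"):
    """Return human-readable problems; an empty list means the steps look done."""
    conf = Path(conf_dir)
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (conf / name).is_file()]
    init_pre = conf / "init.pre"
    lines = init_pre.read_text().splitlines() if init_pre.is_file() else []
    # Compare whitespace-split tokens, so spacing differences don't matter
    # and a commented-out "# loadplugin ..." line does not count.
    if not any(line.split() == LOADPLUGIN.split() for line in lines):
        problems.append("loadplugin line not found in init.pre")
    return problems
```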
Re: two databases
* Michael Grant [2009-06-05 10:26-0400]:
> On Fri, Jun 5, 2009 at 16:08, Micah Anderson wrote:
> > Michael Grant writes:
> >
> >> I did not realize one could store the bayes scores in sql.
> >>
> >> So I'd store the bayes scores on a third server and let both mxes use
> >> the same database.
> >
> > I did this, but my bayes in mysql and pointed two different spamd
> > machines at it, but I had severe problems that I could not resolve. I
> > posted to the list[0] about the problems.
> >
> > The basic problem was that as soon as I fired up the second server it
> > immediately starts blocking on the bayes work. Average scantimes go from
> > 1-2 seconds up to 35+ and the max children get eaten up by blocking on
> > the bayes work to the point where its pointless because too many
> > processes are blocked. Disabling the bayes_sql stuff on one of the
> > machines dropped the scantimes back to their expected average of 1-2
> > seconds (but of course none of the BAYES tests will fire and
> > autolearning fails).
> >
> > My mysql server is its own machine, it was local to the first spamd
> > (local LAN) and remote to the second (over the net). I eliminated any
> > hostname lookup problems, obviously couldn't eliminate network latency,
> > but that shouldn't have caused such a severe result. I'm running with
> > InnoDB tables, so I shouldn't have any row-level locking issues... in
> > any case I might have had some issues because my MySQL database needed
> > to be optimized, but I was not able to determine how and now I just run
> > one of the spamd's without bayes, which is not too bad because my bayes
> > database seems to be totally worthless at the moment. :P
> >
> > micah
> >
> > 0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673
> >
> >
>
> Wow. I did not get around to setting this up yet. But on the MySQL
> front, did you try enabling the query cache by adding this to the
> mysql command line?
>
> --maximum-query_cache_size=1M

I presume this setting is the same as the following in my.cnf:

query_cache_limit = 1048576

I don't recall everything I tried, but it seems worth trying again, this time with a fresh approach.

> Also, a tool I used a lot to help debug this sort of issue was mytop.

I've never had much luck with mytop, but I have found that tuning-primer.sh works well: http://www.day32.com/MySQL/

micah
Compiling with tcc, cannot start: segfaults
I chased this around for a while, and once I finally found the cause, I figured I should post something so that future searchers will find it.

I have been happily running 3.2.3-0.volatile1 (Debian) for months. Today I woke up to a lot of spam in my INBOX and SpamAssassin down. It seems to have died during the cron sa-update run. I tried to start it up again, but spamd segfaults when I do:

Starting SpamAssassin Mail Filter Daemon: /etc/init.d/spamassassin: line 38: 11186 Segmentation fault start-stop-daemon --start --pidfile $PIDFILE --exec $XNAME $NICE --oknodo --startas $DAEMON -- $OPTIONS $DOPTIONS

Those options come from the Debian initscript. If I unpack them and run it manually:

# /usr/sbin/spamd OPTIONS="-i -u nobody -A 10.0.1.13,10.0.1.15,10.0.1.17,10.0.1.31,10.0.1.33,10.0.1.44 -q -x --max-children 50 --helper-home-dir /etc/spamassassin"
Segmentation fault

Even without all the options:

# /usr/sbin/spamd
Segmentation fault

In fact, sa-compile also segfaults. I purged the 3.002003 rules (and their compiled versions), re-ran sa-update and then sa-compile, and started SpamAssassin again. It still segfaults.

If I strace the process, the end is as follows:

stat64("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pmc", 0xbfa315ac) = -1 ENOENT (No such file or directory)
stat64("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pm", {st_mode=S_IFREG|0444, st_size=58745, ...}) = 0
open("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pm", O_RDONLY|O_LARGEFILE) = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfa312c8) = -1 ENOTTY (Inappropriate ioctl for device)
_llseek(7, 0, [0], SEEK_CUR) = 0
read(7, "\npackage Mail::SpamAssassin::Com"..., 4096) = 4096
read(7, "razine\\b/i#,\n q#__DRUGS_DIET5# "..., 4096) = 4096
read(7, "SPUR-M\\b/i#,\n q#FB_SSEX# => q#/"..., 4096) = 4096
read(7, "#,\n q#__FRAUD_WNY# => q#/\\b(?:d"..., 4096) = 4096
read(7, "SOR# => q#/not a registered inve"..., 4096) = 4096
read(7, "a stud/i#,\n q#SARE_BETTERORG# ="..., 4096) = 4096
read(7, "|05 E(?:ast|\\.)? 85th St|10 S\\. "..., 4096) = 4096
read(7, " Blvd Suite 200|491 North Federa"..., 4096) = 4096
read(7, "RE_EN_N_800_5_1# => q#/800\\W+5(?"..., 4096) = 4096
read(7, " a|an? honest|you being a|to any"..., 4096) = 4096
read(7, " matter|mutual understanding|rel"..., 4096) = 4096
read(7, "U_PART_CIA# => q#/(?![\\s\"\'-][0-9"..., 4096) = 4096
read(7, " F X|A B S Y|H L U N|F C Y I|A M"..., 4096) = 4096
read(7, "> q#/\\bbuy\\b.{1,30}\\br(?:[EMAIL PROTECTED]|a"..., 4096) = 4096
read(7, "{0,40}account .{0,40}record/i#,\n"..., 4096) = 1401
brk(0x9c48000) = 0x9c48000
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0", {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so", {st_mode=S_IFREG|0555, st_size=1015528, ...}) = 0
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.bs", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\\\0"..., 512) = 512
fstat64(8, {st_mode=S_IFREG|0555, st_size=1015528, ...}) = 0
mmap2(NULL, 1018080, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0xb77a8000
mmap2(0xb789, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xe7) = 0xb789
mprotect(0xbfa31000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN) = 0
close(8) = 0
mprotect(0xb77a8000, 950272, PROT_READ|PROT_WRITE) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 16329 detached

So what was the cause?
It turned out I was trying to be smart and save disk space by installing the 'tcc' compiler on all of our spam processing servers. 'tcc' is known as 'the tiny C compiler'; it's small, fast and ANSI C compliant. It's also somewhat experimental: when I replaced it with gcc, blew away my compiled rules and re-ran sa-compile, spamd was able to start up again fine.

Micah
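For anyone hitting the same segfault, one quick check is which compiler a build would actually pick up. sa-compile builds its compiled rules as Perl XS modules (the body_0.so in the strace above), and those are compiled with the C compiler recorded in Perl's `$Config{cc}`, often plain `cc`. If a generic `cc` resolves to tcc (for example through a symlink; the post doesn't say how tcc got picked up), the compiled rules get built with it. The Python sketch below is a hypothetical helper, not part of SpamAssassin: it asks Perl for `$Config{cc}`, follows PATH and symlinks to the real binary, and flags it if the file name looks like tcc. The name check is only a heuristic.

```python
import os
import shutil
import subprocess


def perl_cc(default="cc"):
    """Ask Perl which C compiler it was configured with ($Config{cc}).

    Falls back to `default` if perl is missing or the query fails.
    """
    try:
        out = subprocess.run(
            ["perl", "-MConfig", "-e", "print $Config{cc}"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default
    return out.stdout.strip() or default


def resolve_compiler(name):
    """Look the compiler up on PATH and follow symlinks to the real binary."""
    path = shutil.which(name)
    return os.path.realpath(path) if path else None


def looks_like_tcc(resolved):
    """Heuristic: treat a binary whose file name contains 'tcc' as tcc."""
    return resolved is not None and "tcc" in os.path.basename(resolved)


if __name__ == "__main__":
    cc = perl_cc()
    real = resolve_compiler(cc)
    print(f"Perl $Config{{cc}} = {cc!r} -> {real}")
    if looks_like_tcc(real):
        print("warning: compiled rules would be built with tcc")
```

If the warning fires, the fix described above (install gcc, remove the compiled rules, re-run sa-compile) is the one that worked here.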
Problems with 3.2.5
I just upgraded to 3.2.5 and have encountered some regressions. First, I'm getting tons of the following in my logs, literally metric tons:

  Sep 11 17:11:28 spamd2 spamd[27357]: Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/Plugin/Check.pm line 1028, line 315.

In order to get it to stop, I had to disable the shortcircuit plugin in v320.pre; this line filled a partition in a couple of minutes flat. I particularly value the savings I get from this plugin, so I would like to know how I can re-enable it! This problem is also present in 3.2.4, but not in 3.2.3, if that helps.

Additionally, I am getting the following:

  Sep 11 20:25:41 spamd2 spamd[26599]: DNS query timeout for gamma._domainkey.gmail.com
  Sep 11 20:16:02 spamd2 spamd[21923]: Compilation failed in require at /usr/lib/perl5/Net/DNS/RR/TXT.pm line 11, line 78.
  Sep 11 20:16:02 spamd2 spamd[21923]: BEGIN failed--compilation aborted at /usr/lib/perl5/Net/DNS/RR/TXT.pm line 11, line 78.

These are obviously related to DomainKeys/DKIM, but the Perl errors are ugly.

Thanks for everyone's work on SA, it's really appreciated,
Micah
Phishing rules?
I keep getting hit by phishing attacks, and they aren't being stopped by anything I've thrown up in front of them:

postfix is doing:
  reject_rbl_client b.barracudacentral.org,
  reject_rbl_client zen.spamhaus.org,
  reject_rbl_client list.dsbl.org,

I've got clamav pulling signatures updated once a day from sanesecurity (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx, securesiteinfo) and Malware Black List, MSRBL (images, spam). I've got spamassassin 3.2.5 with the URIBL plugin loaded (which I understand pulls in 25_uribl.cf automatically, right? Or do I need to configure that? If it's automatic, that pulls in SURBL phishing). I've got Botnet set up, PDFinfo and postcards, I'm using DCC and a Bayes DB, I've got the Hashcash and SPF plugins loaded, ImageInfo, pretty much everything I can think of, but for some reason phishing attempts keep getting through.

Sadly, I do not have an example I can share at the moment, as I typically delete them in a rage after training my Bayes filter on them. However, I am looking for any suggestions of other things I can turn on... in particular, are there rules that people have created that look for certain keywords where the body is asking for your account/password information?

Thanks for any ideas,
micah
Re: Phishing rules?
* Kelson <[EMAIL PROTECTED]> [2008-10-30 17:29-0400]:
> Micah Anderson wrote:
>> reject_rbl_client list.dsbl.org,
>
> DSBL has shut down, and you should remove the query from your list. It
> won't help with the phishing, but it'll free up some network resources.
> Info: http://dsbl.org/node/3

Thanks, I wasn't aware of that. I'm only using zen.spamhaus now, which is a shame. I had to remove barracuda because I've already received 3 complaints about false positives. That's a real shame, because it was blocking about 3x as much as zen was.

>> I've got clamav pulling signatures updated once a day from sanesecurity
>> (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
>> securesiteinfo) and Malware Black List, MSRBL (images, spam).
>
> Odd, ClamAV + SaneSecurity does a really good job here at blocking phish
> before they even get to SpamAssassin. We call clamd through MIMEDefang,
> then call SpamAssassin (also through MimeDefang) if a message passes.
>
> Have you verified that Clam is using the SaneSecurity signatures? How
> are you calling ClamAV?

Oh, I'm certainly blocking phishing attempts via the SaneSecurity signatures, probably 200+ in the last hour alone. However, the phishing emails that are getting through are not known to their signature database, and in some cases have been directly targeted at the domain I am managing. That's why I am interested in rules that look for typical phishing emails. These emails are usually quite similar in their construction, so it seems like a good case for rules.

micah
Re: Phishing rules?
* Jeff Chan <[EMAIL PROTECTED]> [2008-10-31 02:36-0400]:
> On Thursday, October 30, 2008, 12:56:53 PM, Micah Anderson wrote:
>
> > I keep getting hit by phishing attacks, and they aren't being stopped by
> > anything I've thrown up in front of them:
> > [...]
> > I've got spamassassin 3.2.5 with URIBL plugin loaded (which I understand
> > pulls in the 25_uribl.cf automatically, right? Or do I need to configure
> > that? if its automatic, that pulls in SURBL phishing).
>
> Increase the score on:
>
> URIBL_PH_SURBL
>
> The current SpamAssassin rules scoring process gives it an
> artificially low score which is counterproductive IMO. If you
> want to stop more phishing spams, consider increasing the score.

Thanks, I will do so... however, the phishing emails I am getting are of two types:

- generalized phishes, which I would expect SURBL to be able to detect a large percentage of
- phishing targeted at my domain, where the phisher attempts to impersonate the 'admins' and asks for usernames/passwords. I don't think these will get hits on SURBL, because they are specific to my domain, and these are actually the more damaging ones, because users are more likely to be fooled by something that claims to come from 'us'.

Micah
Re: Phishing rules?
Randy <[EMAIL PROTECTED]> writes:
> Micah Anderson wrote:
>> Sadly, I do not have an example I can share at the moment, as I
>> typically delete them in a rage after training my bayes filter on
>> them. However, I am looking for any suggestions of other things I can
>> turn on... in particular, are there rules that people have created that
>> look for certain keywords where the body is asking for your
>> account/password information?
>>
> Report these and maybe they will add something that catches them. If
> one wanted to, they can get any mail they want through your filters if
> they are good and don't use things that trigger the rules.

Report them where exactly? Here is an example I received recently. Note the hideously low Bayes score on this one; it even caused the message to be autolearned as ham, grr.

From [EMAIL PROTECTED] Fri Oct 31 20:00:45 2008
Return-Path: <[EMAIL PROTECTED]>
X-OfflineIMAP-x792266711-4c6f63616c-494e424f58: 1225549253-0134941395044-v6.0.3
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd2.riseup.net
X-Spam-Level:
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
    autolearn=ham version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (unknown [10.8.0.3])
    by cormorant.riseup.net (Postfix) with ESMTP id 58BFA19581F7
    for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:40 -0700 (PDT)
Received: from master.debian.org (master.debian.org [70.103.162.29])
    by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
    for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)
Received: from cat.cybersurf.net ([209.197.145.185] helo=cat.cia.com)
    by master.debian.org with esmtp (Exim 4.63)
    (envelope-from <[EMAIL PROTECTED]>)
    id 1Kw6j8-0003iT-Ix
    for [EMAIL PROTECTED]; Sat, 01 Nov 2008 03:00:38 +
Received: from reef.cybersurf.com ([209.197.145.198])
    by cat.cia.com with esmtp (Exim 4.50)
    id 1Kw6iz-0002Li-Pg; Fri, 31 Oct 2008 21:00:29 -0600
Received: from apache by reef.cybersurf.com with local (Exim 4.44)
    id 1Kw6j0-0006W5-UJ; Fri, 31 Oct 2008 20:00:30 -0700
Received: from 196-207-0-227.netcomng.com (196-207-0-227.netcomng.com [196.207.0.227])
    by webmail.3web.com (IMP) with HTTP
    for <[EMAIL PROTECTED]>; Sat, 1 Nov 2008 14:00:30 +1100
Message-ID: <[EMAIL PROTECTED]>
Date: Sat, 1 Nov 2008 14:00:30 +1100
From: WEBMAIL Help Desk <[EMAIL PROTECTED]>
Reply-to: [EMAIL PROTECTED]
Subject: WEBMAIL Help Desk
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.1
X-Originating-IP: 196.207.0.227
To: undisclosed-recipients:;
X-Virus-Scanned: ClamAV 0.94/8552/Fri Oct 31 18:14:36 2008 on mx1.riseup.net
X-Virus-Status: Clean
Status: RO
Content-Length: 1427
Lines: 38

Dear Webmail User,

This message was sent automatically by a program on Webmail which periodically checks the size of inboxes, where new messages are received. The program is run weekly to ensure no one's inbox grows too large. If your inbox becomes too large, you will be unable to receive new email. Just before this message was sent, you had 18 Megabytes (MB) or more of messages stored in your inbox on your Webmail.

To help us re-set your SPACE on our database prior to maintain your INBOX, you must reply to this e-mail and enter your Current User name () and Password( ).

You will continue to receive this warning message periodically if your inbox size continues to be between 18 and 20 MB. If your inbox size grows to 20 MB, then a program on Bates Webmai will move your oldest email to a folder in your home directory to ensure that you will continue to be able to receive incoming email. You will be notified by email that this has taken place.

If your inbox grows to 25 MB, you will be unable to receive new email as it will be returned to the sender. After you read a message, it is best to REPLY and SAVE it to another folder.

Thank you for your cooperation.

WEBMAIL Help Desk
---
3webXS HiSpeed Dial-up...surf up to 5x faster than regular dial-up alone... just $14.90/mo...visit www.get3web.com for details
Re: Phishing rules?
Karsten Bräckelmann <[EMAIL PROTECTED]> writes:
> On Thu, 2008-10-30 at 15:56 -0400, Micah Anderson wrote:
>> I keep getting hit by phishing attacks, and they aren't being stopped by
>> anything I've thrown up in front of them:
>>
>> postfix is doing:
>> reject_rbl_client b.barracudacentral.org,
>> reject_rbl_client zen.spamhaus.org,
>> reject_rbl_client list.dsbl.org,
>>
>> I've got clamav pulling signatures updated once a day from sanesecurity
>> (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
>> securesiteinfo) and Malware Black List, MSRBL (images, spam).
>
> I'd increase this, at least for the SaneSecurity phish sigs. They are
> being updated much more frequently.

Thanks for the pointer. For some reason I thought I had read on the SaneSecurity site that you shouldn't pull more than once a day, but after you mentioned it I went and read it again, and they ask that you don't pull more frequently than once an hour... so I've changed that cronjob, which should help.

>> I've got spamassassin 3.2.5 with URIBL plugin loaded (which I understand
>> pulls in the 25_uribl.cf automatically, right? Or do I need to configure
>
> Yes, unless you disable network tests in general. Should be easy to
> answer yourself if they are working, just by grepping for the rule names
> defined in 25_uribl.cf.

Network tests aren't disabled, and yes, I am seeing those rules in the headers of some of the mail I can search through, so I think they are working. I've increased my overall URIBL scoring to 2.5 from the default.

>> Sadly, I do not have an example I can share at the moment, as I
>> typically delete them in a rage after training my bayes filter on
>> them. However, I am looking for any suggestions of other things I can
>> turn on... in particular, are there rules that people have created that
>> look for certain keywords where the body is asking for your
>> account/password information?
>
> So you've pretty much thrown everything at it you could find... ;) And
> they are still slipping through? How many are we talking here? Compared
> to the total number of spam / phish?
>
> Also, how many are being caught? Strikes me as odd that you don't have a
> sample but yet sound like every single one is slipping by.

These are hard for me to answer, as I am not doing any analysis of how many are caught. In the last week, four of them have gotten through to me, and I've received reports from a number of users that they too have received them. I've just sent a sample to the list, however.

> I guess, I would start verifying that all the above actually is working.
> Most notably the SaneSecurity phish sigs. ClamAV should catch the lion's
> share, by far, assuming it comes before SA in your chain.

Yeah, I'm using the clamav-milter, so those get rejected really early on.

Thanks for the ideas,
Micah
Re: Phishing rules?
Joseph Brennan <[EMAIL PROTECTED]> writes:
> Micah Anderson <[EMAIL PROTECTED]> wrote:
>
>> I keep getting hit by phishing attacks, and they aren't being stopped by
>> anything I've thrown up in front of them:
>
> Do you mean attempts to get your users to send their passwords,
> or fake mail pretending to be from banks?

I mean attempts to get my users to send their passwords. Is that not called phishing?

micah
Re: Phishing rules?
Brent Clark <[EMAIL PROTECTED]> writes:
> Hiya
>
> See SA examples
>
> http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists
>
> Also add hostkarma.junkemailfilter.com to your DNSBL.

Thanks, I'll add this to my local.cf and see how it goes.

> Another thing I do find useful is adding additional higher valued
> MX records.
>
> http://www.junkemailfilter.com/spam/support.html

I don't really like the idea of adding some other site's MX to my DNS, so I think I'll pass on this one.

Thanks for the suggestions!
micah
bayes SQL delays
I have spamd set up to use Bayes in a MySQL database, and it works fine. I've turned off auto-expiry and instead run a cronjob to expire tokens in the middle of the night (it removes about 40k tokens per run). I've converted the database to InnoDB so it can handle locking better. I've got MySQL-based user prefs coming from the same database server, and that works (not everyone wants Bayes). Autolearning is working, I chew through a lot of mail every day, and in general everything seems fine.

Except that my spamd server is overloaded, so I need a second one. So I set up another spamd instance with exactly the same configuration as the first, fire it up, and it immediately starts blocking on the Bayes work. Average scan times go from 1-2 seconds up to 35+, and the max children get eaten up blocking on the Bayes work, to the point where it's pointless because too many processes are blocked.

If I disable the bayes_sql settings in my local.cf, scan times drop back to their expected average of 1-2 seconds, but of course none of the BAYES tests fire and autolearning fails. What gives?
Re: Phishing rules?
Joseph Brennan <[EMAIL PROTECTED]> writes:
>> Reply-to: [EMAIL PROTECTED]
>
> First pass:
>
> header LOCAL_REPLYTO_LIVE Reply-to =~ /[EMAIL PROTECTED]/
> score LOCAL_REPLYTO_LIVE 8.0
>
> Maybe scoring 8.0 for one thing scares you, but I haven't seen this
> fp in a couple of months.

Is live.com a legitimate email sender? It looks Microsoft-related. If I set it to 8, then any mail from that address is sure to get caught as spam, which may not be the right thing depending on other potential legitimate addresses sending from that domain. Or perhaps nothing but spam comes from live.com? I don't know anything about it.

micah
Re: Phishing rules?
SM <[EMAIL PROTECTED]> writes:
> At 07:56 01-11-2008, Micah Anderson wrote:
>>Here is an example one I received recently, note the hideously low bayes
>>score on this one, caused it to autolearn as ham even, grr.
>
> [snip]
>
>>X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
>> autolearn=ham version=3.2.5
>
> The sender is whitelisted by www.dnswl.org.

Yeah, because this one was forwarded through debian.org, which is legitimate. The spam originator was not debian.org, but debian.org is the one listed in dnswl.org.

>>Received: from master.debian.org (master.debian.org [70.103.162.29])
>> by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
>> for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)
>
> The mail is coming through debian.org. Do you want to blacklist that host?

No, I do not.
Re: Phishing rules?
Karsten Bräckelmann <[EMAIL PROTECTED]> writes:
> On Sat, 2008-11-01 at 11:30 -0400, Micah Anderson wrote:
>> Joseph Brennan <[EMAIL PROTECTED]> writes:
>
>> > Do you mean attempts to get your users to send their passwords,
>> > or fake mail pretending to be from banks?
>>
>> I mean attempts to get my users to send their passwords, are these not
>> called phishing?
>
> An important bit of information, missing from the OP. :) Targeted
> attacks at your users, so the general phishing BLs don't really apply.
>
> Anyway, can't you educate your users, that
>
> (a) Any administrative email will be sent from an official, well known,
> internal address? That means *not* an arbitrary address. Yes, sorry,
> the obvious...
> (b) They will *never* ever be asked for a password by mail. Period.
> Again, obvious...

We've been telling our users this for years, but there is always someone who doesn't listen, or forgets, or something. I don't know. I find it absolutely incredible that anyone would fall for any of these, yet I am the one who has to clean up the mess :P

> Then block internal / administrative From addresses coming from any
> external SMTP.

Yeah, that's done; they don't get by faking our From, but the body is constructed to mislead and impersonate our "staff" or whatever, usually by threatening people that their account will be closed unless they reply.

> This is not a technical way to stopping these, but an educational
> approach to prevent the most dumb and gross social engineering. At least
> the second one actually should be well-known, and I've seen ISPs
> pointing it out frequently...

Thanks, but we've done all these, and continue to do them; they are another plank in the various mechanisms that we must employ.

micah
Re: Phishing rules?
Sahil Tandon <[EMAIL PROTECTED]> writes:
> Joseph Brennan <[EMAIL PROTECTED]> wrote:
>
>>> We get some legitimate email from @live.com users.
>>
>> But they don't set a Reply-to header. That's the test.
>
> But that wasn't his question; he asked whether any legitimate mail flows
> from live.com. That was my answer. :)

You are technically correct, but Joseph's message made clear something I was not aware of, which was quite helpful and technically better.

Micah
Re: Checking for SPF & DKIM Checks
Byung-Hee HWANG <[EMAIL PROTECTED]> writes:
> mouss wrote:
> [...]
>> let's start with DKIM.
>>
>> do you have
>> loadplugin Mail::SpamAssassin::Plugin::DKIM
>
> + i'm use with following rule ;;
> score DKIM_VERIFIED -45.3

Even with the default DKIM scores, I'm finding that I get spam that is DKIM_VERIFIED, which pushes the score below zero and lets the message through, for example:

http://micah.riseup.net/1

I am thinking of actually increasing the score because of this.

micah
Re: Phishing rules?
Joseph Brennan <[EMAIL PROTECTED]> writes:
> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/

I created a meta-rule out of these (with a score of 8) and then ran spamassassin -D < phish to see how it worked. It matched the meta-rule flawlessly, but the phish ended up with a score of only 5.4, because BAYES_00 dragged it down.

That surprised me, so I started to wonder whether my Bayes DB was poisoned. I ran some stats, and the results seem to indicate a healthy Bayes database (unless I am reading this wrong)...

A side note: it's interesting that only about 9.6% of our email is spam, which seems low, but maybe clamav-milter + RBLs are blocking the remaining 40%?

  Email: 2379392   Autolearn: 1075396   AvgScore: -6.32   AvgScanTime: 5.96 sec
  Spam:   227816   Autolearn:  114079   AvgScore: 14.75   AvgScanTime: 4.23 sec
  Ham:   2151576   Autolearn:  961317   AvgScore: -8.56   AvgScanTime: 6.15 sec

  Time Spent Running SA:      3941.26 hours
  Time Spent Processing Spam:  267.76 hours
  Time Spent Processing Ham:  3673.50 hours

  TOP SPAM RULES FIRED
  -----------------------------------------------------------------
  RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
  -----------------------------------------------------------------
     1  HTML_MESSAGE              154522    54.03    67.83    52.57
     2  BAYES_99                  134531     6.09    59.05     0.48
     3  BOTNET                    133687     8.90    58.68     3.63
     4  RDNS_NONE                 102255    10.19    44.88     6.51
     5  URIBL_JP_SURBL             98879     4.94    43.40     0.87
     6  MIME_HTML_ONLY             87518     7.62    38.42     4.36
     7  URIBL_OB_SURBL             76624     3.98    33.63     0.84
     8  DCC_CHECK                  74600     8.51    32.75     5.94
     9  URIBL_AB_SURBL             59890     2.72    26.29     0.23
    10  URIBL_SC_SURBL             53911     2.51    23.66     0.27
    11  RCVD_IN_BL_SPAMCOP_NET     43120     2.43    18.93     0.68
    12  URIBL_WS_SURBL             38251     1.79    16.79     0.21
    13  URIBL_RHS_DOB              36565     2.17    16.05     0.70
    14  BAYES_50                   35322     3.93    15.50     2.71
    15  HTML_IMAGE_ONLY_16         33887     1.68    14.87     0.28
    16  HTML_SHORT_LINK_IMG_2      33118     1.56    14.54     0.19
    17  HTML_IMAGE_RATIO_02        32757     2.93    14.38     1.72
    18  URIBL_SBL                  30456     1.80    13.37     0.57
    19  RAZOR2_CHECK               27722     2.55    12.17     1.53
    20  RAZOR2_CF_RANGE_51_100     26856     2.41    11.79     1.41
  -----------------------------------------------------------------

  TOP HAM RULES FIRED
  -----------------------------------------------------------------
  RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM   %OFHAM
  -----------------------------------------------------------------
     1  BAYES_00                 2002969    84.67     5.15    93.09
     2  HTML_MESSAGE             1131073    54.03    67.83    52.57
     3  UNPARSEABLE_RELAY         760567    32.93    10.12    35.35
     4  DKIM_SIGNED               693328    29.74     6.26    32.22
     5  DKIM_VERIFIED             531590    22.67     3.38    24.71
     6  ALL_TRUSTED               173612     7.30     0.05     8.07
     7  USER_IN_WHITELIST         155704     6.54     0.00     7.24
     8  RDNS_NONE                 140127    10.19    44.88     6.51
     9  DCC_CHECK                 127844     8.51    32.75     5.94
    10  RCVD_IN_DNSWL_LOW         101863     4.31     0.34     4.73
    11  MIME_HTML_ONLY             93817     7.62    38.42     4.36
    12  RCVD_IN_DNSWL_MED          90038     3.81     0.31     4.18
    13  WHOIS_NETSOLPR             87575     3.72     0.38     4.07
    14  MIME_QP_LONG_LINE          82804     4.49    10.52     3.85
    15  BOTNET                     78052     8.90    58.68     3.63
    16  BAYES_50                   58286     3.93    15.50     2.71
    17  FUZZY_AMBIEN               53284     2.28     0.38     2.48
    18  SARE_SUB_ENC_UTF8          50533     2.14     0.17     2.35
    19  SARE_MILLIONSOF            42268     1.84     0.67     1.96
    20  FORGED_YAHOO_RCVD          38762     1.74     1.16     1.80
  -----------------------------------------------------------------

Then I looked to see what Bayes did with the message, but I do not understand how to read the output. Can someone explain it to me and give me an idea why BAYES_00 fired when we've been feeding every one of these spams to Bayes to train on?

  $ spamassassin -D bayes < phish
  [9595] dbg: bayes: using username: @GLOBAL
  [9595] dbg: bayes: database connection established
  [9595] dbg: bayes: found ba
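The figures in the report can be cross-checked with a little arithmetic. This Python snippet recomputes the spam share, the overall average scan time (a message-weighted mean of the spam and ham averages), and roughly how many spam messages scored BAYES_00. It only uses numbers from the report above. Note that the counts cover mail that reached SpamAssassin, so anything rejected earlier by clamav-milter or the RBLs isn't in them; that fits the guess above, but this snippet can't confirm it.

```python
# Figures copied from the statistics report above.
TOTAL, SPAM, HAM = 2379392, 227816, 2151576
SPAM_SCAN, HAM_SCAN = 4.23, 6.15   # AvgScanTime, seconds per message
BAYES00_PCT_SPAM = 5.15            # %OFSPAM for BAYES_00

assert SPAM + HAM == TOTAL         # the report's totals are consistent

spam_share = SPAM / TOTAL
# Overall average = per-class averages weighted by message counts.
weighted_scan = (SPAM * SPAM_SCAN + HAM * HAM_SCAN) / TOTAL
bayes00_spam = SPAM * BAYES00_PCT_SPAM / 100

print(f"spam share of scanned mail: {spam_share:.1%}")         # 9.6%
print(f"weighted average scan time: {weighted_scan:.2f} s")    # 5.97 s
print(f"spam messages scored BAYES_00: about {bayes00_spam:,.0f}")  # 11,733
```

The weighted average comes out at 5.97 s against the reported 5.96 s because the per-class averages are themselves rounded. The last figure (roughly 11.7k spam messages, about 5% of all spam, scoring BAYES_00) may be worth looking at alongside the `-D bayes` output, though on its own it doesn't show whether the database is mistrained.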
Re: Funds / Award release scams poor scoring
* Justin Mason <[EMAIL PROTECTED]> [2008-11-10 05:30-0500]:
>
> John Hardin writes:
> > On Sun, 9 Nov 2008, Micah Anderson wrote:
> > > Does anyone have any rules to catch these, or suggestions of scores to
> > > tweak to make these hit better? I am running clamav-milter with the
> > > sanesecurity add-ons, but these are still making it through.
> >
> > Check out the sought-fraud ruleset.
> >
> > http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf
> >
> > (I don't know if it's in sa-update yet - Justin?)
>
> I thought it was, but it seems I never made that part of the publishing
> process active ;) I'll do that.

Does this mean it will show up in the regular updates.spamassassin.org channel? Or is there another channel that I should follow?

Thanks!
micah
Re: Funds / Award release scams poor scoring
John Hardin <[EMAIL PROTECTED]> writes:
> On Sun, 9 Nov 2008, Micah Anderson wrote:
>
>> Does anyone have any rules to catch these, or suggestions of scores to
>> tweak to make these hit better? I am running clamav-milter with the
>> sanesecurity add-ons, but these are still making it through.
>
> Check out the sought-fraud ruleset.
>
> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf

I am pulling the sought.rules.yerp.org channel. I thought it was the same, but diffing the two shows a lot of differences.

> (I don't know if it's in sa-update yet - Justin?)

It would be nice if I could pull these in via sa-update!

micah
Re: Funds / Award release scams poor scoring
Chris <[EMAIL PROTECTED]> writes:
> On Sunday 09 November 2008 2:33 pm, Micah Anderson wrote:
> 2.5 CTYME_IXHASH BODY: iXhash found @ ixhash.junkemailfilter.com

This one is interesting to me. When I pump these messages through spamc -R I get:

  -5.0 RCVD_IN_JMF_W   RBL: Sender listed in JMF-WHITE
                       [70.103.162.29 listed in hostkarma.junkemailfilter.com]

because I added the hostkarma.junkemailfilter.com RBLs, as described here:

http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625

Getting -5 on these kind of sucks, but yours doesn't look like an RBL check, and it scores the message up. What test is that?

> Above are how these scored on my stand-alone box. You may want to run the
> Freemail plugin, SA-Grey plugin. Are you running Razor?

The rest of my tests were the same as yours, with the exception of the Freemail and SA-Grey plugins, which I do not have. I'll track those down. I am running Razor: the first message gets +0.5 from RAZOR2_CHECK, and the 4th message gets 0.5 RAZOR2_CHECK + 1.5 RAZOR2_CF_RANGE_E4_51_100 + 0.5 RAZOR2_CF_RANGE_51_100.

Micah
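To see for yourself what a DNS list returns for a given relay (here the debian.org relay 70.103.162.29 that hostkarma lists as white), you can query the list by hand. DNS lists of this kind are looked up by reversing the IPv4 octets and appending the list's zone; the small Python helper below builds that name. Passing the result to a resolver tool such as `host` or `dig` shows the answer; what each answer code means is defined by the list operator's documentation, which this sketch does not try to encode.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query for an IPv4 address in a DNS list zone.

    The address's octets are reversed and the zone is appended,
    e.g. 1.2.3.4 in example.org -> 4.3.2.1.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


print(dnsbl_query_name("70.103.162.29", "hostkarma.junkemailfilter.com"))
# 29.162.103.70.hostkarma.junkemailfilter.com
```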
Funds / Award release scams poor scoring
I'm getting a number of these types of emails through SA with either negative or very low scores. This is surprising to me, as these are pretty classic spams. I suspect that some of the low scores are due to their being DKIM signed.

Does anyone have any rules to catch these, or suggestions of scores to tweak to make these hit better? I am running clamav-milter with the sanesecurity add-ons, but these are still making it through.

Here are 5 different ones, all of which got through in the last 24 hours:

http://micah.riseup.net/1
http://micah.riseup.net/2
http://micah.riseup.net/3
http://micah.riseup.net/4
http://micah.riseup.net/5

Thanks
Re: Phishing rules?
Joseph Brennan <[EMAIL PROTECTED]> writes:
> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/

I'm new at writing custom rules, so I am trying to figure out the best way to do this. Would it be better to make a separate rule for each one of these, or to make a meta-rule? My guess is that a meta-rule is better, but that means every rule must hit in order to get the larger score, as opposed to some of the individual rules hitting and adding up to the larger score.

The meta-rule seems good because it describes a full phishing-email profile that must be matched, but it seems bad because one tweak to the phish would cause the meta-rule not to match at all. I suppose this is the point of the arithmetic meta-rule possibility, but I'm puzzled about which mechanism is best. Any advice would be appreciated.

Once I figure out the best way to match these, I need a good way to determine how to score them. The rule-writing documentation suggests starting at 0.1 and moving the score up as you test, and advises extreme caution when scoring a custom rule over 1; however, it seems like these would be better scored higher than that.

> The first of course is partly local to us. Another useful local rule
> is to check for the uri of your own webmail.

Yeah, I'll make a uri rule for that and probably add it to the meta-rule.

Thanks for any advice,
micah
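A quick way to sanity-check body patterns before putting them in local.cf is to run them against a saved sample. The Python sketch below uses the three body patterns Joseph posted (the Reply-to pattern is redacted in the archive, so it is left out) against an excerpt of the phish quoted earlier in this thread, and counts hits the way an arithmetic meta rule would. The rule names are hypothetical, Python's `re` only approximates Perl regex semantics, and real SpamAssassin body rules see a decoded, rendered body rather than this raw string. Note that, as posted, `/you must reply to this email/i` does not match the sample's hyphenated "e-mail", so the sketch uses `e-?mail` as an illustrative variant.

```python
import re

# Body patterns posted in this thread, translated from Perl to Python.
# Rule names are hypothetical; the Perl "/i" flag becomes re.IGNORECASE.
RULES = {
    "__LOCAL_PHISH_DEAR": re.compile(r"Dear .{0,12}(web ?mail|columbia\.edu)", re.I),
    "__LOCAL_PHISH_PASSWD": re.compile(r"Password.{0,10}\([\s\.\*\_]+\)"),
    # Illustrative variant: "e-?mail" also accepts the hyphenated spelling.
    "__LOCAL_PHISH_REPLY": re.compile(r"you must reply to this e-?mail", re.I),
}


def rule_hits(body):
    """Names of the sub-rules whose pattern matches somewhere in body."""
    return [name for name, rx in RULES.items() if rx.search(body)]


def meta_fires(hits, threshold=2):
    """Mimic an arithmetic meta rule: (A + B + C) >= threshold."""
    return len(hits) >= threshold


# Excerpt of the phish body quoted earlier in this thread.
SAMPLE = ("Dear Webmail User, ... you must reply to this e-mail and enter "
          "your Current User name () and Password( ).")

hits = rule_hits(SAMPLE)
print(hits, meta_fires(hits))
# ['__LOCAL_PHISH_DEAR', '__LOCAL_PHISH_PASSWD', '__LOCAL_PHISH_REPLY'] True
```

In local.cf the same idea would look roughly like `meta LOCAL_PHISH (__LOCAL_PHISH_DEAR + __LOCAL_PHISH_PASSWD + __LOCAL_PHISH_REPLY) >= 2`, with each `__` sub-rule defined as a `body` rule; rules whose names start with `__` don't score on their own. Requiring two of three hits tolerates a phisher rewording one phrase.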
Hard money conference spam
I'm getting probably 4-5 of these a day. The messages vary, so they aren't identical, but none of them fire any specific rules related to their 'hard money conference/webinar/seminar' content. Does anyone have any customized rules for these? I've been training my Bayes on them, and it's starting to pick them up (at BAYES_40 now), but it could use some more specific rules:

Content analysis details:   (5.1 points, 8.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 FH_XMAIL_RND_833       Special X-Mailer Version
-0.2 BAYES_40               BODY: Bayesian spam probability is 20 to 40%
                            [score: 0.2305]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 1.0 RCVD_IN_BRBL           RBL: Received via relay listed in Barracuda RBL
                            [66.29.0.197 listed in b.barracudacentral.org]
 1.0 RCVD_IN_JMF_BR         RBL: Sender listed in JMF-BROWN
                            [66.29.0.197 listed in hostkarma.junkemailfilter.com]
 1.1 URIBL_RHS_DOB          Contains an URI of a new domain (Day Old Bread)
                            [URIs: hardmoney-event.com]

Return-Path: <[EMAIL PROTECTED]>
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd1.riseup.net
X-Spam-Level: ***
X-Spam-Status: No, score=3.9 required=5.0 tests=FH_XMAIL_RND_833,
    RCVD_IN_JMF_BR,URIBL_BLACK,URIBL_RHS_DOB autolearn=no version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (egret-vpn.riseup.net [10.8.0.3])
    by cormorant.riseup.net (Postfix) with ESMTP id 602201C38CA8
    for <[EMAIL PROTECTED]>; Mon, 10 Nov 2008 23:23:26 -0800 (PST)
Received: from ip197.rutcommercial.com (ip197.rutcommercial.com [66.29.0.197])
    by mx1.riseup.net (Postfix) with SMTP id 10F4757002B
    for <[EMAIL PROTECTED]>; Mon, 10 Nov 2008 23:23:10 -0800 (PST)
Date: Tue, 11 Nov 2008 02:10:03 -0500
From: "Larry Rivera" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: thursday's hard money
MIME-Version: 1.0
X-Mailer: oer v8.3.3.1000.10001079
Reply-To: [EMAIL PROTECTED]
Message-Id: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="iso-8859-1"
X-Virus-Scanned: ClamAV 0.94/8607/Mon Nov 10 21:55:28 2008 on mx1.riseup.net
X-Virus-Status: Clean
Content-Length: 528

Hard Money National Event takes place on November 13th.
follow the following steps to register:
1. Visit our website http://hardmoney-event.com
2. click "attend a seminar" and register for the event.
3. We will confirm your registration the same day.
4. call us at 858-736-7788 for additional information.

If you wish to opt out of future messages, please go to http://hardmoney-event.com/uns/ or, send us a letter to PBMSII, 5580 la jolla blvd #153 La Jolla, Ca 92037 .
Re: SURBL Usage Policy change
"Jeff Chan" <[EMAIL PROTECTED]> writes:

I think that SURBL is a valuable service, and I understand how difficult it is to maintain such a service without resources.

> The funding is, by design, very moderate and will provide much needed
> support to sustain this initiative.

However, I believe that for non-profit organizations the funding model is not moderate at all. Perhaps this is because of the unfortunate decision to put non-profits into the same category as governments, which typically are able to bring in much larger amounts of money. Or perhaps it is a short-sighted view that all non-profits fall into the same category as large, well-funded non-profits. While some do have resources available to them, a large majority of non-profits are deeply struggling for resources, and honestly I cannot imagine any of them being able to afford the subscription rates listed for non-profits/governments. I'm on the board of directors and am an executive for three different non-profit organizations, and although they would all be eager to contribute to SURBL, none of them could possibly meet the funding bar that has been set.

The SURBL FQS is great, and it is appreciated that you have thought of small charitable/non-profits with low email volume. However, I think you are missing that there are small charitable/non-profits that handle this volume on an extremely tight budget.

Micah
Re: Hard money conference spam
Rob McEwen <[EMAIL PROTECTED]> writes: > Micah, > > In addition to the barracuda RBL, this IP is also listed on ivmSIP > (since 10/21/08) and ivmSIP/24 Can you provide the local.cf details I would need to add the ivm RBLs? > Additionally, the domain "hardmoney-event DOT com" is blacklisted on > both ivmURI and URIBL.COM > > At the very least, you should add uribl.com to your filtering since that > list is free. Scoring with URIBL for this would have easily put that > message "over the top" for you. I understood URIBL to be enabled by default in SA and updated via sa-update; in fact I've got: /var/lib/spamassassin/3.002005/updates_spamassassin_org/25_uribl.cf > SHORT ANSWER: Start using uribl.com's URI blacklist Am I not using it already? Maybe I'm not, and 25_uribl.cf doesn't include it? If so, I would really like to know. Thanks! Micah
Freemail config: dup unknown type freemail_re, Regexp
I recently added the FreeMail plugin, and although it appears to be working, when I start SpamAssassin I receive this message in my log: Nov 11 06:45:48 spamd2 spamd[29934]: config: dup unknown type freemail_re, Regexp I've put FreeMail.pm in /etc/spamassassin and created FreeMail.cf as described, and it does seem to be working, as I am seeing some messages get tagged by it. Are the regexps in plugins installed this way compiled by sa-compile, or do they stand separately? Thanks, micah
Re: Checking for SPF & DKIM Checks
mouss <[EMAIL PROTECTED]> writes: > Francis Russell wrote: >> >> Even with the default DKIM scores, I'm finding I am getting spam that is >> >> DKIM_VERIFIED, causing the score to dip below zero and let the message >> >> through, for example: >> >> >> >> http://micah.riseup.net/1 >> > >> > that's spam relayed by a debian list. definitely a different beast... >> >> I interpret those headers as spam being sent to a Debian e-mail >> address, then forwarded to a personal address. That is a correct interpretation. I get most of my spam this way. > That's what I meant. Maybe I use the term "relay" too "liberally"? > anyway, such spam is harder to stop unless you add the list relays to > your trusted_networks. This is the part of SA I have the hardest time understanding: the trusted_networks and internal_networks settings. I've read all the posts that try to clarify them and I still can't keep them straight :) How would adding a list relay to my trusted_networks actually make stopping spam easier? Doesn't that make it a network I should spend less SA processing on, because I 'trust' it? micah
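Here is one simplified way to picture the answer. SpamAssassin walks the Received: chain outward from your side. Many DNSBL rules (the *-lastexternal ones used elsewhere in this archive) look at the first hop outside your networks, which is based on internal_networks, or on trusted_networks when internal_networks is not set. The Python sketch below models only that one idea. The real logic also involves msa_networks and authentication info, which are ignored here, and the relay addresses are made-up documentation addresses. Trusting the list relay does not make SpamAssassin skip work on the message. Instead, it moves the "first untrusted" hop back to the machine that handed the spam to the list.

```python
import ipaddress


def first_untrusted_relay(relays, trusted_networks):
    """Return the first relay IP (newest first, i.e. in Received: header order)
    that is not inside any trusted network, or None if all are trusted."""
    nets = [ipaddress.ip_network(net) for net in trusted_networks]
    for ip in relays:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


# Hypothetical path: our own relay, then a list server, then the spam source.
relays = ["10.8.0.3", "192.0.2.25", "203.0.113.7"]

print(first_untrusted_relay(relays, ["10.0.0.0/8"]))                  # 192.0.2.25
print(first_untrusted_relay(relays, ["10.0.0.0/8", "192.0.2.0/24"]))  # 203.0.113.7
```

In this model, without the list relay in the trusted set, the DNSBL-style check would land on the list server, which is rarely listed. With the list relay trusted, it lands on the host that actually sent the spam.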
Re: Barracuda RBL
"Sujit Acharyya-Choudhury" <[EMAIL PROTECTED]> writes: > Thanks Henrik. However, I am not using SVN 3.3 so the rule on its own > will be useful. I'm using:

# Add a rule to give the Barracuda RBL a +1 score. This is a really good
# RBL, but we were having false-positives when using it to block at
# the SMTP level, so using it in a weighted spamassassin rule is
# better because we can benefit from it without being strict
header RCVD_IN_BRBL eval:check_rbl('brbl-lastexternal', 'b.barracudacentral.org.', '127.0.0.2')
describe RCVD_IN_BRBL Received via relay listed in Barracuda RBL
score RCVD_IN_BRBL 1.0
tflags RCVD_IN_BRBL net

micah
Overriding user prefs in local.cf
I set some 'add_header' options in my global local.cf and could not figure out why they were not being applied. It turns out that because I am using SQL user_prefs, any add_header lines I put in local.cf are simply ignored (even though I have no global or individual add_header lines configured in my SQL table). Is there any documentation that details which options configured in local.cf are overridden simply by user prefs existing? I know I can set a @GLOBAL pref with these add_header lines if I wish, and I can set them for my user, but I thought that setting them in local.cf would make them apply globally, the way certain other settings there do. I'm not sure which settings behave that way and which do not. micah
Re: Funds / Award release scams poor scoring
* Justin Mason <[EMAIL PROTECTED]> [2008-11-12 05:20-0500]: > > John Hardin writes: > > On Sun, 9 Nov 2008, Micah Anderson wrote: > > > > > Does anyone have any rules to catch these, or suggestions of scores to > > > tweak to make these hit better? I am running clamav-milter with the > > > sanesecurity add-ons, but these are still making it through. > > > > Check out the sought-fraud ruleset. > > > > http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf > > > > (I don't know if it's in sa-update yet - Justin?) > > That's in sa-update since last night; it's now bundled in the main > "sought" ruleset channel, as well. Which channels specifically? Do you mean that it is now in both updates.spamassassin.org and sought.rules.yerp.org? Thanks! Micah
Re: Overriding user prefs in local.cf
Matt Kettler <[EMAIL PROTECTED]> writes: > Micah Anderson wrote: >> I set some 'add_header' options in my global local.cf and could not >> figure out why they were not being applied. It turns out that because I >> am using SQL user_prefs, any add_header lines I put in local.cf are just >> ignored (even though I have no global or individual add_header lines >> configured in my sql table). >> > That's strange. They should only be ignored if the user prefs contains a > clear_headers, or if it has an add_header for the exact same header. > > Does your user_prefs or global contain a clear_headers command? No, that's why I was confused as well. My global prefs don't exist in SQL at all, and my user prefs contain neither an add_header nor a clear_headers command. >> Is there any documentation that details which options that I might >> configure in local.cf that are overridden by user prefs simply existing? >> > There are none that are cleared simply by the merits of user_prefs > existing. An empty prefs is the same as no prefs. OK, that's how I expected things to work, so clearly something else is going on. thanks, micah
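Matt's description of how the two layers combine can be written down as a tiny Python model. Site-wide add_header lines survive unless the user prefs contain clear_headers or re-add the same header. This is a hypothetical sketch of the semantics as described in this thread, not SpamAssassin's code. The header name Checked-By is made up, and keying on (spam|ham|all, name) is a simplification.

```python
def effective_headers(site_lines, user_lines):
    """Model of local.cf + user_prefs merging for add_header/clear_headers.

    Lines are processed in order, site config first, then user prefs.
    Keys are (spam|ham|all, header name). A later add_header for the same
    key replaces the earlier one, and clear_headers drops everything so far.
    """
    headers = {}
    for line in list(site_lines) + list(user_lines):
        parts = line.split(None, 3)
        if not parts:
            continue
        if parts[0] == "clear_headers":
            headers.clear()
        elif parts[0] == "add_header" and len(parts) >= 3:
            value = parts[3] if len(parts) == 4 else ""
            headers[(parts[1], parts[2])] = value
    return headers


site = ["add_header all Checked-By spamd _HOSTNAME_"]
print(effective_headers(site, []))                 # {('all', 'Checked-By'): 'spamd _HOSTNAME_'}
print(effective_headers(site, ["clear_headers"]))  # {}
print(effective_headers(site, ["add_header all Checked-By other"]))  # {('all', 'Checked-By'): 'other'}
```

Under this model, empty user prefs leave the site-wide header in place, matching "an empty prefs is the same as no prefs". That is why the behaviour Micah sees points to something else.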
hostkarma junkemailfilter
Over at another post about Phishing[0], Brent suggested adding hostkarma.junkemailfilter to my RBL list, which I have done. However, it seems to hit a lot of spam, giving it a -5 score. Either I've got this configured backwards, or it isn't working very well because it whitelists too much actual spam. I copied the examples[1] directly from their wiki. Does anyone have any experience with these? I'm removing JMF-WHITE because it's not helping at all, but I wonder what others' experience has been.

header __RCVD_IN_JMF eval:check_rbl('JMF-lastexternal','hostkarma.junkemailfilter.com.')
describe __RCVD_IN_JMF Sender listed in JunkEmailFilter
tflags __RCVD_IN_JMF net

header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
tflags RCVD_IN_JMF_W net nice
score RCVD_IN_JMF_W -5

header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
tflags RCVD_IN_JMF_BL net
score RCVD_IN_JMF_BL 3.0

header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
tflags RCVD_IN_JMF_BR net
score RCVD_IN_JMF_BR 1.0

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625
1. http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

micah
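To see what those rules do with a listing, it helps to spell out the lookup. check_rbl() queries the sender IP with its octets reversed under the list's zone, and each check_rbl_sub() rule matches one of the 127.0.0.x answers. Below is a minimal Python sketch of that mapping, using the codes and scores from the rules above. The address 66.29.0.197 is the relay from the hard money spam earlier in this archive. The sketch treats subtests as exact matches, which is a simplification, and it performs no DNS queries itself.

```python
import ipaddress

# Return codes and scores copied from the JMF rules above.
JMF_RULES = {
    "127.0.0.1": ("RCVD_IN_JMF_W", -5.0),
    "127.0.0.2": ("RCVD_IN_JMF_BL", 3.0),
    "127.0.0.4": ("RCVD_IN_JMF_BR", 1.0),
}


def dnsbl_query_name(ip, zone):
    """Name queried for an IPv4 DNSBL lookup: reversed octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.rstrip(".")


def jmf_rules_hit(answers):
    """Map A-record answers to (rule, score) pairs, using exact matches only."""
    return [JMF_RULES[a] for a in answers if a in JMF_RULES]


print(dnsbl_query_name("66.29.0.197", "hostkarma.junkemailfilter.com."))
# 197.0.29.66.hostkarma.junkemailfilter.com
print(jmf_rules_hit(["127.0.0.1"]))  # [('RCVD_IN_JMF_W', -5.0)]
print(jmf_rules_hit(["127.0.0.4"]))  # [('RCVD_IN_JMF_BR', 1.0)]
```

With these rules, any sender the list answers with 127.0.0.1 picks up -5 on that rule alone. That matches the behaviour described in the post.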
Re: Funds / Award release scams poor scoring
mouss <[EMAIL PROTECTED]> writes: > Henrik K wrote: >> On Mon, Nov 10, 2008 at 08:49:00AM +0100, mouss wrote: >>> Henrik K wrote: On Mon, Nov 10, 2008 at 12:25:42PM +0530, ram wrote: > The number of DNSWL_LOW and DNSWL_MED misfires have gone up especially > in last two days. Even Marc's JMF_W misfires. > > What it means is these are "good" mailservers who normally relay ham and > have some weak links ( weak password etc ) that just got exposed What method are they using to relay through master.debian.org? I can't figure out how these mail from yahoo etc can end up relaying through there in this case. >>> they simply post to the list. if the list is not open, they >>> subscribe first. >> >> Ah right, I was looking at it a bit wrong.. it's silly that the original >> recipient is nowhere to be found in headers. >> > > Now that you say it, I don't see any list headers! so it looks like a > bug somewhere... No, I receive email at [EMAIL PROTECTED], so it doesn't need to go through a debian list to get to me. micah
Distributing the processing load
Our poor spamassassin machine is not able to keep up with the mail load. We are constantly getting "prefork: server reached --max-children setting, consider raising it" errors, and our max-children is already set to the maximum this machine can handle (50). Since we are using spamc/spamd, I figured it would be trivial to set up a second spamd on another machine and split the load. I did this by setting my mailfilter to use '-d spamd' and configuring the spamd host in my DNS as a round-robin between the two participating IPs. However, this seems to work only as fail-over, not as a load-balancer, as the spamc man page says: If host resolves to multiple addresses, then spamc will fail-over to the other addresses, if the first one cannot be connected to. It will first try all addresses of one host before it tries the next one in the list. In fact, looking at my logs, one of the spamd machines is only processing requests for one of the three mail servers, and the other requests are going to the other spamd. Is this because each mail server looked up the address and then cached it? I am using -x, and the man page says that the fail-over behaviour is incompatible with -x; if that switch is used, fail-over will not occur. That's fine, since I'm not particularly interested in fail-over but rather in load-balancing. Is there any way to do this without having to set up my different mail servers to query different spamds? Thanks for any ideas, micah
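The pattern in the logs fits what the man page describes. With plain fail-over, each client sticks to the first address in the order it got, so the split follows whichever order each mail server cached rather than the message count. The small Python simulation below compares that with shuffling the address list per connection. spamc's --randomize option, discussed later in this archive, is described in the man page as applying to the IP addresses returned for the hosts given by -d. The addresses and the cached orders here are hypothetical.

```python
import random
from collections import Counter


def pick_failover(resolved, reachable):
    """Fail-over only: the first reachable address in the resolved order."""
    for addr in resolved:
        if addr in reachable:
            return addr
    return None


def pick_randomized(resolved, reachable, rng):
    """Shuffle a copy of the resolved list, then fall back in that order."""
    order = list(resolved)
    rng.shuffle(order)
    return pick_failover(order, reachable)


# Hypothetical: each mail server cached its own order of the round-robin answer.
client_orders = {
    "mx1": ["192.0.2.10", "192.0.2.11"],
    "mx2": ["192.0.2.11", "192.0.2.10"],
    "mx3": ["192.0.2.11", "192.0.2.10"],
}
reachable = {"192.0.2.10", "192.0.2.11"}
rng = random.Random(0)

failover = Counter(pick_failover(order, reachable)
                   for order in client_orders.values() for _ in range(100))
randomized = Counter(pick_randomized(order, reachable, rng)
                     for order in client_orders.values() for _ in range(100))
print(failover)    # Counter({'192.0.2.11': 200, '192.0.2.10': 100})
print(randomized)  # roughly 150 each; exact counts depend on the seed
```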
Re: hostkarma junkemailfilter
"Benny Pedersen" <[EMAIL PROTECTED]> writes: > On Tue, November 18, 2008 22:16, Henrik K wrote: > > postfwd and trusted_networks msa_networks is what i do use here, then minimal > dns lookups is needed also, facebook have random helo so need to be > whitelisted hard in postfwd and in spamassassin, i have contacted facebook > about it, but the problem might still be there > > i like your postfwd config Where is this postfwd config you refer to? I would like to see it. micah
Local rules math problem
I've got a couple of custom meta rules that don't seem to be applying the way I expected. When I run a message that should hit these rules, I get:

[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_USERNAME ==> got hit: "Username:"
[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_PASSWORD ==> got hit: "Password:"
[14109] dbg: rules: ran header rule __LOCAL_REPLYTO_NOTUS ==> got hit: "negative match"

This results in the rule LOCAL_PHISH_FROMREPLY firing with score 0.1, which is great; that is what I expect. However, a rule that builds on it doesn't fire: LOCAL_PHISHER_USERPASS, which does the math of adding LOCAL_PHISH_FROMREPLY to __LOCAL_PHISHER_PASSWORD and __LOCAL_PHISHER_USERNAME to get over 1. Even though all of those rules fire, the weighted sum doesn't seem to get over 1, so the meta rule doesn't fire. What am I missing here?

body __LOCAL_PHISHER_PASSWORD /Password(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i
header __LOCAL_RETURN_PATH_ISUS Return-Path =~ /\...@ourdomain\.net/
header __LOCAL_FROM_ISUS From =~ /\...@ourdomain\.net/
header __LOCAL_REPLYTO_EXISTS exists:Reply-To
header __LOCAL_REPLYTO_NOTUS Reply-to !~ /\...@ourdomain\.net/
meta LOCAL_PHISH_FROMREPLY (( __LOCAL_RETURN_PATH_ISUS || __LOCAL_FROM_ISUS ) && ( __LOCAL_REPLYTO_EXISTS && __LOCAL_REPLYTO_NOTUS ))
score LOCAL_PHISH_FROMREPLY 0.1
body __LOCAL_PHISHER_USERNAME /User(\s)?(n|N)ame(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i
meta LOCAL_PHISHER_USERPASS ((( 0.2 * __LOCAL_PHISHER_USERNAME ) + ( 0.4 * __LOCAL_PHISHER_PASSWORD ) + ( 0.4 * LOCAL_PHISH_FROMREPLY)) > 1)
describe LOCAL_PHISHER_USERPASS Typical phish: asks for username and password, we dont do that
score LOCAL_PHISHER_USERPASS 10.5

thanks, micah
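Writing out the meta rule's arithmetic suggests what is going on. When all three sub-rules hit, the largest value the expression can reach is 0.2 + 0.4 + 0.4 = 1.0, and the rule tests for strictly greater than 1. This Python sketch uses exact fractions to show that. SpamAssassin itself evaluates meta rules in Perl with floating point, so comparing a sum against its exact maximum is fragile either way.

```python
from fractions import Fraction

# Weights from the LOCAL_PHISHER_USERPASS meta rule. For ordinary rules like
# these, a rule name in a meta expression counts as 1 when it hit, 0 otherwise.
WEIGHTS = {
    "__LOCAL_PHISHER_USERNAME": Fraction(2, 10),
    "__LOCAL_PHISHER_PASSWORD": Fraction(4, 10),
    "LOCAL_PHISH_FROMREPLY": Fraction(4, 10),
}


def weighted_sum(hits):
    """Exact value of the weighted sum for the given set of rules that hit."""
    return sum(weight for name, weight in WEIGHTS.items() if name in hits)


all_hit = set(WEIGHTS)
total = weighted_sum(all_hit)
print(total)       # 1
print(total > 1)   # False: the strict comparison can never be true
print(total >= 1)  # True
```

If the intent is "all three hit", a plain boolean meta such as (__LOCAL_PHISHER_USERNAME && __LOCAL_PHISHER_PASSWORD && LOCAL_PHISH_FROMREPLY) avoids the edge entirely. A threshold with some slack, such as > 0.9, would also work.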
bayes training doesn't seem to have any effect
I got a phish message that was understood by bayes as: -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.] So I trained it with spamc -L spam, but even after that I am still getting BAYES_00. Shouldn't the training have bumped that score up? Thanks for any info, micah
Re: bayes training doesn't seem to have any effect
Dave Walker writes: > Micah Anderson wrote: >> I got a phish message that was understood by bayes as: >> >> -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1% >> [score: 0.] >> >> So I traiend with spamc -L spam but even after that I am still getting >> BAYES_00. Shouldn't the training have bumped that score up? >> >> Thanks for any info, >> > In order for Bayes to actually make a difference, it needs plenty of > training. It's disabled by default in most installs - unless you have > at least 200 of both spam and ham taught. This needs to be done > manually, unless you have autolearn enabled. Yeah, I've been running this bayes db for a couple of years now, so I am sure I've passed the 200 mark :) I'm wondering whether my bayes DB is now too poisoned and maybe needs to be reset? > To see what is really going on run "$ spamassassin -D < /path/to/the/email > /dev/null", and see if you can learn anything as to > why it's not working as expected. Indeed, when I do this, I find these bayes-related log entries:

[13244] dbg: bayes: corpus size: nspam = 6798614, nham = 19136735
[13244] dbg: bayes: tok_get_all: token count: 175
[13244] dbg: bayes: score = 0

> Also, to see how experienced your Bayes knowledge is - use "$ sa-learn --dump magic" This shows me that I have no idea what these magic things are :) Does this tell you anything useful?

0.000          0          3          0  non-token data: bayes db version
0.000          0    6798614          0  non-token data: nspam
0.000          0   19136753          0  non-token data: nham
0.000          0 1063157695          0  non-token data: ntokens
0.000          0 1241301616          0  non-token data: oldest atime
0.000          0 1241416889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1241344830          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     496607          0  non-token data: last expire reduction count

micah
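The magic dump is easier to reason about once it is parsed into names and values. Here is a minimal Python sketch. It assumes the column layout shown above, with the value in the third column, and also computes the ham-to-spam ratio from nham and nspam.

```python
def parse_magic(dump):
    """Parse `sa-learn --dump magic` lines into {name: value}.

    Each line looks like '0.000  0  <value>  0  non-token data: <name>';
    the value is the third numeric column.
    """
    values = {}
    for line in dump.splitlines():
        if "non-token data:" not in line:
            continue
        numbers, name = line.split("non-token data:", 1)
        values[name.strip()] = int(numbers.split()[2])
    return values


DUMP = """\
0.000          0    6798614          0  non-token data: nspam
0.000          0   19136753          0  non-token data: nham
0.000          0 1063157695          0  non-token data: ntokens
"""

magic = parse_magic(DUMP)
ratio = magic["nham"] / magic["nspam"]
print(f"nspam={magic['nspam']} nham={magic['nham']} ham:spam={ratio:.2f}")
# nspam=6798614 nham=19136753 ham:spam=2.81
```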
Re: bayes training doesn't seem to have any effect
Adam Katz writes: > Micah Anderson wrote: >>> Also, to see how experienced your Bayes knowledge is - use "$ sa-learn --dump magic" >> >> This shows me that I have no idea what these magic things are :) Does >> this tell you anything useful? >>
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0    6798614          0  non-token data: nspam
>> 0.000          0   19136753          0  non-token data: nham
>> 0.000          0 1063157695          0  non-token data: ntokens
>> 0.000          0 1241301616          0  non-token data: oldest atime
>> 0.000          0 1241416889          0  non-token data: newest atime
>> 0.000          0          0          0  non-token data: last journal sync atime
>> 0.000          0 1241344830          0  non-token data: last expiry atime
>> 0.000          0      43200          0  non-token data: last expire atime delta
>> 0.000          0     496607          0  non-token data: last expire reduction count
>
> Eh? Last journal sync atime is Jan 1 1970?
> Try running: sa-learn --sync

Doesn't seem to change the 'last journal sync atime' from 0.

> If that helps, put it in your nightly SpamAssassin cron job
> (and/or revisit your custom teaching scripts).

In fact, I've been running that from cron every night. I'm using a MySQL DB and I've got the following set in my local.cf:

# We want to expire via cronjob, rather than having one of our spamd
# children do it.
bayes_auto_expire 0
# no effect
bayes_learn_to_journal 0

> A quick primer (since this doesn't really exist anywhere...): The > three zeroed columns are always zero. > > bayes db version is self-explanatory. > nspam is the number of spam messages on record. bayes needs >200. Should be fine: 6798649 > nham is the number of ham messages on record. bayes needs >200. Also should be fine: 19160960 > ntokens is the number of 'words' noted in the system. Lots of tokens: 1065483803 > oldest atime is the oldest access time of the oldest token (I think). I've got 1241474416, which would be Mon May 4 15:00:16 PDT 2009, just yesterday... It doesn't seem right that this would be the oldest access time, especially for 1065483803 tokens!
> the rest of the times should be self-explanatory. > last expire reduction count is the number of tokens removed from the > last expiration run (I think). Ok, that seems to be counting, so something is being expired: 0.000 0 840628 0 non-token data: last expire reduction count This is all very interesting info, I appreciate the explanation. However, my original question still stands. micah
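The atime values in the dump are Unix timestamps, and converting them shows how narrow the window is. Here is a small Python sketch using the values from the dump in the previous message. For example, 1241474416 comes out as 2009-05-04 22:00:16 UTC, which is the 15:00:16 PDT Micah mentions. The conversion puts the oldest and newest token access times only about 32 hours apart.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Format a Unix timestamp from the magic dump as a UTC date string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


atimes = {
    "oldest atime": 1241301616,
    "newest atime": 1241416889,
    "last expiry atime": 1241344830,
}
for name, ts in atimes.items():
    print(name + ": " + to_utc(ts) + " UTC")
# oldest atime: 2009-05-02 22:00:16 UTC
# newest atime: 2009-05-04 06:01:29 UTC
# last expiry atime: 2009-05-03 10:00:30 UTC

span = atimes["newest atime"] - atimes["oldest atime"]
print(f"span: {span / 3600:.1f} hours")  # span: 32.0 hours
```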
Re: bayes training doesn't seem to have any effect
Karsten Bräckelmann writes: >> This shows me that I have no idea what these magic things are :) Does >> this tell you anything useful? > >> 0.000 0 6798614 0 non-token data: nspam >> 0.000 0 19136753 0 non-token data: nham > > That's quite a lot of ham compared to the spam... Does that really > reflect your mail instream? I would suspect not, since we probably get more spam than non-spam. However, perhaps SpamAssassin autolearning caused this? Perhaps the DB is so out of whack that I should just reset it from scratch and try again. It's a lot of data to lose, and I am not sure exactly the right way to do that, so I'd be somewhat reluctant to. It might be better if I could clean it out some. > 19 M hams learned and an SQL Bayes storage backend. Site wide. Do you > trust your users? Any chance some of them are training badly? At worst No, I don't trust my users. In fact, because of that we moved from site-wide training to selected users who can demonstrate that they understand how to train. Perhaps these numbers are a legacy from before we switched to this method. thanks, micah
Re: Low scores
On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap wrote: > Just wanted to add that this particular line is incorrect: > meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST|| > USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO|| > USER_IN_BLACKLIST) > > That will have Blacklisted email filters classified as ham. Interesting, thanks for the reply to an old thread. I got this list from http://wiki.apache.org/spamassassin/ShortcircuitingRuleset, which seems to be something that Justin Mason put together. I have CC'd Justin on this email. That page says this is a good shortcircuit rule to have first, because these are non-network-based whitelists, locally-generated messages, messages via a trusted relay chain, and simple non-network-based blacklists. Mine now reads:

meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||SUBJECT_IN_WHITELIST||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST||SUBJECT_IN_BLACKLIST)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

The difference is that it also includes SUBJECT_IN_WHITELIST and SUBJECT_IN_BLACKLIST... but now I am wondering whether this is the right thing to do. I'm very curious about resolving this. It does seem like a bad setup, and it is being taken as gospel from the spamassassin wiki, but perhaps there is something we are not understanding here that Justin can clarify? micah
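Julian's point can be checked mechanically: list the rule names in the SC_HAM meta and flag the ones that are spam-side. Here is a rough Python sketch. It simply treats any name containing BLACKLIST as a blacklist rule, which is a naming heuristic, not something SpamAssassin defines.

```python
import re

SC_HAM = ("USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||"
          "SUBJECT_IN_WHITELIST||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||"
          "USER_IN_BLACKLIST||SUBJECT_IN_BLACKLIST")


def rule_names(meta_expr):
    """All rule names referenced in a meta expression."""
    return re.findall(r"[A-Z_][A-Z0-9_]*", meta_expr)


def blacklist_rules(meta_expr):
    """Names that look like spam-side (blacklist) rules, by naming convention only."""
    return [name for name in rule_names(meta_expr) if "BLACKLIST" in name]


print(blacklist_rules(SC_HAM))
# ['USER_IN_BLACKLIST_TO', 'USER_IN_BLACKLIST', 'SUBJECT_IN_BLACKLIST']
```

Any name this flags would shortcircuit a blacklisted message as ham, which is the problem Julian describes.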
Re: Low scores
On Fri, 12 Mar 2010 15:44:21 -1000, Julian Yap wrote: > On Thu, Mar 11, 2010 at 7:58 AM, micah anderson wrote: > > > On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap > > wrote: > > > Just wanted to add that this particular line is incorrect: > > > meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST|| > > > USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO|| > > > USER_IN_BLACKLIST) > > > > > > That will have Blacklisted email filters classified as ham. > > > > Interesting, thanks for the reply from an old thread. > > > > I got this list from: > > http://wiki.apache.org/spamassassin/ShortcircuitingRuleset which seems > > to be something that Justin Mason put together. I have CC'd Justin on > > this email. > > Which has the difference of also including "SUBJECT_IN_WHITELIST", and > > "SUBJECT_IN_BLACKLIST"... but now I am wondering if this is the right > > thing to do. I actually removed the SUBJECT_IN rules, as they let any individual user who can whitelist/blacklist a subject shortcircuit for everyone. > > I'm very curious about resolving this, it does seem like a bad setup and > > it is being taken as gospel from the spamassassin wiki, but perhaps > > there is something that we are not understanding here that Justin can > > clarify? > > > > I'm pretty sure yours is wrong. You need to take out the rules which > apply to Spam in spam short circuiting. I agree with you; it's amazing that this has been wrong on the wiki since 2007! I went to update the wiki today and found that you had just done it. Thanks for doing that! Micah
Botnet plugin still relevant?
Hi, I've been using the Botnet plugin version 0.8 for some time now, and the plugin itself has been around since 2003 or so. I'm just curious to test the waters and see what others think about the relevance of this plugin in 2010. Does it still contribute in positive ways to your setup? I do not see a newer version of the plugin since 2007; is there a newer version than 0.8? Did you do any configuration of it beyond its defaults? Does the proliferation of individuals on dynamically assigned cable/dsl modems cause the plugin to misfire too often? I've had a number of complaints recently about the last point, and I don't have much of a solution for the situation where a user is stuck with a dynamically assigned IP that a spammer previously occupied, except to explain that this is the situation and that eventually it will change. thanks for any thoughts or experiences with this plugin! micah ps. I notice it is not listed on http://wiki.apache.org/spamassassin/CustomPlugins and I wonder why?
sa-update channels
I'm trying to find out what the current state of the art is for plugins and channel updates. What are people using nowadays? I just reviewed my plugins and ended up deleting Freemail because it has been pulled into SpamAssassin core; removed the postcards plugin because its original source is now a 404 and it is a very old rule; and removed the iXhash plugin because it was spewing a lot of perl errors and I was not seeing many hits. I've still got 20_sought_fraud, Botnet, and PDFinfo... but nothing beyond that. For channels I've been using:

updates.spamassassin.org
sought.rules.yerp.org
saupdates.openprotect.com

But I wonder whether the last two are still relevant, or if there are other lists to use instead? Thanks for any advice, micah
Re: Botnet plugin still relevant?
On Wed, 17 Mar 2010 14:45:53 -0700, John Rudd wrote: > Some people need to put in some alternate values for DNS timeouts, but > if you've got a local caching name server, you typically don't need > that. > > There aren't any actual bugs in it that I'm aware of, so I haven't > released a new version. As I see it, there isn't a need (and that is > a somewhat controversial statement with some of the more opinionated > people around here). > > I do still see some things that get nailed by it ... but there's lots > of those same hosts that get caught by the Spamhaus PBL. So, it kind > of depends on what you're doing with PBL and/or Zen, as to whether or > not you need Botnet. But, there are still plenty of things coming > from that class of hosts, so if you don't use one, I'd definitely > recommend using the other. Yeah, I've been having problems recently that I think are related to my using both Zen/PBL and the Botnet plugin weighted to score 5; even if I lowered it to 3, it would still be too much.
Many users are complaining, and when I finally get some useful messages with headers to analyze, I find something like the following:

X-Spam-Report:
 * 3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
 *      [213.6.61.151 listed in zen.dnsbl]
 * 1.0 RCVD_IN_BRBL RBL: Received via relay listed in Barracuda RBL
 *      [213.6.61.151 listed in b.barracudacentral.org]
 * 1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
 *      [213.6.61.151 listed in bb.barracudacentral.org]
 * 0.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
 *      [213.6.61.151 listed in dnsbl.sorbs.net]
 * 0.8 SPF_NEUTRAL SPF: sender does not match SPF record (neutral)
 * 5.0 BOTNET Relay might be a spambot or virusbot
 *      [botnet0.8,ip=213.6.61.151,rdns=a61-151.adsl.paltel.net,maildomain=palnet.com,client,ipinhostname,clientwords]
 * 1.0 RDNS_DYNAMIC Delivered to internal network by host with
 *      dynamic-looking rDNS

This brings it over the 8.0 threshold, although it is a legitimate email from a user who has unfortunately been saddled with a dynamic IP that a spammer previously used. No amount of explanation is going to assuage these users' feelings, and there isn't really anything they can do about it. They could complain to their ISP, I guess, or find another ISP, but these are not particularly productive steps towards resolving this problem. I'm interested in other suggestions I could offer people as alternatives, but until then I think I may need to remove Botnet from the equation. micah
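Adding up the report shows how much of the total BOTNET carries here. Here is a quick Python check using the scores from the X-Spam-Report above and the 8.0 threshold. The message totals 12.5. It would still be 10.5 with BOTNET at 3.0, and it drops to 7.5, under the threshold, without BOTNET.

```python
REPORT = {
    "RCVD_IN_PBL": 3.3,
    "RCVD_IN_BRBL": 1.0,
    "RCVD_IN_BRBL_LASTEXT": 1.4,
    "RCVD_IN_SORBS_DUL": 0.0,
    "SPF_NEUTRAL": 0.8,
    "BOTNET": 5.0,
    "RDNS_DYNAMIC": 1.0,
}
THRESHOLD = 8.0


def total_with_botnet(botnet_score):
    """Total score if BOTNET were scored at botnet_score (rounded like the report)."""
    others = sum(score for rule, score in REPORT.items() if rule != "BOTNET")
    return round(others + botnet_score, 1)


for botnet_score in (5.0, 3.0, 0.0):
    total = total_with_botnet(botnet_score)
    print(botnet_score, total, "spam" if total >= THRESHOLD else "ham")
# 5.0 12.5 spam
# 3.0 10.5 spam
# 0.0 7.5 ham
```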
meaning of child cleanup
Since upgrading to the new spamassassin, I'm seeing the following two log entries related to cleanup of child PIDs:

1. Apr 1 08:26:38 spamd2 spamd[396]: spamd: handled cleanup of child pid [31720] due to SIGCHLD: INTERRUPTED, signal 2 (0002)
2. Mar 28 18:00:15 spamd2 spamd[17562]: spamd: handled cleanup of child pid [391] due to SIGCHLD: exit 0

If I were to guess, the second one is what happens when things are acting right, and the first one seems problematic; I'm trying to determine what causes it. The logs for that process aren't particularly interesting. They look like any others, with various prefork child state entries:

Mar 28 06:25:35 spamd2 spamd[396]: prefork: child states: II
Mar 28 06:25:36 spamd2 spamd[396]: prefork: child states: IB

but nothing particularly egregious looking. Can someone help me clarify what causes an INTERRUPTED signal? Should I worry about it? Should I ignore it in logcheck? thanks! micah -- "It is no measure of health to be well adjusted to a profoundly sick society." - J Krishnamurti
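The two messages differ in how the child ended. "exit 0" is a normal exit status. "INTERRUPTED, signal 2" appears to report a child terminated by signal number 2, which on Linux is SIGINT. Here is a small Python sketch that decodes both forms of the log line. It only names the signal and says nothing about what sent it.

```python
import re
import signal

CLEANUP_RE = re.compile(r"due to SIGCHLD: (?:exit (\d+)|INTERRUPTED, signal (\d+))")


def child_exit_reason(line):
    """Describe how a spamd child ended, based on the cleanup log line."""
    match = CLEANUP_RE.search(line)
    if match is None:
        return None
    if match.group(1) is not None:
        return "exit status " + match.group(1)
    return "terminated by " + signal.Signals(int(match.group(2))).name


print(child_exit_reason("spamd: handled cleanup of child pid [31720] due to SIGCHLD: INTERRUPTED, signal 2 (0002)"))
# terminated by SIGINT
print(child_exit_reason("spamd: handled cleanup of child pid [391] due to SIGCHLD: exit 0"))
# exit status 0
```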
dcc: [26896] terminated: exit 241
I'm getting a lot of these log entries ever since I've upgraded: Apr 9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241 Obviously this is related to dcc, but I am not finding anything about what 'exit 241' is, and how I can adjust things so I no longer get them (or maybe they are normal and I need to start ignoring them?) Does anyone have a clue about these? thanks! micah
New log errors on upgrading
More new errors that I am getting from an upgrade to spamassassin 3.3: Use of uninitialized value $start_time in addition (+) at /usr/sbin/spamd line 1382, and also the following: spf: lookup failed: Can't locate object method "new_from_string" via package "Mail::SPF::Mech::All" at /usr/share/perl5/Mail/SPF/Record.pm line 227. I'm using libmail-spf-perl version: 2.005-1 Might this be fixed in a newer perl version? Micah
Re: dcc: [26896] terminated: exit 241
Michael Scheidell writes: > On 4/12/10 4:55 PM, Micah Anderson wrote: >> I'm getting a lot of these log entries ever since I've upgraded: >> >> Apr 9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241 >> >> > what version of dcc are you running? This is version '1.2.74-4' from Debian... but looking closer now, it seems that dcc was removed after Debian Etch. It seems it was removed because the upstream authors changed its license to non-free (according to Debian's DFSG) in version 1.30. This also means that it has not been available in Ubuntu either since Dapper. "The Distributed Checksum Clearinghouse source carries a license that is free to organizations that do not sell filtering devices or services except to their own users and that participate in the global DCC network. . . you may not redistribute modified, "fixed," or "improved" versions of the source or binaries. You also can't call it your own or blame anyone for the results of using it." So I guess I will just remove dcc; that is a shame, it seems like a good service. > what did you upgrade? Sorry, I upgraded from Debian Etch to Debian Lenny, and along with that came an upgrade to spamassassin. micah
Re: New log errors on upgrading
Mark Martinec writes: >> More new errors that I am getting from an upgrade to spamassassin 3.3: > > 3.3.0 ? Good question... indeed the version is 3.3.0. >> Use of uninitialized value $start_time in addition (+) at >> /usr/sbin/spamd line 1382, > > That was fixed in 3.3.1 . Great, I didn't see that in the changelog, but I'm sure it was. I will update before I bug you further about these! :) >> and also the following: >> >> spf: lookup failed: Can't locate object method "new_from_string" via >> package "Mail::SPF::Mech::All" at /usr/share/perl5/Mail/SPF/Record.pm >> line 227. >> >> I'm using libmail-spf-perl version: 2.005-1 >> >> Might this be fixed in a newer perl version? > > No idea. Try Mail-SPF-v2.007, the 2.005 is three years old. I am now running v2.007 to see if that fixes it; I suspect it will. If it does, I will make sure the debian package gets that noted so others won't run into this. thanks for your answers, micah
spamc randomization
I'm using the --randomize option to spamc, along with the -d switch with a hostname that resolves to multiple IP addresses. Does --randomize get the full set of IPs resolved from the -d hostname and then randomize those IPs? In other words, can you have one host name (say 'spamd') which resolves to multiple IPs that --randomize then picks from? That seems to be how it is described, but I could be misinterpreting it. The man page describes --randomize as applying to 'the IP addresses returned for the hosts given by the -d switch', and the description of the -d switch says you can do this: If host resolves to multiple addresses, then spamc will fail-over to the other addresses, if the first one cannot be connected to. It will first try all addresses of one host before it tries the next one in the list. I'm also a little unclear what the --randomize man section means when it says, "it will try only three times though." Say the hostname 'spamd' resolves to four IP addresses: 192.168.1.2, 192.168.1.3, 192.168.1.4, 192.168.1.5. After -d resolves that hostname into those IPs, they are passed to the --randomize function, and one of those four is picked. If the first one doesn't respond, it tries another one; if that fails, it tries a final one and then gives up (never trying the fourth)? Did I read this right? I appreciate any second eyes on my interpretation here. thanks, micah
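Micah's reading of the man page can be written down as a small Python model: shuffle the addresses resolved for -d, then try at most three of them in that order. This model encodes the interpretation in the post, not spamc's actual source. The four addresses are the example ones from the question.

```python
import random


def attempt_order(addresses, rng, max_tries=3):
    """Shuffle a copy of the resolved addresses and keep the first max_tries."""
    order = list(addresses)
    rng.shuffle(order)
    return order[:max_tries]


def first_reachable(addresses, reachable, rng, max_tries=3):
    """Return the first reachable address among the attempts, or None."""
    for addr in attempt_order(addresses, rng, max_tries):
        if addr in reachable:
            return addr
    return None


spamd = ["192.168.1.2", "192.168.1.3", "192.168.1.4", "192.168.1.5"]
print(attempt_order(spamd, random.Random(7)))  # three of the four addresses, in random order
# If only one host is up, this model sometimes gives up before trying it:
print(first_reachable(spamd, {"192.168.1.5"}, random.Random(7)))
```

If this reading is right, then with four addresses and only one spamd answering, roughly one connection in four would fail even though a working server exists.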
Re: sa-update channels
Kai Schaetzl writes: > Micah Anderson wrote on Wed, 17 Mar 2010 18:20:40 -0400: > >> saupdates.openprotect.com > > It's been said repeatedly on this list: don't use it. Thanks, should I be using the sought.rules.yerp.org channel instead, or some of the dostech ones? micah
Re: How do I filter out phishing email?
Jari Fredriksson writes: > On 14.4.2010 18:57, yongke wrote: >> >> Well, we send emails on behalf of clients, and so we are trying to catch >> phishing spam before they are sent out. Since the emails aren't sent yet, we >> had to generate a mock email for SA. The header in the example is what we >> THINK the headers will be when they are actually sent out. >> >> When you tried it with your SA, I assume you didn't change any headers? If >> that's the case, then it should still work. I guess I didn't set up SA >> correctly? >> > > I did not change anything. And I think I have pretty default scores on > the rules. > > I have following rule sets in my channels: > > 90_2tld.cf.sare.sa-update.dostech.net In a previous thread[0], it was mentioned that you should not be using the above channel (or 90_3tld.cf), because these files have been merged into 3.3.1 and are released as 20_aux_tlds.cf micah 0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/127703
Re: dcc: [26896] terminated: exit 241
Michael Scheidell writes:

> On 4/15/10 5:35 PM, Micah Anderson wrote:
>> M
>> "The Distributed Checksum Clearinghouse source carries a license that is
>> free to organizations that do not sell filtering devices or services
>> except to their own users and that participate in the global DCC
>> network. . . you may not redistribute modified, "fixed," or "improved"
>> versions of the source or binaries. You also can't call it your own or
>> blame anyone for the results of using it."
>>
> Which seems silly for debian to remove it, since many of the
> blacklists in SA are by default, licensed similar (free for non
> commercial use, paid if > xxx queries). maybe debian should look
> through and remove ALL 'dual licensed' software, and when you install
> SA from the RPM's, disable the dual licensed RBL's.

You misunderstand Debian's role and license guidelines. Debian is a software distributor, and as such it is not silly for Debian to stop distributing software (ie. dcc) when distributing that software violates its rules.

The blacklists enabled in SA by default are not software; they are simply hostnames that the SpamAssassin software uses. Configured hostnames are not distribution restricted, and arguably not even 'software'. Having those blacklists enabled in SA involves no software distribution restriction that violates Debian's software distribution terms. The software that is distributed is SpamAssassin, which has a fully compliant Debian software distribution license, not the blacklists that are enabled by default in SpamAssassin. The blacklists do have a restricted use license, but that is something else altogether.

'dcc', however, is software, and it carries a license which restricts its distribution; thus Debian, as a software distributor, has to decide based on its own policy whether it is willing to accept such a distribution restriction. Debian has the DFSG, its guidelines for what is acceptable for distribution, and the license that 'dcc' carries does not satisfy those criteria.

> Or, hey, lets pretend the people installing debian are smart enough to
> be able to make up their own mind if they fit the free license model.

People are free to do that. Debian won't distribute it for those people, but people are free to put whatever they like on their systems.

> it IS a good service, and SA 3.3x supports the reputation query
> directly now in the commercial license.
> Some things to understand, (normal language vs legal talk)

I believe it is a good service. If I could get updated software, with security upgrades, from Debian, I would use it.

micah
Re: dcc: [26896] terminated: exit 241
Michael Scheidell writes:

> On 4/21/10 1:25 PM, Ted Mittelstaedt wrote:
>>
>>
>> Distributed Checksum Clearinghouse quite obviously feels that they have
>> captured enough fishes in the ocean and are making plenty of money now
>> and so do not require all of the free advertising that inclusion of
>> their source in Debian gives them. Quite obviously they complained
>> and
>> their stuff was withdrawn as a result.
>
> The DCC author would welcome Debian replacing the old, broken code
> with something new.

That will only be accepted by Debian if the license is changed to be DFSG compliant[0], at which point it would gladly be re-introduced into Debian. I would even be happy to facilitate that process as a Debian Developer.

> Or is it your debian folks just forgot to update it?

My previous message detailed why it wasn't updated[1], a message that you replied to, more than once. Debian did not 'just forget to update it'; rather, it seems that you were the one who forgot something (the reason why it was not updated). In fact the whole thread here has continued on as a result of that very reason why Debian did not update it. I'll cite it again for you[2]:

"The Distributed Checksum Clearinghouse source carries a license that is free to organizations that do not sell filtering devices or services except to their own users and that participate in the global DCC network."

This specifically violates DFSG #6.

It's also worth noting here that the original Debian maintainer expressed frustration about the communication with upstream because, "he seemed to blacklist several ip ranges, including master's main mail server and murphy's [ed. note: these are Debian's mail servers] ip-range as well as the ip-range i ussualy [sic] used for mailing. So neither mailing him directly nor mailing to the mailing list was possible." [editor notes mine]

> As was previously posted (by someone else) DCC is free for most
> everyone, including ISP's who use it in their mail servers to protect
> their own clients.

There is free as in money, and then there is free as in freedom (libre); these are different things.

> So, put your money where your mouth is.

So the money is there, now what?

> Why won't debian fix their broken RPM?

Probably because Debian doesn't use RPMs... sorry, I couldn't resist. The real reason is the one cited here, and in previous messages.

> someone official from debian want to chime in?

Since I am a Debian Developer, I may count as 'official' here.

micah

0. http://www.debian.org/social_contract#guidelines
1. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/128332
2. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=380542
Re: dcc: [26896] terminated: exit 241
Ted Mittelstaedt writes:

> Actually it's not even that. The notion that Debian spent effort
> detecting and removing DCC source is rather farfetched.

Sorry, but you are pretty off here. Debian does this all the time. I'm an official Debian Developer and I have personally been involved in doing this a few times.

> Because Linux distros are so large, many freely available
> commercially-licensed apps - such as device drivers - some of which
> also do not carry "your allowed to distribute this" licenses, get
> "sucked up" into the distributions.

Unless you can find an example, you are making a specious argument. Do you know the process to get software into Debian?

> Some of this happens by users contributing them and not reading the
> licensing closely enough, but quite a lot of it happens by commercial
> companies deliberately inserting their stuff in the distros.

First, 'users' do not contribute applications to Debian; that isn't how it works. Secondly, even if an official Debian Developer (who actually is the only person permitted to contribute things to the Debian archive) happens to do as you assert and not read the licensing, then the Debian FTP-masters, whose role it is to specifically determine if the Debian Developer did their due diligence in checking the license restrictions, would reject that package.

I guess the fact that I had to explain this answers my previous question: you do not understand how software gets into Debian. I would advise you to educate yourself before making arguments that by their very nature demonstrate your misunderstanding; it weakens your argument.

[snip]

> It's also generally understood that if a commercial app seller
> doesen't like it they have the right to complain and get an immediate
> cessation of inclusion of their apps in a distro. That is why I
> suspect happened
> here.

Sorry, but if a DFSG-licensed application is put in Debian, no commercial app seller has any right to "complain and get an immediate cessation of inclusion of their apps in a distro". It doesn't work that way.

> Distributed Checksum Clearinghouse quite obviously feels that they have
> captured enough fishes in the ocean and are making plenty of money now
> and so do not require all of the free advertising that inclusion of
> their source in Debian gives them. Quite obviously they complained
> and
> their stuff was withdrawn as a result.

Your conclusions are amazing, but that does not make them any more right.

micah
Bayes timeouts and database handle being DESTROY'd without explicit disconnect
Hello,

I'm running a busy mail server. We've got a bayes database on its own server, with InnoDB tables. I'm seeing a number of these entries in my log files and am struggling to determine what could be causing them and how to fix them:

Oct 19 07:02:10 spamd3 spamd[27474]: learn: exceeded time limit in pms learn
Oct 17 06:30:12 spamd3 spamd[25651]: plugin: eval failed: bayes: (in learn) __alarm__ignore__(15190)
Oct 17 06:30:42 spamd3 spamd[25598]: plugin: eval failed: bayes: (in learn) child processing timeout at /usr/sbin/spamd line 1283, line 185.

I get quite a few of these:

Oct 19 07:02:19 spamd3 spamd[18746]: Issuing rollback() for database handle being DESTROY'd without explicit disconnect() at /usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1516, line 2.

and a few of these, although not that many:

Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060) still Active at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722
Oct 19 05:33:13 spamd3 spamd[1630]: bayes: db_seen corrupt: value='1287482415' for 5d6fb52248450ee7528848c3a78b5a0650a24...@sa_generated, ignored at /usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 397, line 112.

thanks for any insights!
micah
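To turn "quite a few" and "not that many" into numbers (and to see whether the kinds of message rise and fall together), the log can be tallied per message type. A minimal Python sketch, assuming plain-text syslog lines; the labels and patterns are written for the messages quoted above and are not an official list. The rollback pattern also accepts the newer "due to DESTROY" wording of the same DBI warning.

```python
import re
from collections import Counter

# One pattern per kind of message quoted above (labels made up for this sketch).
PATTERNS = {
    "learn: exceeded time limit": re.compile(r"learn: exceeded time limit in pms learn"),
    "learn: alarm": re.compile(r"bayes: \(in learn\) __alarm__ignore__"),
    "learn: child timeout": re.compile(r"bayes: \(in learn\) child processing timeout"),
    "rollback on DESTROY": re.compile(
        r"Issuing rollback\(\) (?:for database handle being DESTROY'd|due to DESTROY)"),
    "statement still Active": re.compile(r"statement handle \S+ ?still Active"),
    "db_seen corrupt": re.compile(r"bayes: db_seen corrupt"),
}


def tally(lines):
    """Count how many log lines match each pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


sample = [
    "Oct 19 07:02:10 spamd3 spamd[27474]: learn: exceeded time limit in pms learn",
    "Oct 19 07:02:19 spamd3 spamd[18746]: Issuing rollback() for database handle "
    "being DESTROY'd without explicit disconnect() at "
    "/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1516",
]
print(tally(sample))
```

Against a real log, pass an open file instead of the sample list, e.g. tally(open(path)) where path is wherever spamd logs on your system.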
sa-learn --force-expire taking hours
I was investigating this morning why a number of spam messages were coming through and found that they weren't scoring on bayes, because it was unavailable. The database connection was working fine, but I noticed that the nightly sa-learn --sync --force-expire had been running since 3am, which was 4 and a half hours ago:

root 26302 0.0 0.0  2440  892 ? Ss 03:00 0:00 /bin/sh -c sa-learn --sync --force-expire >/dev/null 2>&1
root 26305 0.0 0.0 35492 2528 ? S  03:00 0:04 /usr/bin/perl -T -w /usr/bin/sa-learn --sync --force-expire

I connected to the database and did a 'show processlist\g' and found a number of really long running processes:

| Id     | User    | Host            | db    | Command | Time   | State        | Info
| 66652  | spamass | 127.0.0.1:55248 | bayes | Query   | 355113 | Sending data | SELECT count(*) FROM bayes_token WHERE id = '5' AND ati |

a bunch of NULL processes (what are these?):

| 463898 | spamass | 127.0.0.1:41393 | bayes | Sleep   | 10592  |              | NULL

and a handful of 'rollback' processes:

| 474169 | spamass | 127.0.0.1:35973 | bayes | Query   | 1078   | NULL         | rollback

Plus the various bayes processes that I expect, a sampling of which is below:

| 474756 | spamass | 127.0.0.1:34141 | bayes | Query   | 472    | end          | UPDATE bayes_token SET atime = '1288102083' WHERE id = '5' AND token IN ('???-6','??,'R???','Xt |
| 475050 | spamass | 127.0.0.1:48442 | bayes | Query   | 5      | Updating     | UPDATE bayes_vars SET spam_count = spam_count + '1' WHERE id = '5' |
| 475089 | spamass | 127.0.0.1:48669 | bayes | Query   | 0      | statistics   | SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token

Any ideas what could be going on, or steps I could take to troubleshoot this?

Thanks!
micah
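One way to triage a processlist like this is by age. MySQL shows a thread as 'Sleep' when its connection is idle, waiting for the client's next statement, which is what the rows with NULL Info are. The interesting rows are active threads whose Time (seconds in their current state) exceeds the age of the sa-learn run. A small Python sketch over the rows quoted above; the rows are hand-copied, and the tuple layout, function name and 4.5-hour threshold are choices made for this example.

```python
# Rows hand-copied from the SHOW PROCESSLIST excerpt: (Id, Command, Time in s, State, Info)
ROWS = [
    (66652, "Query", 355113, "Sending data", "SELECT count(*) FROM bayes_token WHERE id = '5' AND ati"),
    (463898, "Sleep", 10592, "", None),
    (474169, "Query", 1078, None, "rollback"),
    (474756, "Query", 472, "end", "UPDATE bayes_token SET atime = '1288102083' WHERE id = '5'"),
    (475050, "Query", 5, "Updating", "UPDATE bayes_vars SET spam_count = spam_count + '1' WHERE id = '5'"),
    (475089, "Query", 0, "statistics", "SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token"),
]
CRON_AGE_S = 4.5 * 3600  # the 03:00 sa-learn job had been running about 4.5 hours


def long_running(rows, min_age_s):
    """Active (non-Sleep) threads whose Time exceeds min_age_s, oldest first."""
    active = [r for r in rows if r[1] != "Sleep" and r[2] > min_age_s]
    return sorted(active, key=lambda r: r[2], reverse=True)


for thread_id, command, secs, state, info in long_running(ROWS, CRON_AGE_S):
    print(thread_id, f"{secs / 3600:.1f} h", state, info)
```

Only thread 66652 qualifies: 355113 s is about 98.6 hours, so that SELECT count(*) had been in 'Sending data' since long before the 03:00 run started.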
Re: Bayes timeouts and database handle being DESTROY'd without explicit disconnect
Dominic Benson writes:

> On 19 Oct 2010, at 17:05, Micah Anderson wrote:
>
>>
>> Hello,
>>
>> I'm running a busy mail server. We've got a bayes database on its own
>> server, with InnoDB tables.
>
> What is your total DB size / server RAM? Could you include a snapshot of the
> output of top from the DB server? I would guess that your problem is
> indexing/tuning or server capacity MySQL side rather than in SA, but without
> more data it is just a guess.

The database size is 2.74gig.

$ free
             total       used       free     shared    buffers     cached
Mem:       8055876    6872740    1183136          0     584032    5403916
-/+ buffers/cache:     884792    7171084
Swap:      1959912     569432    1390480

top - 07:26:39 up 10 days, 20:37, 1 user, load average: 9.24, 6.80, 6.15
Tasks: 24 total, 2 running, 22 sleeping, 0 stopped, 0 zombie
Cpu(s): 83.3%us, 16.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 8055876k total, 6890032k used, 1165844k free, 584364k buffers
Swap: 1959912k total, 569432k used, 1390480k free, 5405264k cached

  PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
10744 mysql     20  0  655m 110m 5500 S  190  1.4   9296:14 mysqld
10765 stunnel4  20  0  123m 109m 1416 S    2  1.4 179:38.73 stunnel4
    1 root      20  0  1984  636  548 S    0  0.0   2:40.15 init
  397 bind      20  0 82856  23m 2632 S    0  0.3   0:46.72 named
 1812 root      20  0  3120 1176  772 S    0  0.0   0:15.04 syslog-ng
 3551 messageb  20  0  2488  648  488 S    0  0.0   0:00.00 dbus-daemon
 3610 nobody    20  0  6368 2668  888 S    0  0.0   0:11.94 nagios-statd
 4828 root      20  0  5484 1824 1476 S    0  0.0   0:09.44 master
10707 root      20  0  3784 1276 1076 S    0  0.0   0:00.02 mysqld_safe
10745 root      20  0  2892  608  532 S    0  0.0   0:00.00 logger
10760 stunnel4  20  0  3836  688  348 S    0  0.0   1:25.14 stunnel4
10761 stunnel4  20  0  3836  692  352 S    0  0.0   1:16.94 stunnel4
10762 stunnel4  20  0  3836  692  352 S    0  0.0   1:16.24 stunnel4
10763 stunnel4  20  0  3836  692  352 S    0  0.0   1:16.45 stunnel4
10764 stunnel4  20  0  3836  692  352 S    0  0.0   1:20.77 stunnel4
11311 root      20  0  2044  888  704 S    0  0.0   0:09.02 cron
15444 postfix   20  0  5496 1788 1452 S    0  0.0   0:00.00 pickup

I'm averaging around 150 mysql threads, with peaks during peak mail times.

>> and a few of these, although not that many:
>>
>> Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from
>> bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060)still
>> Active at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722
>
>
> Try an EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = ;
> as you know it to be slow it might give a clue where to look to improve
> performance. Or try turning the general query log on for a while and see what
> queries are taking up time. MonYog is quite a nice frontend to this, but you
> can do it by hand fairly simply.

mysql> EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = 5;
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
| id | select_type | table        | type | possible_keys     | key               | key_len | ref   | rows | Extra |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
|  1 | SIMPLE      | bayes_expire | ref  | bayes_expire_idx1 | bayes_expire_idx1 | 2       | const |  198 |       |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
1 row in set (0.00 sec)

Note, this might be related to the post I made today about sa-learn --expire taking hours...

micah
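The figures in the free output can be cross-checked arithmetically. Using the usual meaning of the '-/+ buffers/cache' row (used minus buffers and cache, and free plus buffers and cache), the Python below reproduces that row from the Mem: line. It only restates the numbers already posted; the variable names are made up for this example.

```python
# Mem: line from `free`, in KiB
total, used, free_kib = 8055876, 6872740, 1183136
buffers, cached = 584032, 5403916

# The "-/+ buffers/cache" row: memory held by processes, and memory that is
# either free or only holding reclaimable buffers/page cache.
held_by_processes = used - buffers - cached
free_or_reclaimable = free_kib + buffers + cached

KIB_PER_GIB = 1024 * 1024
print(f"held by processes:   {held_by_processes / KIB_PER_GIB:.2f} GiB")
print(f"free or reclaimable: {free_or_reclaimable / KIB_PER_GIB:.2f} GiB")
```

So of the roughly 8 GB in the box, well under 1 GiB is held by processes; most of what free reports as used is buffers and page cache, and top shows mysqld itself with a RES of 110m.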
update channel list
I've had the following channel list for a while:

updates.spamassassin.org
sought.rules.yerp.org
khop-bl.sa.khopesh.com
khop-blessed.sa.khopesh.com
khop-general.sa.khopesh.com
khop-sc-neighbors.sa.khopesh.com

but I suspect that some of these are no longer good. I was hoping folks out there might be able to make some suggestions for improvements?

thanks,
micah
Re: update channel list
dar...@chaosreigns.com writes:

> On 01/18, Micah Anderson wrote:
>> updates.spamassassin.org
>> sought.rules.yerp.org
>> khop-bl.sa.khopesh.com
>> khop-blessed.sa.khopesh.com
>> khop-general.sa.khopesh.com
>> khop-sc-neighbors.sa.khopesh.com
>>
>> but I suspect that some of these are no longer good. I was hoping folks
>> out there might be able to make some suggestions for improvements?
>
> All of those are currently listed by Adam Katz on
> http://khopesh.com/wiki/Anti-spam
> I expect that list to be up to date.
> He's an active spamassassin developer.
>
> That page also lists 90_2tld.cf.sare.sa-update.dostech.net. I doubt there
> are any others worth using. If there are, they should probably get added
> to http://wiki.apache.org/spamassassin/CustomRulesets
> If there were more sa-update channels that were useful, I'd recommend
> breaking that page up a little more to put the rule sets with update
> channels at the top.
>
> If you're looking to improve SA accuracy in general, I've tried to make a
> thorough checklist here:
> http://wiki.apache.org/spamassassin/ImproveAccuracy

Thanks, I'm going through that list to find anything that I don't have.

I noticed that pyzor is recommended there. I had disabled it because it seemed like it was no longer being developed. I am trying to get it enabled, but I am running into the issue reported here: https://sourceforge.net/apps/trac/pyzor/ticket/163

I've requested a masscheck account, but am still waiting on that.

I also noticed I didn't have these perl modules:

Jan 19 11:05:06.710 [17267] dbg: diag: [...] module not installed: IP::Country::Fast ('require' failed)
Jan 19 11:05:06.710 [17267] dbg: diag: [...] module not installed: IO::Socket::INET6 ('require' failed)

The INET6 one probably isn't necessary because I don't have IPv6 yet. I couldn't find the IP::Country::Fast module as a Debian package, although there is libgeo-ipfree-perl; it's unclear if that can be used (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=374879)...

Everything else on that page, except for your mention of 90_2tld.cf.sare.sa-update.dostech.net, I've done.

thanks,
micah
trusted networks getting marked as spam
Hi,

I've got some machines that are running logcheck; they periodically send mail to us with reports. Sometimes those mails have some spammy stuff in them, because they are mail server logs, or web logs with some spammy stuff in them. I don't want spamassassin to deal with these messages, I want them to come through no matter what. I don't want them to contribute to bayes scoring and I don't want them ever to end up as Spam.

Unfortunately, they are, it seems mostly because URIBL scores are hitting before the SHORTCIRCUIT/ALL_TRUSTED stuff fires, so for example:

X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07)
X-Spam-Flag: YES
X-Spam-Status: Yes, score=8.1 required=6.0 tests=ALL_TRUSTED,SHORTCIRCUIT,
	URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL shortcircuit=ham
	autolearn=disabled version=3.4.0

I've got the IP in trusted_networks and internal_networks, and I've got a couple of shortcircuit rules as follows:

# simple, non-network-based whitelists, locally-generated messages,
# messages via a trusted relay chain, simple
meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

meta SC_SPAM (USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_SPAM -950
shortcircuit SC_SPAM spam
score SC_SPAM 20

shortcircuit ALL_TRUSTED on

Yet the high scoring due to the URIBLs caused this to get classified as Spam. How can I get around that?

Thanks!
micah
Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle bayes
Hi,

I'm getting these errors in my log files, quite regularly:

Sep 23 21:58:16 towhee spamd[25561]: Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle bayes:0.0.0.0 at /usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1590, line 2.

It appears that bayes is working, because I see logs like this:

Sep 23 22:02:19 towhee spamd[10768]: spamd: result: . -1 - AM_TRUNCATED,BAYES_00,CK_419SIZE,ENV_FROM_DIFF0,FORWARD_RELAY,HAS_REPLY_TO,HTML_MESSAGE,IP_REPEATING,MISSING_MID,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SUBJ_DATE scantime=0.7,size=11555,uid=65534,required_score=5.0,rhost=0.0.0.0,raddr=0.0.0.0,rport=37464,mid=(unknown),bayes=0.000147,autolearn=disabled,shortcircuit=no

Line 1590 is in the sub learner_new, but I have set in local.cf:

local.cf:bayes_auto_learn 0
local.cf:bayes_learn_to_journal 0

It seems like the database is working fine... any ideas?

thanks!
micah
MISSING_SUBJECT
I had a message marked with:

2.3 EMPTY_MESSAGE Message appears to have no textual parts and no Subject:

It did not have a subject, but it did have content (although only encrypted). It also hit:

* 1.8 MISSING_SUBJECT Missing Subject: header

which makes sense, because the mail did not have one. But have you looked in your Spam folder lately? All spam has a subject, pretty much always: an informal survey of my trash heap showed that 4 messages out of 400 did not have a Subject, and two of them were repeats.

-- 
micah
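The informal survey (4 of 400, i.e. 1%) can be repeated over a larger spam folder with Python's standard mailbox module. This is a sketch assuming the folder is a single mbox file; the function name count_missing_subject is made up, and only a completely absent Subject: header is counted, matching what MISSING_SUBJECT describes. A present but empty Subject: line counts as present.

```python
import mailbox


def count_missing_subject(path):
    """Return (messages_without_subject_header, total_messages) for an mbox file."""
    box = mailbox.mbox(path, create=False)
    total = missing = 0
    try:
        for msg in box:
            total += 1
            if msg["Subject"] is None:  # header absent, not merely empty
                missing += 1
    finally:
        box.close()
    return missing, total
```

Called on a spam folder, it returns a (missing, total) pair; for the survey above that would have been (4, 400).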
Re: MISSING_SUBJECT
Reindl Harald writes:

> On 13.06.2018 at 01:37, micah anderson wrote:
>> I had a message marked with:
>>
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>>
>> It did not have a subject, but it did have content (although only
>> encrypted) it also hit:
>>
>> * 1.8 MISSING_SUBJECT Missing Subject: header
>>
>> which makes sense, because the mail did not have one, but have you
>> looked in your Spam folder lately? All spam has a subject, pretty much
>> always
>
> no - there is ton of junk without a subject and sometimes even floods
> with no subject and no body at all

I believe you; however, the message was not empty. It had encrypted contents (and in fact was scored -1 because of that).
Re: MISSING_SUBJECT
Matus UHLAR - fantomas writes:

> On 12.06.18 19:37, micah anderson wrote:
>>2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>>Subject:
>>
>>It did not have a subject, but it did have content (although only
>>encrypted) it also hit:
>>
>>* 1.8 MISSING_SUBJECT Missing Subject: header
>>
>>which makes sense, because the mail did not have one, but have you
>>looked in your Spam folder lately? All spam has a subject, pretty much
>>always an informal survey of my trash heap showed 4 messages out of
>>400 did not have a Subject, and two of them were repeats.
>
> and what is your point?

The point is that EMPTY_MESSAGE scores even though the message did have content. But I guess the point is that it had no 'text' parts, because the content was only PGP/MIME?

-- 
micah
Re: MISSING_SUBJECT
John Hardin writes:

> On Tue, 12 Jun 2018, micah anderson wrote:
>
>> I had a message marked with:
>>
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>>
>> It did not have a subject, but it did have content (although only
>> encrypted)
>
> It may not be considering an encrypted message part to be a text body
> part. What was the MIME type of that part?

pgp/mime

-- 
micah
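For reference, a PGP/MIME message (RFC 3156) is a multipart/encrypted container with an application/pgp-encrypted control part and an application/octet-stream part carrying the ciphertext, so it has no text/* part at all. The Python below parses a skeleton of such a message with the standard email package and lists its parts. It does not reproduce SpamAssassin's EMPTY_MESSAGE logic; it only illustrates why a check for "textual parts" finds none. The addresses are placeholders.

```python
import email

# Skeleton of a PGP/MIME message as laid out in RFC 3156; ciphertext elided.
RAW = """\
From: alice@example.org
To: bob@example.org
MIME-Version: 1.0
Content-Type: multipart/encrypted; protocol="application/pgp-encrypted"; boundary="XYZ"

--XYZ
Content-Type: application/pgp-encrypted

Version: 1

--XYZ
Content-Type: application/octet-stream

-----BEGIN PGP MESSAGE-----
(ciphertext elided)
-----END PGP MESSAGE-----

--XYZ--
"""

msg = email.message_from_string(RAW)
content_types = [part.get_content_type() for part in msg.walk()]
text_parts = [p for p in msg.walk() if p.get_content_maintype() == "text"]
print(content_types)
print(len(text_parts), msg["Subject"])
```

With no Subject: header as well, both halves of the EMPTY_MESSAGE description apply even though the body is not empty.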
Re: SA MySQL DB maintenance
"Kevin A. McGrail" writes: > I think Bayes should be in redis though not SQL. Curious to know why you think that?
Understanding ruleQA results
Hi,

I'm trying to understand the ruleQA results, because I'm trying to track down how spammy the rule FRNAME_IN_MSG_NO_SUBJ is. I load the latest rules:

http://ruleqa.spamassassin.org/20180813-r1837926-n/FRNAME_IN_MSG_NO_SUBJ/detail?s_corpus=1&s_g_over_time=1#overtime

and I see the S/O value is 1.0, which means a rule that hits only on spam (a rule that only hits on ham is 0.0, and a rule that hits spam and ham equally is 0.5)... but how can I tell how many messages are part of the corpus?

Also, the percentages seem very low: 1.5192% Spam, and .0005% Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but what do I know... which is why I'm asking.

thanks!

-- 
micah
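As a quick check of those numbers, assume S/O is computed from the two hit percentages as spam% / (spam% + ham%), which matches the 1.0 / 0.0 / 0.5 reading above. The figures for FRNAME_IN_MSG_NO_SUBJ then work out as below. Because the ratio uses percentages, it says nothing by itself about corpus size. The function name is made up for this sketch.

```python
def s_o_ratio(spam_pct, ham_pct):
    """Spam hit-rate as a share of the combined hit-rate (assumed S/O definition)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule hit nothing in either corpus")
    return spam_pct / total


print(round(s_o_ratio(1.5192, 0.0005), 4))
```

That gives about 0.9997, consistent with the 1.0 shown on the page.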
Re: Understanding ruleQA results
John Hardin writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> but how can I tell how many messages are part of the corpus?
>
> As RW said, hover over the percentages.

Thanks.

>> Also, the percentages seem very low: 1.5192% Spam, and .0005%
>> Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
>> what do I know... which is why I'm asking.
>
> It's not so much the raw amount of spam it hits, it's that it hits spam
> that few other rules hit, or that it hits spam that other rules hit but
> that doesn't score high enough with those other rules.
>
> You also want to look at the score-map section when evaluating a rule.

Is there an explanation of the score-map section somewhere? For this one it says:

scoremap ham:   0  33.33%      1 *
scoremap ham:   1  66.67%      2 **
scoremap spam:  1   0.08%     15
scoremap spam:  3   0.61%    121
scoremap spam:  4  90.24%  17791
scoremap spam:  5   2.69%    531 *
scoremap spam:  6   4.54%    896 *
scoremap spam:  7   1.10%    217
scoremap spam:  8   0.26%     52
scoremap spam:  9   0.40%     79
scoremap spam: 10   0.01%      2
scoremap spam: 11   0.05%      9
scoremap spam: 14   0.01%      2

What are these columns and how can I interpret it?

> It's not so much the raw amount of spam it hits, it's that it hits spam
> that few other rules hit, or that it hits spam that other rules hit but
> that doesn't score high enough with those other rules.

I searched my pile of mail that I have from two ice ages ago, and I did find 6 messages that were hits of this rule; one of them was spam, five of them were this person trying to contact me.

> Do you happen to be seeing FPs with this rule?

Yes, it's why I am investigating it. I think it is common for people who are sending mail from their mobiles, where they use it more like a quick chat instead of a 'regular mail'.

In fact, this person used:
X-Mailer: iPad Mail (15F79)

-- 
micah
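The scoremap counts can be checked against its percentages. Dividing each count by the total for its corpus reproduces every printed percentage, so the third column is a message count, the second is that row's share of the rule's hits in that corpus, and the trailing asterisks look like a histogram bar. Reading the first number as the bucket of each message's final score is an assumption here; the page does not state it. The rows are copied from the scoremap above, and the names are made up for this sketch.

```python
# Rows from the scoremap: (corpus, first column, printed percentage, count)
SCOREMAP = [
    ("ham", 0, 33.33, 1), ("ham", 1, 66.67, 2),
    ("spam", 1, 0.08, 15), ("spam", 3, 0.61, 121), ("spam", 4, 90.24, 17791),
    ("spam", 5, 2.69, 531), ("spam", 6, 4.54, 896), ("spam", 7, 1.10, 217),
    ("spam", 8, 0.26, 52), ("spam", 9, 0.40, 79), ("spam", 10, 0.01, 2),
    ("spam", 11, 0.05, 9), ("spam", 14, 0.01, 2),
]


def check_percentages(rows):
    """Recompute each percentage as count / (total count for that corpus)."""
    totals = {}
    for corpus, _, _, count in rows:
        totals[corpus] = totals.get(corpus, 0) + count
    mismatches = [
        (corpus, bucket) for corpus, bucket, pct, count in rows
        if round(100 * count / totals[corpus], 2) != pct
    ]
    return totals, mismatches


totals, mismatches = check_percentages(SCOREMAP)
print(totals, mismatches)

# Rough spam corpus size, if 1.5192% is the share of all spam that hit the rule.
print(f"spam corpus ~ {totals['spam'] / (1.5192 / 100):,.0f} messages")
```

Under that reading the rule hit 19,715 spam and 3 ham messages, implying a spam corpus of roughly 1.3 million. The ham figure of .0005% is too coarsely rounded for a useful estimate.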
Re: Understanding ruleQA results
John Hardin writes:

> On Tue, 14 Aug 2018, RW wrote:
>
>> On Tue, 14 Aug 2018 13:24:47 -0700 (PDT)
>> John Hardin wrote:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>>>
>>
>>>> I searched my pile of mail that I have from two ice ages ago, and I
>>>> did find 6 messages that were hits of this rule, one of them was
>>>> spam, five of them were this person trying to contact me.
>>>
>>> ...without a subject?
>>>
>>>>> Do you happen to be seeing FPs with this rule?
>>>>
>>>> Yes, its why I am investigating it. I think it is common for people
>>>> who are sending mail from their mobiles, where they use it more
>>>> like a quick chat instead of a 'regular mail'
>>>>
>>>> In fact, this person used:
>>>> X-Mailer: iPad Mail (15F79)
>>>
>>> OK, I can see about adding some mobile MUA exclusions. Any FP headers
>>> you can provide (directly) will be helpful. Go ahead and sanitize the
>>> recipient info, I don't think that would be relevant to tuning this
>>> one.

I'll provide some pastebin links in a separate email.

>> I don't know that this is particularly specific to mobile, lots of
>> people send emails with an empty subject.
>>
>> It sounds like the main cause would be a signature that contains the
>> senders name as the only thing in a line. That'll be why all the
>> FPs mentioned above came from the same person.

Yes, this person has as their signature their name on one line, and their From: has that same name listed.

> Question: were those messages scored as spam?

Yes, they were; I will include the reports in the off-list email.

-- 
micah
Re: Understanding ruleQA results
John Hardin writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> John Hardin writes:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>
> OK, I can see about adding some mobile MUA exclusions. Any FP headers you
> can provide (directly) will be helpful. Go ahead and sanitize the
> recipient info, I don't think that would be relevant to tuning this one.

I put 4 of the messages here: https://pastebin.com/YuPtBQXN

thanks for your help!
micah
Re: Current update channels
"Kevin A. McGrail" writes: > There are people asking me to put KAM.cf under the default sa-update > crypto signature. Technically, it's easy. But it would have to be > carefully considered as it's not a project ruleset. Thoughts on that? I would be interested in KAM as part of an update channel, it would make updates more frequent. The only thing is I have to adjust KAM each time I update it. For example, the political spam section is a bit dated and has caused some frustrations for people. -- micah
multiplying in rules
I was doing multiplication in rules to add scores, like this:

meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 * __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT)) > 1)

but now when I run spamassassin --lint, I'm told things like this:

Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4

What should I do to fix that?

Thanks!

-- 
micah
Re: multiplying in rules
RW writes:

> On Tue, 20 Nov 2018 12:38:24 -0500
> micah anderson wrote:
>
>> I was doing multiplication in rules to add scores, like this:
>>
>> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT))
>> > 1)
>>
>> but now when I run spamassassin --lint, I'm told things like this:
>>
>> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4
>
> It's the decimal fractions.
>
>> What should I do to fix that?
>
> It should be fixed in the next release.

OK, but until then, is the only option for me to disable these rules? These are particularly important rules for stopping phishing attacks, so I'd rather not disable them, but find some other kind of workaround!

-- 
micah
Re: multiplying in rules
RW writes:

> On Tue, 20 Nov 2018 12:53:18 -0500
> micah anderson wrote:
>
>> RW writes:
>>
>> > On Tue, 20 Nov 2018 12:38:24 -0500
>> > micah anderson wrote:
>> >
>> >> I was doing multiplication in rules to add scores, like this:
>> >>
>> >> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> >> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 *
>> >> __LOCAL_LIMIT))
>> >> > 1)
>> >>
>> >> but now when I run spamassassin --lint, I'm told things like this:
>> >>
>> >> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4
>> >
>> > It's the decimal fractions.
>> >
>> >> What should I do to fix that?
>> >
>> > It should be fixed in the next release.
>>
>> ok, but until then, is the only option for me to disable these rules?
>> These are particularly important rules for stopping phishing attacks,
>> so I'd like to not disable them, but find some other kind of work
>> around!
>
> I don't believe it prevents the rule from working.

It prevents sa-compile from running because spamassassin --lint fails.

> What it does do is prevent compiled rules from being installed. But as I
> said it's the decimal fractions that cause it to fail and the above
> rule doesn't need to contain decimal fractions.

How can I do it without the fractions? I've applied the patch from the repo to make it work.

-- 
micah
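The fractions can be removed without changing when the rule fires. Multiply both sides of the comparison by 10, giving (__MAILBOX * 4) + ... > 10, or write each 0.4 as the integer division (4 / 10). The Python below is an illustrative check, not SpamAssassin's meta evaluator, that all three forms agree on every combination of the four sub-rules in LOCAL_EXCEEDED_PHISH. It assumes each __ sub-rule contributes 0 or 1; a sub-rule that can count multiple hits would need more cases.

```python
from itertools import product


def fractional(hits):
    # Original form: 0.4 * each sub-rule, summed, compared with 1
    return sum(0.4 * h for h in hits) > 1


def scaled(hits):
    # Both sides multiplied by 10: integer weights, threshold 10
    return sum(h * 4 for h in hits) > 10


def divided(hits):
    # 0.4 written as the integer division (4 / 10)
    return sum((4 / 10) * h for h in hits) > 1


# One 0/1 value each for __MAILBOX, __LOCAL_EXCEEDED, __LOCAL_STORAGE, __LOCAL_LIMIT
combos = list(product((0, 1), repeat=4))
assert all(fractional(c) == scaled(c) == divided(c) for c in combos)
firing = [c for c in combos if scaled(c)]
print(len(firing))  # prints 5: the rule needs at least three of the four sub-rules
```

The weights only set the threshold inside the meta expression; the score comes from the separate score line.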
Re: multiplying in rules
"Bill Cole" writes: > On 20 Nov 2018, at 13:53, John Hardin wrote: > >> On Tue, 20 Nov 2018, micah anderson wrote: > [...] >>>> What it does do is prevent compiled rules from being installed. But >>>> as I >>>> said it's the decimal fractions that cause it to fail and the above >>>> rule doesn't need to contain decimal fractions. >>> >>> How can I do it without the fractions? >> >> Multiply everything by 10:(__rulename * 4) ...etc... > 10 > > Or replace every decimal fraction with an integer division, so '0.4' > becomes '(4 / 10)' oh, of course. I was thinking that these amounts contributed to the score, but they do not. Thanks for wiping away the grime from my brain. -- micah
Re: Scoring by registrar?
Grant Taylor writes:

>> A very large number (nearly all, in fact) of the spams I receive these
>> days involve domains registered with Namecheap. I've received hundreds
>> of spams involving .icu domains from what appear to be the same spammer.
>> I also receive a large number of scams impersonating Bitmain, again
>> using domains involving Namecheap.
>
> Is Namecheap just the registrar? Or are they also hosting the DNS service?

As a Namecheap customer, you are making me want to move. That is good, but it's also something you should consider before you block the entire registrar: there are a significant number of non-spamming Namecheap customers that you would be cutting off if you did this. I understand you want to put pressure on Namecheap, but the flip side of that is you will be cutting yourself off from those domains in the process.

>> While Namecheap does suspend at least some domains within days of their
>> being used in a campaign, it's clear that these are being treated as
>> single-use domains, so this has very little impact on the spammers.

This sounds like Fast Flux - and it is not something that happens only on Namecheap.

> I think there are also lists of domains that have been recently
> registered. Which might help if the single use domains were recently
> registered.

Having such a list would be very helpful for dealing with fast flux.

-- 
micah
Re: Scoring by registrar?
Sean Lynch writes:

>>Having such a list would be very helpful for dealing with fast flux.
>
> SA already has this. It used fresh.fmb.la to detect domains registered within
> the past couple of weeks.

It does? Do I need to enable something to get that?

-- 
micah
Re: Spamhaus Technology contributions to SpamAssassin
Giovanni Bechis writes:

> On 7/3/19 7:11 PM, Riccardo Alfieri wrote:
>> On 03/07/19 17:59, atat wrote:
>>
>>> You say in documentation:
>>>
>>> You should also drop, by default, all Office documents with macros.
>>>
>>> What plugin / method do You recommend for that?
>>
>> I'm no expert in detecting macros, but there are at least two ways of doing that that come to mind:
>>
>> - Clamav with the option OLE2BlockMacros

Reading up on OLE2BlockMacros in clamav, I'm very confused by https://www.mail-archive.com/clamav-users@lists.clamav.net/msg42671.html

Specifically:

  Setting 'OLE2BlockMacros Yes' effectively causes 'Heuristics.OLE2.ContainsMacros' to be returned, and disables all official and unofficial signatures.

  When 'OLE2BlockMacros Yes' this causes 'Heuristics.OLE2.ContainsMacros' to be returned first and all other signatures that are not against uncompressed macros are ignored. You only get one signature back and that is the first one hit, which may be a 'soft' signature ie one you mightn't discard an email on, such as Heuristics.OLE2.ContainsMacros, even though 'hard' signatures official or unofficial might also have hit if they had been run later.

> This has been superseded by
> https://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/OLEMacro.pm
> the plugin is for trunk but it works out of the box in 3.4.3rc3 as well (some work is needed to let it work on 3.4.2)

Can't these be blocked at the MTA level to be much more CPU friendly?

--
micah
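On the SpamAssassin side, enabling the OLEMacro plugin probably looks something like the following. This is a sketch based on my reading of the plugin's POD, not a tested config. Check the eval name against the OLEMacro.pm you actually install. The rule name and score are placeholders, not recommendations.

```
# Load the plugin (usually placed in a .pre file; loadplugin also works in .cf).
loadplugin Mail::SpamAssassin::Plugin::OLEMacro

ifplugin Mail::SpamAssassin::Plugin::OLEMacro
  # Fires when an Office attachment contains a macro.
  body     OLEMACRO  eval:check_olemacro()
  describe OLEMACRO  Attachment has an Office macro
  # Placeholder score; raise it if you really want to drop all macro documents.
  score    OLEMACRO  0.1
endif
```

The ifplugin guard keeps the rule from causing lint errors on an installation where the plugin is missing, which matters if the same rules are shared between 3.4.2 and newer hosts.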
Spoofed From: names
Hi,

What is the current state of the art for dealing with messages that trick people through the "Name" part of the From header? For example:

  From: "supp...@example.com"

The "Real Name" part is used to show a fake email address at a legitimate domain (example.com would be my domain, or gmail.com, or anything other than the actual sending domain, air-compressor.ml).

This has come up before [0], but at the time generic solutions seemed problematic due to various false positives or missing features in SpamAssassin itself. I'm wondering what the state of things is now.

I can write a relatively easy meta rule for my domain, something like this, but I'm not sure how well it would work, or whether there are better methods now:

  header   __LOCAL_FROM_QUOTE_ISUS    From =~ /\".*\@example\.com\"/
  header   __LOCAL_FROM_CONTAIN_NOTUS From !~ /<.*\@example\.com>/
  meta     TRICKY_FROM ((( __LOCAL_FROM_QUOTE_ISUS ) + ( __LOCAL_FROM_CONTAIN_NOTUS )) > 1)
  describe TRICKY_FROM From has example.com in quotes, but not in path
  score    TRICKY_FROM 5

[0] https://www.mail-archive.com/users@spamassassin.apache.org/msg100800.html

--
micah
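One variant that may be more robust tests the display name and the real address separately, instead of matching quotes and angle brackets in the raw header. This sketch assumes your SpamAssassin supports the ":name" and ":addr" header modifiers described in Mail::SpamAssassin::Conf. example.com stands for your own domain, and the rule names are hypothetical.

```
# Display name (the "Real Name" part) mentions an address at our domain...
header   __LOCAL_FROMNAME_ISUS   From:name =~ /\@example\.com\b/i
# ...while the actual sender address is somewhere else.
# Note: a subdomain such as foo.example.com also counts as "not us" here.
header   __LOCAL_FROMADDR_NOTUS  From:addr !~ /\@example\.com$/i
# && is equivalent to the "sum > 1" test when there are two sub-rules.
meta     LOCAL_TRICKY_FROM       (__LOCAL_FROMNAME_ISUS && __LOCAL_FROMADDR_NOTUS)
describe LOCAL_TRICKY_FROM       From display name claims example.com but the address does not
score    LOCAL_TRICKY_FROM       5
```

Running `spamassassin --lint` after adding rules like these is a cheap way to catch typos. It flags mismatched sub-rule names and malformed regexes before they silently fail to match.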