from:"micah anderson"

config: failed to parse line

2007-09-21 Thread Micah Anderson


Occasionally I see the following log lines. They don't seem to be
fatal, but I'd like to know what they are so I can decide whether I
need to fix something:

Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-train
Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, skipping, in "(no file)": x-days 7

I can't find these config variables set in /etc/spamassassin/*

This line also comes along at the same time:

Sep 21 07:24:07 spamd2 spamd[7749]: config: SpamAssassin failed to parse line, no value provided for "use_bayes", skipping: use_bayes

This is an odd line, because my bayes is working (autolearning and
classifying fine) and my 'use_bayes' line has a '1' after it:

local.cf:use_bayes 1
local.cf:bayes_auto_learn 1
local.cf:bayes_ignore_header Message-Id
local.cf:bayes_ignore_header Delivered-To
local.cf:bayes_ignore_header User-Agent
local.cf:bayes_ignore_header In-Reply-To
local.cf:bayes_ignore_header ReSent-Date
local.cf:bayes_ignore_header ReSent-From
local.cf:bayes_ignore_header ReSent-Message-ID
local.cf:bayes_ignore_header ReSent-Subject
local.cf:bayes_ignore_header ReSent-To
local.cf:bayes_ignore_header Resent-Date
local.cf:bayes_ignore_header Resent-From
local.cf:bayes_ignore_header Resent-Message-ID
local.cf:bayes_ignore_header Resent-Subject
local.cf:bayes_ignore_header Resent-To
local.cf:bayes_ignore_header X-Bogosity
local.cf:bayes_ignore_header X-CRM114
local.cf:bayes_ignore_header X-Enigmail-Version
local.cf:bayes_ignore_header X-Mailer
local.cf:bayes_ignore_header X-MailScanner
local.cf:bayes_ignore_header X-MailScanner-Information
local.cf:bayes_ignore_header X-MailScanner-SpamCheck
local.cf:bayes_ignore_header X-Mozilla-Status
local.cf:bayes_ignore_header X-Mozilla-Status2
local.cf:bayes_ignore_header X-no-archive
local.cf:bayes_ignore_header X-Original-To
local.cf:bayes_ignore_header X-PerlMX-Spam
local.cf:bayes_ignore_header X-Received-From-IP
local.cf:bayes_ignore_header X-Sanitizer
local.cf:bayes_ignore_header X-SA-Exim
local.cf:bayes_ignore_header X-Scanned-By
local.cf:bayes_ignore_header X-Sender
local.cf:bayes_ignore_header X-Sequence
local.cf:bayes_ignore_header X-Spam-Flags
local.cf:bayes_ignore_header X-Spam-Level
local.cf:bayes_ignore_header X-Spam-Score
local.cf:bayes_ignore_header X-Spam-Status
local.cf:bayes_ignore_header X-s.logic-spamassas-bar
local.cf:bayes_ignore_header X-s.logic-spamassas
local.cf:bayes_ignore_header X-Virus-Scanned
local.cf:bayes_ignore_header X-Virus-Status
local.cf:bayes_ignore_header X-Warning
local.cf:bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
local.cf:bayes_sql_dsn  DBI:mysql:bayes:dbw-pn
local.cf:bayes_sql_username spamass
local.cf:bayes_sql_password assmanspam
local.cf:bayes_sql_override_username   @GLOBAL
local.cf:bayes_expiry_max_db_size   100
local.cf:bayes_learn_to_journal 0

Thanks,
micah
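
The three log lines in the message above share a common shape, so the rejected settings can be pulled out of a larger spamd log mechanically. The following is only a minimal sketch: the helper name `rejected_setting` and the regular expression are illustrative assumptions built from the three sample lines, not part of SpamAssassin. The "(no file)" source may mean the settings did not come from a file under /etc/spamassassin, but that is a guess.

```python
import re

# Hypothetical pattern, derived only from the three sample lines in the message.
PARSE_FAILURE = re.compile(
    r'config: (?:SpamAssassin )?failed to parse line, '
    r'(?:skipping, in "(?P<source>[^"]*)": (?P<line>.*)'
    r'|no value provided for "(?P<key>[^"]*)", skipping: .*)$'
)


def rejected_setting(log_line):
    """Return (source, setting) for a parse-failure log line, or None."""
    match = PARSE_FAILURE.search(log_line.strip())
    if match is None:
        return None
    if match.group("key") is not None:
        # "no value provided" form: no source is reported in the log line.
        return (None, match.group("key"))
    return (match.group("source"), match.group("line").strip())


samples = [
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, '
    'skipping, in "(no file)": x-train ',
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: failed to parse line, '
    'skipping, in "(no file)": x-days 7 ',
    'Sep 21 07:24:07 spamd2 spamd[7749]: config: SpamAssassin failed to '
    'parse line, no value provided for "use_bayes", skipping: use_bayes ',
]

for sample in samples:
    print(rejected_setting(sample))
```

Running this over a full log and counting the distinct settings would show whether x-train, x-days and the empty use_bayes always arrive together, which might help narrow down where they come from.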

Bayes innodb problems

2007-09-26 Thread Micah Anderson


I was having problems with scalability with my bayes DB, so I read up on
the mailing list and found that it was recommended to switch to the
innodb storage engine because of the row-level locking (versus the
table-level locking that comes with MyISAM). Sounds great. So I
switched, and everything was fine for several days.

Then today the load on the DB server shot up to 11-13 and spam
processing has slowed to a crawl. I'm seeing some incredibly
long queries now in my slow-query log, such as:

# Time: 070926 17:10:53
# User@Host: spamass[spamass] @  [10.0.2.4]
# Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
SELECT count(*)
  FROM bayes_token
 WHERE id = '4'
   AND ('1190846660' - atime) > '345600';

This seems really wrong.

Then there are queries such as the following, taking at least 30 seconds:

# Time: 070926 17:13:24
# User@Host: spamass[spamass] @  [10.0.2.4]
# Query_time: 30  Lock_time: 0  Rows_sent: 88  Rows_examined: 88
SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime
  FROM bayes_token
 WHERE id = '4'
   AND token IN ('  ')

I'm seeing the following in my spamd logs:
Sep 26 17:17:52 spamd2 spamd[5479]: bayes: expire_old_tokens: child processing timeout at /usr/sbin/spamd line 1246.
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: child states: BB
Sep 26 17:17:52 spamd2 spamd[1160]: prefork: server reached --max-children setting, consider raising it

I've got my --max-children set to 50, and I'm hitting this because the
DB is not responding fast enough.

Did I hit some sort of tipping point with the tokens in my database? Do
I have too many, or ... what is going on here? I have to turn off bayes
because it's too slow, and this is sad because bayes adds a lot to the
results. This is what I have configured:

bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn  DBI:mysql:bayes:dbw-pn
bayes_sql_username spamassassin
bayes_sql_password notthepasswd
bayes_sql_override_username   @GLOBAL

# keep the database from getting too big:
bayes_expiry_max_db_size   100

# no effect
bayes_learn_to_journal 0

mysql settings related to innodb:

# * InnoDB
innodb_data_file_path = ibdata1:10M:autoextend
#
# Set buffer pool size to 50-80% of your computer's memory
set-variable = innodb_buffer_pool_size=1250M
set-variable = innodb_additional_mem_pool_size=20M
#
# Set the log file size to about 25% of the buffer pool size
set-variable = innodb_log_file_size=313M
set-variable = innodb_log_buffer_size=8M
#
innodb_flush_log_at_trx_commit=1
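
As a small worked check of the rule of thumb in the comments above ("about 25% of the buffer pool size"), the arithmetic behind the 313M figure can be sketched as follows. The variable names are illustrative, and rounding up to a whole megabyte is an assumption about how the value was chosen.

```python
import math

buffer_pool_mb = 1250   # innodb_buffer_pool_size from the settings above
log_fraction = 0.25     # "about 25% of the buffer pool size"

# 1250 * 0.25 = 312.5, rounded up to a whole number of megabytes.
log_file_mb = math.ceil(buffer_pool_mb * log_fraction)

print(f"set-variable = innodb_log_file_size={log_file_mb}M")
```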

I'm using spamassassin 3.2.3 and mysql 5.0.45. 

Thanks,
Micah

Re: Bayes innodb problems

2007-09-28 Thread Micah Anderson

* Alex Woick <[EMAIL PROTECTED]> [070927 02:14]:
> Micah Anderson schrieb am 27.09.2007 02:20:
>
>> processing has ground down to really slow. I'm seeing some incredibly
>> long queries now in my slow-query log, such as:
>
> Try an "optimize table " for each of the sa tables. You just 
> filled the database from scratch, so perhaps the counters/statistics do not 
> reflect the actual value distribution yet.

Actually this bayes DB has been around for a few months, and has been
built up over time. 

This does make me wonder what regular DB maintenance tasks should be
performed on the bayes DB. It sounds like some people let the code
auto-expire, while some run cron jobs to expire data? What are the
benefits of each? Should I be running an optimize table every so often?

>> # Time: 070926 17:10:53
>> # User@Host: spamass[spamass] @  [10.0.2.4]
>> # Query_time: 758  Lock_time: 0  Rows_sent: 1  Rows_examined: 2205327
>> SELECT count(*)
>>   FROM bayes_token
>>  WHERE id = '4'
>>    AND ('1190846660' - atime) > '345600';
>
> More than 10 minutes for counting 2 mio rows is a bit long. You can try to 
> look what Mysql is doing all the time. Execute a "show full processlist" 
> from a mysql command line while the above query is running and look at the 
> "State" column. If a SA-initiated query is waiting for a lock and actually 
> doing nothing, you should see it there. You also see all the other queries 
> that are currently running at this point and may be hogging the database 
> server.

Since I've adjusted the SQL query to use the index, I haven't seen this
problem, so I can't look at the State column to see what is going on.
This DB server isn't doing anything else, for any other database, so
there was no possibility of other things hogging the resources on the
server. 

> The database design and query design of Spamassassin is ok, even the 
> apparently non-indexable term "('1190846660' - atime) > '345600'", since 
> Mysql would not use the index on an optimized term anyway. Try an EXPLAIN 
> of this statement - Mysql will always use only the first half for lookup (4 
> bytes) of the index, which covers only the id part.

That is if I am optimizing...

mysql> explain SELECT count(*) FROM bayes_token WHERE id = '4' AND
('1190846660' - atime) > '345600';
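+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
| id | select_type | table       | type | possible_keys            | key              | key_len | ref   | rows   | Extra                    |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+
|  1 | SIMPLE      | bayes_token | ref  | PRIMARY,bayes_token_idx2 | bayes_token_idx2 | 2       | const | 229946 | Using where; Using index |
+----+-------------+-------------+------+--------------------------+------------------+---------+-------+--------+--------------------------+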
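
The adjusted query mentioned earlier in this message is not shown. One common way to make a predicate like `('1190846660' - atime) > '345600'` friendlier to an index is to move the arithmetic onto the constant side, so the column is compared directly against a precomputed cutoff. The sketch below only demonstrates that the two forms select the same rows. It uses an in-memory SQLite table as a stand-in for MySQL, and the index `demo_idx` and the sample rows are invented for the demonstration; the real bayes_token schema and the actual change may differ.

```python
import sqlite3

NOW = 1190846660    # timestamp taken from the logged query
MAX_AGE = 345600    # four days, in seconds

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bayes_token (id INTEGER, token TEXT, atime INTEGER)")
# Hypothetical index for the demo only.
conn.execute("CREATE INDEX demo_idx ON bayes_token (id, atime)")

# Ten tokens for user 4, last accessed 0..9 days ago, plus one for another user.
rows = [(4, f"t{i}", NOW - i * 86400) for i in range(10)]
rows.append((5, "other", NOW - 10 * 86400))
conn.executemany("INSERT INTO bayes_token VALUES (?, ?, ?)", rows)

# Original form: arithmetic is applied to the column on every row.
original = conn.execute(
    "SELECT count(*) FROM bayes_token WHERE id = ? AND (? - atime) > ?",
    (4, NOW, MAX_AGE),
).fetchone()[0]

# Rewritten form: the column is compared against a precomputed cutoff.
rewritten = conn.execute(
    "SELECT count(*) FROM bayes_token WHERE id = ? AND atime < ?",
    (4, NOW - MAX_AGE),
).fetchone()[0]

print(original, rewritten)
```

Both counts cover the tokens older than four days. Whether a given MySQL version actually uses a range scan for the rewritten form is something EXPLAIN would have to confirm.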

>> innodb_flush_log_at_trx_commit=1
>
> Use value 0 for more performance and a small sacrifice of safety. See the 
> comment in the default *.ini file:

Mine doesn't have a comment... but looking at
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html does lead me
to want to change this, since I don't care about transaction-level ACID
compliance with the bayes database. If I have issues with that DB, I
can always restore the backup from the day before.

Micah

reaching incoming connections queued max, what happens?

2007-09-28 Thread Micah Anderson


I was interested to find out what would happen if spamd was totally
overloaded, so I set --max-children=1 and --max-conn-per-child=1 and
then started hitting spamd with spamc, using -t timeout values, to see
what would happen. Essentially, each connection (all generated
simultaneously) took about 1 second longer than the previous one, and
once the -t value was reached the message was returned unscanned. I
believe this is what could be expected.

According to the spamd man page:

 -m number , --max-children=number
 This option specifies the maximum number of children to spawn.

 Incoming connections can still occur if all of the children are busy,
 however those connections will be queued waiting for a free child.

 Please note that there is a OS specific maximum of connections that can be
 queued (Try "perl -MSocket -e'print SOMAXCONN'" to find this maximum).

This leads me to wonder what would happen if I hit my SOMAXCONN with
incoming messages: would they not be queued up? The SOMAXCONN on my
Linux box appears to be 128. So to test, I did the following on my spamd
server, and then restarted spamd:

echo "5" > /proc/sys/net/core/somaxconn
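
For comparing the two numbers involved here, a minimal sketch: the SOMAXCONN constant that a language runtime reports is typically fixed when its socket module is built, so it may not reflect a value written to /proc afterwards. The helper name `runtime_somaxconn` is illustrative, and the /proc path is Linux-specific.

```python
import socket
from pathlib import Path


def runtime_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Return the kernel's current backlog cap, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no permission, or unexpected file contents.
        return None


compiled = socket.SOMAXCONN     # constant baked into the socket module
current = runtime_somaxconn()   # value the running kernel is using, if readable

print(f"compiled-in SOMAXCONN: {compiled}")
print(f"runtime somaxconn:     {current}")
```

If the two values differ after the echo command, the constant printed by a one-liner is not a reliable way to confirm the change took effect.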

I then issued 15 simultaneous connections with the -t value set to 15.
Each individual connection took one second longer than the previous, as
before, and those connections that took over the -t value were returned
unscanned, as before. What puzzles me, however, is that all of
the connections acted just as before, when the SOMAXCONN was set to 128.
Since each connection is coming in at exactly the same time, I would
expect that the first one would get accepted by spamd, 5 connections
would be queued up, and then the sixth and later connections would not
be queued up and something would happen. But it doesn't; it's the same
as before... odd.

These are the time results for each individual spamc connection:

1. real 0m1.267s
2. real 0m2.567s
3. real 0m3.986s
4. real 0m5.178s
5. real 0m6.461s
6. real 0m7.914s
7. real 0m9.090s
8. real 0m10.361s
9. real 0m11.738s
10. real 0m13.377s
11. real 0m15.033s -- returned un-scanned
12. real 0m15.026s -- returned un-scanned

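
The "one second longer than the previous" observation can be checked against the timings listed above. This is only a sketch of the arithmetic, using the ten scanned results; the variable names are illustrative.

```python
# Wall-clock times (seconds) of the ten connections that were scanned.
times = [1.267, 2.567, 3.986, 5.178, 6.461,
         7.914, 9.090, 10.361, 11.738, 13.377]

# Extra time each connection waited compared with the one before it.
gaps = [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]
mean_gap = sum(gaps) / len(gaps)

print("gaps:", gaps)
print(f"mean gap: {mean_gap:.3f}s")
```

With these figures the step between connections is closer to 1.3 seconds than to exactly 1 second, which is consistent with a single child scanning the queued messages one after another.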
Micah

Re: SSO's RHSBL

2007-10-08 Thread Micah Anderson

* Giampaolo Tomassoni <[EMAIL PROTECTED]> [071008 08:47]:
> > -----Original Message-----
> > From: ram [mailto:[EMAIL PROTECTED]
> > Sent: Monday, October 08, 2007 5:30 PM
> > 
> > On Mon, 2007-10-08 at 14:40 +0200, Giampaolo Tomassoni wrote:
> > > I'm getting this stuff from named in my log files during message
> > scanning.
> > >
> > >   Oct  8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > >   Oct  8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > >   Oct  8 14:36:40 ns2 named[6541]: unexpected RCODE (SERVFAIL)
> > > resolving '.xxx.blackhole.securitysage.com/A/IN': a.b.c.d#53
> > >
> > > Is there any problem with securitysage.com?
> > >
> > 
> > the rhsbl has been down for months now
> 
> Well, it may be, but I believe it is not more than a week I'm getting these
> log entries.

That's right; these errors only started showing up last week in the
logcheck logs of a system that was still set up to use that RHSBL.

Does anyone have a legitimate reference about it being closed down?

Micah

Re: Disabling URIDNSBL plugin

2007-10-19 Thread Micah Anderson

* Daryl C. W. O'Shea <[EMAIL PROTECTED]> [071019 14:59]:
> Justin Kim wrote:
>> I don't know what is causing my postfix server to defer messages couple of
>> times daily.
>
>> By looking at the logs, I can only tell there is something that keeps one
>> spam checking process running for 5~10 mins.
>
> Likely bayes auto expiry.  Disable bayes_auto_expiry and do the expiries 
> via a cron job instead.

Do you think running a bayes expire via cronjob is necessary if you are
running an InnoDB-based bayes DB (with this patch[1])?

Also, if you postpone the bayes expire and run it via cron instead,
aren't you just making the expiration stack up, delaying this
condition until later (when the cron job runs) and making it last longer
(because the expiration hasn't been run in a while)?

Micah

1. http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5661

Re: spamd hangs at 100% cpu

2007-10-19 Thread Micah Anderson


I too have experienced strange hangs with spamc/spamd combos on my
postfix box running maildrop/mailfilter. At first I was convinced it 
was my bayes DB because it was using MyISAM tables and these are slow 
and I'm doing a lot of mail. So I switched to InnoDB and then I was 
convinced that the problem had to do with table locking during SA 
auto-expire periods and as a result dug deep into the SA SQL and 
submitted a bug to enhance the query so it can use indexes[1]. 

Even after all this I was getting reports from people who received
bounced messages from my server saying that the default maildrop timeout
was reached (300 seconds) and as a result the message was considered as
the user being over quota and was bounced back to the original sender.
We run spamc with -t 100 and expected that this meant that after 100
seconds if the message wasn't returned from spamd, then we simply
accepted the message without any spam scanning. However, it seemed like
things were lasting far longer than 100 seconds (3x as long to hit the
maildrop timeout) and so our theory was that -t wasn't working properly.

These incorrect bounces meant we were not delivering legitimate
email, so we turned off spamassassin and began digging deeper to try
to determine what was causing this.

I have spent hours devising and running tests to try and figure out what
is causing this, and so far I cannot replicate it in a test environment.

If you are interested in seeing my tests, and have any suggestions for
other tests that could be run to determine what might be causing this, I
am *very* interested. Please see my test page:

https://we.riseup.net/riseup+mail/spam-timeout-tests

Micah


1. http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5661

* Martin.Hepworth <[EMAIL PROTECTED]> [071019 02:03]:
> Peter
> 
> Get the latest ruleset for SA using sa-update, this works around an issue 
> with whois lookups.
> 
> Only run a few RBL's - you're running them all and this will take some time.
> 
> Running a local caching nameserver on the box will help as well.
> 
> 
> --
> Martin Hepworth
> Snr Systems Administrator
> Solid State Logic
> Tel: +44 (0)1865 842300
> 
> > -----Original Message-----
> > From: Peter Fastré [mailto:[EMAIL PROTECTED]
> > Sent: 19 October 2007 09:58
> > To: users@spamassassin.apache.org
> > Subject: spamd hangs at 100% cpu
> >
> > Hello
> >
> > I have a severe problem with one of my mailservers. I'm using spamassassin
> > 3.2.3 in combination with exim 4.66, and experience hanging spamd
> > processes which consume all my server resources.
> > I've searched these mailing lists, searched google, searched
> > documentation, ... I found very much old posts of people experiencing the
> > same problems, so I think it's a very common one. I tried different
> > solutions: tracing the process (process doesn't do anything when it hangs
> > - no trace output), clearing the bayes database (doesn't help), ...
> > The problem is really urgent, because exim receives timeouts from spamd,
> > and rejects the mails.
> > I reduced the number of mails each spamd processes, to reduce the risk of
> > hanging. Usually it hangs after having processed 2 or 3 mails.
> > Now I've even reduced it to 1, the hangups are less often, but still
> > there! I hope someone has a solution, or a clue to what I can do!
> > I checked the log files and debug output, which is very consistent. The
> > last thing all hanging processes do, is this:
> > Oct 19 09:42:09 mail01 spamd[6072]: rules: ran uri rule __DOS_HAS_ANY_URI
> > ==> got hit: "k"
> > After this line in the log, the process hangs.
> >
> > For your reference: the full log file is here:
> >  http://peter.lunatis.be/temp/spamd.txt
> >
> > Regards
> >
> > Peter
> >

Re: Disabling URIDNSBL plugin

2007-10-20 Thread Micah Anderson

* mouss <[EMAIL PROTECTED]> [071020 09:38]:
> Micah Anderson wrote:
> > Do you think running a bayes expire via cronjob is necessary if you are
> > running a INNOdb based bayes DB (with this patch[1])?
> >
> > Also, if you postpone the bayes expire to instead run it via cron aren't
> > you just making the expiration stack up and instead are delaying this
> > condition until later (when the cron job runs) and for longer (because
> > the expiration hasn't been run in a while)?
> >
> >   
> 
> doing it once a day at 3 AM is not like doing it when delivering mail.
 
Unless you deliver mail 24 hours a day for people all over the world.
Then 3am in one place is noon in another.

Re: posting thru gmane to this list and not getting bombarded

2007-11-19 Thread Micah Anderson

* [EMAIL PROTECTED] <[EMAIL PROTECTED]> [071119 10:01]:
> N> PS: I post to this list using gmane. Is it possible to stop delivery
> N> on my email address so that I can post but I do not receive the list
> N> messages?
> 
> http://www.google.com/[EMAIL PROTECTED]

Can this information be added to
http://wiki.apache.org/spamassassin/MailingLists ?

Micah



Re: posting thru gmane to this list and not getting bombarded

2007-11-20 Thread Micah Anderson

* Justin Mason <[EMAIL PROTECTED]> [071119 14:13]:
> 
> Micah Anderson writes:
> > * [EMAIL PROTECTED] <[EMAIL PROTECTED]> [071119 10:01]:
> > > N> PS: I post to this list using gmane. Is it possible to stop delivery
> > > N> on my email address so that I can post but I do not receive the list
> > > N> messages?
> > > 
> > > http://www.google.com/[EMAIL PROTECTED]
> > 
> > Can this information be added to
> > http://wiki.apache.org/spamassassin/MailingLists ?
> 
> go for it!  it's a wiki ;)

I'd like to, but I haven't been able to get
'[EMAIL PROTECTED]' to work, so it seems wrong to add
that if it's not functioning.

micah

Re: posting thru gmane to this list and not getting bombarded

2007-11-21 Thread Micah Anderson

* François Rousseau <[EMAIL PROTECTED]> [071121 10:21]:
> Maybe I'm weird but... what is the point of posting to a mailing list
> if you don't read it?

You *do* read it, you just read it via GMANE, instead of via a mail
reader. Some of us don't like to have our inboxes bombarded with mailing
lists, or prefer not to filter mailing lists to specific mailboxes but
instead isolate mailing list reading to a more comfortable medium which
allows us the ability to reply occasionally.

> > The second search result is the relevant one:
> >
> > You can do the equivalent (to turn off delivery) by un-subscribing
> > from the user's list and subscribing to
> > [EMAIL PROTECTED] .

You need to send an email to
[EMAIL PROTECTED] to get on this list.

micah



Low scores

2008-02-23 Thread Micah Anderson


I feel like a lot of pretty obvious spams are getting through my system
with appallingly low scores. I'm starting to wonder if something may be
wrong with my setup. Looking at what spam tests did fire, I'm frequently
surprised that more rules didn't fire (obvious lotto scams and Nigerian
inheritance scams seem to slip right by) and that the scores are
surprisingly low... I'd expect satisfyingly high scores for some of
these, but I'm not seeing them.

I'm looking for people to have a look over these spams and give me some 
ideas of some possible areas for improvement (either score adjustments, 
configuration tweaks, plugins that I should try, etc.). 

The spams can be pulled from here: http://micah.riseup.net/spams

Thanks for any ideas,
micah

Re: Low scores

2008-02-24 Thread Micah Anderson

On Sat, 23 Feb 2008 18:52:01 -0800, Loren Wilton wrote:

>> I'm looking for people to have a look over these spams and give me some
>> ideas of some possible areas for improvement (either score adjustments,
>> configuration tweaks, plugins that I should try, etc.).
>>
>> The spams can be pulled from here: http://micah.riseup.net/spams
> 
> It appears to me you have just posted the body text for these spams. 
> Much of the spam catching is done off of the header information, so
> knowing that would help.

Check again, I posted the entire raw maildir message, which includes the 
headers.
 
> Also, knowing which tests did and didn't hit on your system would give
> us an idea what you might be missing.

You can see which tests hit in the headers of these emails. 
 
> That said, do you use the SARE rules?  There are a number of rules there
> that help catch 419's.

Yes, I am using the openprotect channel.

micah

Re: Low scores

2008-02-24 Thread Micah Anderson

On Sun, 24 Feb 2008 02:15:24 +0100, Matthias Leisi wrote:

> Micah Anderson schrieb:
> 
> | [surprisingly low scores]
> | The spams can be pulled from here: http://micah.riseup.net/spams
> 
> Most (all?) of the samples are forwarded through some debian.org
> mechanism. In order for blacklists to take full effect, you should
> configure your trust path (trusted_networks etc) accordingly.

My trusted_networks is set to:

trusted_networks 202.12.162. 
trusted_networks 10.0.
trusted_networks 10.8.0.

The first trusts everything in that IP space, which we control; the
second and third are private networks. Am I specifying those
incorrectly, perhaps?
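
To make the question concrete, the sketch below expands the three entries above into CIDR networks, assuming the trailing-dot form is read as "these octets are fixed, the rest are wildcards". That reading is an assumption to verify against the trusted_networks documentation for the version in use, and the helper name `dotted_prefix_to_network` is illustrative.

```python
import ipaddress


def dotted_prefix_to_network(spec):
    """Expand a trailing-dot prefix such as '10.0.' into an IPv4 network.

    Assumption: each octet given is fixed and each missing octet is a wildcard.
    """
    octets = [part for part in spec.strip().split(".") if part]
    if not 1 <= len(octets) <= 4:
        raise ValueError(f"cannot interpret {spec!r} as an IPv4 prefix")
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + f"/{8 * len(octets)}")


entries = ["202.12.162.", "10.0.", "10.8.0."]
networks = [dotted_prefix_to_network(entry) for entry in entries]

for entry, network in zip(entries, networks):
    print(f"{entry:<12} -> {network}")
```

Under that assumption "10.0." covers only 10.0.0.0/16, so it does not include 10.8.0.x and the third line is not redundant.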

I'm also short-circuiting on trusted-relay chained messages, using the 
following:

meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

But I log all short-circuit status in the headers, with the following
(and you won't see short-circuiting in the examples I posted):

status  

add_header all Status "_YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ shortcircuit=_SCTYPE_ autolearn=_AUTOLEARN_ version=_VERSION_"

Do I have something misconfigured in my trust path? I do have a forward
from a debian.org email address that occasionally sends me legit email
(although it does seem like a lot of spam gets through there), but I
don't believe I have that domain in a whitelist anywhere.

thanks
micah

Re: Low scores

2008-02-25 Thread Micah Anderson

* Michael Scheidell <[EMAIL PROTECTED]> [080223 13:46]:
> > I feel like a lot of pretty obvious spams are getting through my system
> > with appallingly low scores. I'm starting to wonder if something may be
> > wrong with my setup. Looking at what spam tests did fire, I'm frequently
> > surprised that more rules didn't fire (obvious lotto scams and nigerian
> > inheritance scams seem to slip right by) and that the score are
> > surprisingly low... I'd expect satisfyingly high scores for some of
> > these, but I'm not seeing them.
> 
> You using any SARES' rules? If you have the cpu cycles, try that.  Also make
> sure you have latest SpamAssassin and are also running sa-update.  If you
> use sa-compile, make sure you run it every time you update rules.

I'm running version 3.2.3-0.volatile1 on Debian etch (it supposedly
has a number of backported fixes from 3.2.4). I run sa-update every
night on two channels: saupdates.openprotect.com (which contains the
recommended rules in the SARE), and updates.spamassassin.org. If there
is an update, I run sa-compile and then restart spamassassin.

Micah

Segfaulting when using compiled rules

2008-02-28 Thread micah anderson


My spamd is segfaulting when I start it up. I tried to strace the
process and all I could see was that it was opening this file and then
doing some memory mappings and then segfaulting:

open("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so", O_RDONLY) = 8

Since this is a compiled rule, I removed everything under
/var/lib/spamassassin/compiled and then re-ran sa-compile (after doing
an sa-update), which succeeded fine, but as soon as I started up
spamassassin it still segfaulted.

So I have turned off rule compilation for now and it starts fine, but
I'm wondering what I can do to fix this.

I'm running 3.2.3 from volatile, and am running these channels:

sa-update --gpgkey D1C035168C1EBC08464946DA258CDB3ABDE9DC10 \
  --channel saupdates.openprotect.com --channel updates.spamassassin.org

Thanks for any ideas,
Micah

Re: two databases

2009-06-05 Thread Micah Anderson

Michael Grant  writes:

> I did not realize one could store the bayes scores in sql.
>
> So I'd store the bayes scores on a third server and let both mxes use
> the same database.

I did this: I put my bayes in mysql and pointed two different spamd
machines at it, but I had severe problems that I could not resolve. I
posted to the list[0] about the problems.

The basic problem was that as soon as I fired up the second server it
immediately started blocking on the bayes work. Average scantimes went
from 1-2 seconds up to 35+ and the max children got eaten up by blocking
on the bayes work, to the point where it was pointless because too many
processes were blocked. Disabling the bayes_sql stuff on one of the
machines dropped the scantimes back to their expected average of 1-2
seconds (but of course none of the BAYES tests will fire and
autolearning fails).

My mysql server is its own machine; it was local to the first spamd
(local LAN) and remote to the second (over the net). I eliminated any
hostname lookup problems. I obviously couldn't eliminate network
latency, but that shouldn't have caused such a severe result. I'm
running with InnoDB tables, so I shouldn't have any table-level locking
issues. In any case, I might have had some issues because my MySQL
database needed to be optimized, but I was not able to determine how,
and now I just run one of the spamds without bayes, which is not too bad
because my bayes database seems to be totally worthless at the moment. :P

micah

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673

Bayes learning trusted networks mailing list email

2009-06-05 Thread Micah Anderson


I get a significant amount of spam that comes through mailing lists that
I am legitimately subscribed to, either they are the administration
emails asking me if I want to approve the "email" or not, or they are
messages that make it through the list.

These messages are either hitting ALL_TRUSTED, because they come from
mailing lists on my networks, or are tagged with a clear
untrusted-relays list. In other words, I've got my trusted_networks set
up so that SA knows about networks that I trust to be sending legitimate
email (they are not spam originators). Obviously spam still gets
through, but it comes from hops previous to these networks. If I
understand things properly, because I've got these set up in my
trusted_networks, those previous hops will be checked in RBLs, so the
spam is more detectable. For example, the debian servers do send some
spam to me, but the Received: headers in the emails are correct, so if
the server's address is in trusted_networks, then SA will look up the
address debian got the email from in RBLs.

What I am unsure of is if I am poisoning my bayes by reporting these
messages that make it through as spam. Should I be just deleting them?
The tokens that are legitimate that will end up as collateral damage are
going to be the list footers, the list administration messages, and
potentially other pieces.

I'm hoping I can identify why my bayes database is so bad (it thinks
everything is BAYES_00 now), and if this is why I will want to change my
training behavior.

thanks,
micah

FreeMail.bl installation instructions

2009-06-05 Thread Micah Anderson


The FreeMail.pm installation instructions are a little thin:

### Install:
#
# Please add loadplugin to init.pre (so it's loaded before cf files!):
#
# loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm

My understanding, and please correct me if I am wrong, is that you
actually need to do this:

# 1. Install FreeMail.pm in /etc/spamassassin
#
# 2. Add the following loadplugin to init.pre:
#
# loadplugin Mail::SpamAssassin::Plugin::FreeMail FreeMail.pm
#
# 3. Download http://sa.hege.li/FreeMail.cf to /etc/spamassassin
#
# 4. Download http://sa.hege.li/freemail_domains.cf to /etc/spamassassin

I knew about the FreeMail.cf because I've used SA plugins before, but I
had no idea about the domain list. Might be good to make these
instructions a little more explicit, so that others will also win.

Micah

Re: two databases

2009-06-05 Thread Micah Anderson

* Michael Grant  [2009-06-05 10:26-0400]:
> On Fri, Jun 5, 2009 at 16:08, Micah Anderson  wrote:
> > Michael Grant  writes:
> >
> >> I did not realize one could store the bayes scores in sql.
> >>
> >> So I'd store the bayes scores on a third server and let both mxes use
> >> the same database.
> >
> > I did this, but my bayes in mysql and pointed two different spamd
> > machines at it, but I had severe problems that I could not resolve. I
> > posted to the list[0] about the problems.
> >
> > The basic problem was that as soon as I fired up the second server it
> > immediately starts blocking on the bayes work. Average scantimes go from
> > 1-2 seconds up to 35+ and the max children get eaten up by blocking on
> > the bayes work to the point where its pointless because too many
> > processes are blocked. Disabling the bayes_sql stuff on one of the
> > machines dropped the scantimes back to their expected average of 1-2
> > seconds (but of course none of the BAYES tests will fire and
> > autolearning fails).
> >
> > My mysql server is its own machine, it was local to the first spamd
> > (local LAN) and remote to the second (over the net). I eliminated any
> > hostname lookup problems, obviously couldn't eliminate network latency,
> > but that shouldn't have caused such a severe result. I'm running with
> > InnoDB tables, so I shouldn't have any row-level locking issues... in
> > any case I might have had some issues because my MySQL database needed
> > to be optimized, but I was not able to determine how and now I just run
> > one of the spamd's without bayes, which is not too bad because my bayes
> > database seems to be totally worthless at the moment. :P
> >
> > micah
> >
> > 0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113673
> >
> >
> 
> Wow.  I did not get around to setting this up yet.  But on the MySQL
> front, did you try enabling the query cache by adding this to the
> mysql command line?
> 
> --maximum-query_cache_size=1M

I presume this setting is the same in my.cnf:
query_cache_limit   = 1048576

I don't recall all the things I tried, but it seems worth trying again,
this time with a fresh approach. 

> Also, a tool I used a lot to help debug this sort of issue was mytop.

I've never had too much luck with mytop, but I have found the
tuning-primer.sh to work well: http://www.day32.com/MySQL/

micah



Compiling with tcc, cannot start: segfaults

2008-05-21 Thread Micah Anderson


I chased this around for a while and when I finally determined the
cause, I figured I should post something so that future searchers will
find it.

I have been happily running 3.2.3-0.volatile1 (Debian) for months. Today
I woke up to a lot of spam in my INBOX and spamassassin down. It seems
to have died during the cron sa-update run. When I try to start spamd
again, it segfaults:

Starting SpamAssassin Mail Filter Daemon:
/etc/init.d/spamassassin: line 38: 11186 Segmentation fault  
start-stop-daemon --start
--pidfile $PIDFILE --exec $XNAME $NICE --oknodo --startas $DAEMON -- $OPTIONS 
$DOPTIONS

Those options come from the Debian initscript. If I unpack them and run
it manually:

# /usr/sbin/spamd OPTIONS="-i -u nobody -A
10.0.1.13,10.0.1.15,10.0.1.17,10.0.1.31,10.0.1.33,10.0.1.44 -q -x
--max-children 50 --helper-home-dir /etc/spamassassin"
Segmentation fault

Even without all the options:
# /usr/sbin/spamd
Segmentation fault

In fact, if I try to run sa-compile, I get a segfault. If I purge the
3.002003 rules (and their compiled versions), re-run sa-update and then
sa-compile, and then try to start spamassassin again, it segfaults.

If I strace the process, the end is as follows:

stat64("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pmc",
0xbfa315ac) = -1 ENOENT (No such file or directory)
stat64("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pm",
{st_mode=S_IFREG|0444, st_size=58745, ...}) = 0
open("/var/lib/spamassassin/compiled/3.002003/Mail/SpamAssassin/CompiledRegexps/body_0.pm",
O_RDONLY|O_LARGEFILE) = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbfa312c8) = -1 ENOTTY
(Inappropriate ioctl for device)
_llseek(7, 0, [0], SEEK_CUR)= 0
read(7, "\npackage Mail::SpamAssassin::Com"..., 4096) = 4096
read(7, "razine\\b/i#,\n  q#__DRUGS_DIET5# "..., 4096) = 4096
read(7, "SPUR-M\\b/i#,\n  q#FB_SSEX# => q#/"..., 4096) = 4096
read(7, "#,\n  q#__FRAUD_WNY# => q#/\\b(?:d"..., 4096) = 4096
read(7, "SOR# => q#/not a registered inve"..., 4096) = 4096
read(7, "a stud/i#,\n  q#SARE_BETTERORG# ="..., 4096) = 4096
read(7, "|05 E(?:ast|\\.)? 85th St|10 S\\. "..., 4096) = 4096
read(7, " Blvd Suite 200|491 North Federa"..., 4096) = 4096
read(7, "RE_EN_N_800_5_1# => q#/800\\W+5(?"..., 4096) = 4096
read(7, " a|an? honest|you being a|to any"..., 4096) = 4096
read(7, " matter|mutual understanding|rel"..., 4096) = 4096
read(7, "U_PART_CIA# => q#/(?![\\s\"\'-][0-9"..., 4096) = 4096
read(7, " F X|A B S Y|H L U N|F C Y I|A M"..., 4096) = 4096
read(7, "> q#/\\bbuy\\b.{1,30}\\br(?:[EMAIL PROTECTED]|a"..., 4096) = 4096
read(7, "{0,40}account .{0,40}record/i#,\n"..., 4096) = 1401
brk(0x9c48000)  = 0x9c48000
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0",
{st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so",
{st_mode=S_IFREG|0555, st_size=1015528, ...}) = 0
stat64("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.bs",
{st_mode=S_IFREG|0444, st_size=0, ...}) = 0
open("/var/lib/spamassassin/compiled/3.002003/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so",
O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340\\\0"...,512) = 512
fstat64(8, {st_mode=S_IFREG|0555, st_size=1015528, ...}) = 0
mmap2(NULL, 1018080, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8,0) = 
0xb77a8000
mmap2(0xb789, 69632, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xe7) = 0xb789
mprotect(0xbfa31000, 4096,
PROT_READ|PROT_WRITE|PROT_EXEC|PROT_GROWSDOWN) = 0
close(8)= 0
mprotect(0xb77a8000, 950272, PROT_READ|PROT_WRITE) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 16329 detached

So what was the cause? It turned out I was trying to be smart and save
disk space by installing the 'tcc' compiler on all of our spam
processing servers. 'tcc' is 'the tiny C compiler': it's small, fast
and ANSI C compliant. It's also somewhat experimental. When I replaced
it with gcc, blew away my compiled rules and re-ran sa-compile, things
were able to start up again fine.

Micah

Problems with 3.2.5

2008-09-11 Thread Micah Anderson


I just upgraded to 3.2.5 and have encountered some regressions.

First, I'm getting tons of the following in my logs, literally metric tons:

Sep 11 17:11:28 spamd2 spamd[27357]: Use of uninitialized value in 
concatenation (.) or string at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Check.pm line 1028,  line 315.

In order to get it to stop, I had to disable the shortcircuit plugin in
v320.pre. I filled a partition with this line in a couple minutes flat.

I particularly value the savings I get from this plugin, so I would like
to know how I can re-enable it!

This problem is also present in 3.2.4, but not in 3.2.3, if that helps.

Additionally, I am getting the following:

Sep 11 20:25:41 spamd2 spamd[26599]: DNS query timeout for
gamma._domainkey.gmail.com
Sep 11 20:16:02 spamd2 spamd[21923]: Compilation failed in require at
/usr/lib/perl5/Net/DNS/RR/TXT.pm line 11,  line 78.
Sep 11 20:16:02 spamd2 spamd[21923]: BEGIN failed--compilation aborted
at /usr/lib/perl5/Net/DNS/RR/TXT.pm line 11,  line 78.

These are obviously related to domainkeys/dkim, but the perl errors are
ugly.

Thanks for everyone's work on SA, it's really appreciated,
Micah

Phishing rules?

2008-10-30 Thread Micah Anderson


I keep getting hit by phishing attacks, and they aren't being stopped by
anything I've thrown up in front of them:

postfix is doing:
reject_rbl_client   b.barracudacentral.org,
reject_rbl_client   zen.spamhaus.org,
reject_rbl_client   list.dsbl.org,

I've got clamav pulling signatures updated once a day from sanesecurity
(phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
securesiteinfo) and Malware Black List, MSRBL (images, spam).

I've got spamassassin 3.2.5 with the URIBL plugin loaded (which I understand
pulls in 25_uribl.cf automatically, right? Or do I need to configure
that? If it's automatic, that pulls in SURBL phishing). I've got Botnet
set up, PDFinfo and postcards, I'm using DCC and a bayes DB, and I've got
the hashcash and SPF plugins loaded, imageinfo, pretty much everything I
can think of... but for some reason phishing attempts keep getting
through.

Sadly, I do not have an example I can share at the moment, as I
typically delete them in a rage after training my bayes filter on
them. However, I am looking for any suggestions of other things I can
turn on... in particular, are there rules that people have created that
look for certain keywords where the body is asking for your
account/password information?

Thanks for any ideas,
micah

Re: Phishing rules?

2008-10-31 Thread Micah Anderson

* Kelson <[EMAIL PROTECTED]> [2008-10-30 17:29-0400]:
> Micah Anderson wrote:
>>  reject_rbl_client   list.dsbl.org,
>
> DSBL has shut down, and you should remove the query from your list.  It  
> won't help with the phishing, but it'll free up some network resources.  
> Info: http://dsbl.org/node/3

Thanks, I wasn't aware of that. I'm only using zen.spamhaus now, which
is a shame. I had to remove barracuda because I've already received 3
complaints about false positives. That's unfortunate, because it was
blocking about 3x as much as zen was.

>> I've got clamav pulling signatures updated once a day from sanesecurity
>> (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
>> securesiteinfo) and Malware Black List, MSRBL (images, spam).
>
> Odd, ClamAV + SaneSecurity does a really good job here at blocking phish  
> before they even get to SpamAssassin.  We call clamd through MIMEDefang,  
> then call SpamAssassin (also through MimeDefang) if a message passes.
>
> Have you verified that Clam is using the SaneSecurity signatures?  How  
> are you calling ClamAV?

Oh I'm certainly blocking phishing attempts via the SaneSecurity
signatures, probably 200+ in the last hour alone. However, the phishing
emails that are getting through are not known to their signature
database, and in some cases have been directly targeted at the domain I
am managing. That's why I am interested in rules that look for typical
phishing emails. These emails are usually quite similar in their
construction, so it seems like a good case for rules.

micah

Re: Phishing rules?

2008-10-31 Thread Micah Anderson

* Jeff Chan <[EMAIL PROTECTED]> [2008-10-31 02:36-0400]:
> On Thursday, October 30, 2008, 12:56:53 PM, Micah Anderson wrote:
> 
> > I keep getting hit by phishing attacks, and they aren't being stopped by
> > anything I've thrown up in front of them:
> 
> [...]
> > I've got spamassassin 3.2.5 with URIBL plugin loaded (which I understand
> > pulls in the 25_uribl.cf automatically, right? Or do I need to configure
> > that? if its automatic, that pulls in SURBL phishing).
> 
> Increase the score on:
> 
> URIBL_PH_SURBL
> 
> The current SpamAssassin rules scoring process gives it an
> artificially low score which is counterproductive IMO.  If you
> want to stop more phishing spams, consider increasing the score. 

Thanks, I will do so... however the phishing emails I am getting are
of two types:

. generalized phishes, which I would expect SURBL to be able to detect a
large percentage of
. targeted phishing to my domain, where the phisher attempts to
impersonate the 'admins' and ask for usernames/passwords. I don't
think these will get hits on SURBL, because they are specific to my
domain, and they are actually the more damaging because users are more
likely to be fooled by something that claims to come from 'us'.

Micah


Re: Phishing rules?

2008-11-01 Thread Micah Anderson

Randy <[EMAIL PROTECTED]> writes:

> Micah Anderson wrote:
>> Sadly, I do not have an example I can share at the moment, as I
>> typically delete them in a rage after training my bayes filter on
>> them. However, I am looking for any suggestions of other things I can
>> turn on... in particular, are there rules that people have created that
>> look for certain keywords where the body is asking for your
>> account/password information?
>>   
> Report these and maybe they will add something that catches them. If
> one wanted to, they can get any mail they want through your filters if
> they are good and don't use things that trigger the rules.

Report them where exactly?

Here is an example I received recently. Note the hideously low bayes
score on this one, which even caused it to autolearn as ham, grr.

>From [EMAIL PROTECTED] Fri Oct 31 20:00:45 2008
Return-Path: <[EMAIL PROTECTED]>
X-OfflineIMAP-x792266711-4c6f63616c-494e424f58: 1225549253-0134941395044-v6.0.3
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd2.riseup.net
X-Spam-Level: 
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
autolearn=ham version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (unknown [10.8.0.3])
by cormorant.riseup.net (Postfix) with ESMTP id 58BFA19581F7
for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:40 -0700 (PDT)
Received: from master.debian.org (master.debian.org [70.103.162.29])
by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)
Received: from cat.cybersurf.net ([209.197.145.185] helo=cat.cia.com)
by master.debian.org with esmtp (Exim 4.63)
(envelope-from <[EMAIL PROTECTED]>)
id 1Kw6j8-0003iT-Ix
for [EMAIL PROTECTED]; Sat, 01 Nov 2008 03:00:38 +
Received: from reef.cybersurf.com ([209.197.145.198])
by cat.cia.com with esmtp (Exim 4.50)
id 1Kw6iz-0002Li-Pg; Fri, 31 Oct 2008 21:00:29 -0600
Received: from apache by reef.cybersurf.com with local (Exim 4.44)
id 1Kw6j0-0006W5-UJ; Fri, 31 Oct 2008 20:00:30 -0700
Received: from 196-207-0-227.netcomng.com (196-207-0-227.netcomng.com 
[196.207.0.227]) 
by webmail.3web.com (IMP) with HTTP 
for <[EMAIL PROTECTED]>; Sat,  1 Nov 2008 14:00:30 +1100
Message-ID: <[EMAIL PROTECTED]>
Date: Sat,  1 Nov 2008 14:00:30 +1100
From: WEBMAIL Help Desk <[EMAIL PROTECTED]>
Reply-to: [EMAIL PROTECTED]
Subject: WEBMAIL Help Desk
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
User-Agent: Internet Messaging Program (IMP) 3.2.1
X-Originating-IP: 196.207.0.227
To: undisclosed-recipients:;
X-Virus-Scanned: ClamAV 0.94/8552/Fri Oct 31 18:14:36 2008 on mx1.riseup.net
X-Virus-Status: Clean
Status: RO
Content-Length: 1427
Lines: 38

Dear Webmail User,
This message was sent automatically by a program on Webmail which
periodically checks the size of inboxes, where new messages are
received.
The program is run weekly to ensure no one's inbox grows too large. If
your inbox becomes too large, you will be unable to receive new email.
Just before this message was sent, you had 18 Megabytes (MB) or more of
messages stored in your inbox on your Webmail. To help us re-set your
SPACE on our database prior to maintain your INBOX, you must reply to
this e-mail and enter your

Current User name ()
and Password(   ).

You will continue to receive this warning message periodically if your
inbox size continues to be between 18 and 20 MB. If your inbox size
grows to 20 MB, then a program on Bates Webmai
will move your oldest email to a
folder in your home directory to ensure that you will continue to be
able to receive incoming email. You will be notified by email that this
has taken place. If your inbox grows to 25 MB, you will be unable to
receive new email as it will be returned to the sender.
After you read a message, it is best to REPLY and SAVE it to another
folder.

Thank you for your cooperation.
WEBMAIL Help Desk

---
3webXS HiSpeed Dial-up...surf up to 5x faster than regular dial-up alone... 
just $14.90/mo...visit www.get3web.com for details

Re: Phishing rules?

2008-11-01 Thread Micah Anderson

Karsten Bräckelmann <[EMAIL PROTECTED]> writes:

> On Thu, 2008-10-30 at 15:56 -0400, Micah Anderson wrote:
>> I keep getting hit by phishing attacks, and they aren't being stopped by
>> anything I've thrown up in front of them:
>> 
>> postfix is doing:
>>  reject_rbl_client   b.barracudacentral.org,
>>  reject_rbl_client   zen.spamhaus.org,
>>  reject_rbl_client   list.dsbl.org,
>> 
>> I've got clamav pulling signatures updated once a day from sanesecurity
>> (phishing, spam, junk, rogue), SecuriteInfo (honeynet, vx,
>> securesiteinfo) and Malware Black List, MSRBL (images, spam).
>
> I'd increase this, at least for the SaneSecurity phish sigs. They are
> being updated much more frequently.

Thanks for the pointer. For some reason I thought I had read on the
SaneSecurity site that you shouldn't pull more than once a day, but
after you mentioned it I went and read it again, and they ask that you
don't pull more frequently than once an hour. I've changed that cronjob,
which should help.

>> I've got spamassassin 3.2.5 with URIBL plugin loaded (which I understand
>> pulls in the 25_uribl.cf automatically, right? Or do I need to configure
>
> Yes, unless you disable network tests in general. Should be easy to
> answer yourself if they are working, just by grepping for the rule names
> defined in 25_uribl.cf.

Network tests aren't disabled, and yeah I am seeing those rules occur in
some of my headers of mail that I can search through, so I think that
they are working. I've increased my overall URIBL scoring to 2.5 from
the default.
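
For reference, a minimal sketch of what such an override could look like
in local.cf. Which rules were raised is not stated here, so the list is
an assumption: it uses SURBL rule names that appear elsewhere in this
archive, including URIBL_PH_SURBL as Jeff suggested. The 2.5 value is the
one mentioned above.

```
# local.cf: raise selected URIBL rules above their stock scores.
# The choice of rules is an assumption; keep only the lists you trust.
score URIBL_PH_SURBL  2.5
score URIBL_JP_SURBL  2.5
score URIBL_OB_SURBL  2.5
score URIBL_SC_SURBL  2.5
score URIBL_WS_SURBL  2.5
score URIBL_AB_SURBL  2.5
```

A single value after the rule name applies to every score set, so it
holds whether or not bayes and network tests are both active.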

>> Sadly, I do not have an example I can share at the moment, as I
>> typically delete them in a rage after training my bayes filter on
>> them. However, I am looking for any suggestions of other things I can
>> turn on... in particular, are there rules that people have created that
>> look for certain keywords where the body is asking for your
>> account/password information?
>
> So you've pretty much thrown everything at it you could find... ;)  And
> they are still slipping through? How many are we talking here? Compared
> to the total number of spam / phish?
>
> Also, how many are being caught? Strikes me as odd that you don't have a
> sample but yet sound like every single one is slipping by.

These are hard for me to answer as I am not doing any analysis of how
many are caught. In the last week, I've gotten four of them through, and
I've received reports from a number of users that they too have received
them.

I've just sent a sample to the list however. 

> I guess, I would start verifying that all the above actually is working.
> Most notably the SaneSecurity phish sigs. ClamAV should catch the lions
> share, by far, assuming it comes before SA in your chain.

Yeah, I'm using the clamav-milter, so those get rejected really early
on.

Thanks for the ideas,
Micah

Re: Phishing rules?

2008-11-01 Thread Micah Anderson

Joseph Brennan <[EMAIL PROTECTED]> writes:

> Micah Anderson <[EMAIL PROTECTED]> wrote:
>
>> I keep getting hit by phishing attacks, and they aren't being stopped by
>> anything I've thrown up in front of them:
>
> Do you mean attempts to get your users to send their passwords,
> or fake mail pretending to be from banks?

I mean attempts to get my users to send their passwords. Are these not
called phishing?

micah

Re: Phishing rules?

2008-11-01 Thread Micah Anderson

Brent Clark <[EMAIL PROTECTED]> writes:

> Hiya
>
> See SA examples
>
> http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists
>
> Also add hostkarma.junkemailfilter.com to your DNSBL.

Thanks, I'll add this to my local.cf and see how it goes.

> Another thing I do find is useful is adding additional higher valued
> MX records.
>
> http://www.junkemailfilter.com/spam/support.html

I don't really like the idea of adding some other site's MX to my DNS, so
I think I'll pass on this one.

thanks for the suggestions!
micah

bayes SQL delays

2008-11-02 Thread Micah Anderson


I have spamd set up to use bayes in a MySQL database, and it works fine.
I've turned off auto-expiry and instead run a cronjob to expire in the
middle of the night (it removes about 40k tokens per run). I've made the
DB InnoDB so it can handle locking better. I've got MySQL-based user
prefs coming from the same database server, and that works (not everyone
wants bayes). Autolearning is working, I chew through a lot of mail every
day, and in general everything seems fine.
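
A minimal sketch of the kind of local.cf this describes, assuming the
stock SQL bayes and user-preference support. The host name, database
names, user names and passwords are placeholders, not the real values.

```
# Bayes tokens stored in MySQL (all connection values are placeholders)
bayes_store_module  Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn       DBI:mysql:bayes:db.example.org
bayes_sql_username  sa_bayes
bayes_sql_password  CHANGEME

# Autolearn on, but no opportunistic expiry during scans;
# expiry is left to a nightly cron job instead
bayes_auto_learn    1
bayes_auto_expire   0

# Per-user preferences read from the same database server
user_scores_dsn           DBI:mysql:spamassassin:db.example.org
user_scores_sql_username  sa_prefs
user_scores_sql_password  CHANGEME
```

With bayes_auto_expire off, the nightly job would typically be a call to
sa-learn --force-expire run as the user that owns the bayes data.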

Except that my spamd server is overloaded, so I need a second one. So I
set up another spamd instance, with the exact same configurations as the
first, fire it up and it immediately starts blocking on the bayes
work. Average scantimes go from 1-2 seconds up to 35+ and the max
children get eaten up by blocking on the bayes work to the point where
it's pointless because too many processes are blocked. If I disable the
bayes_sql stuff in my local.cf, scantimes drop back to their expected
average of 1-2 seconds, but of course none of the BAYES tests will fire
and autolearning fails. 

What gives?

Re: Phishing rules?

2008-11-02 Thread Micah Anderson

Joseph Brennan <[EMAIL PROTECTED]> writes:

>> Reply-to: [EMAIL PROTECTED]
>
>
> First pass:
>
> header LOCAL_REPLYTO_LIVE Reply-to =~ /[EMAIL PROTECTED]/
> score LOCAL_REPLYTO_LIVE 8.0
>
> Maybe scoring 8.0 for one thing scares you, but I haven't seen this
> fp in a couple of months.

Is live.com a legitimate email sender? It looks Microsoft related. If I
set it to 8, then any mail from that address is sure to get caught as
spam, which may not be the right thing if there are other legitimate
addresses sending from that domain.

Or perhaps nothing but spam comes from live.com? I don't know anything
about it.

micah

Re: Phishing rules?

2008-11-02 Thread Micah Anderson

SM <[EMAIL PROTECTED]> writes:

> At 07:56 01-11-2008, Micah Anderson wrote:
>>Here is an example one I received recently, note the hideously low bayes
>>score on this one, caused it to autolearn as ham even, grr.
>
> [snip]
>
>>X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_LOW
>> autolearn=ham version=3.2.5
>
> The sender is whitelisted by www.dnswl.org.

Yeah, because this one was forwarded through debian.org, which is
legitimate. The spam originator was not debian.org, but debian.org is
the one in dnswl.org.

>>Received: from master.debian.org (master.debian.org [70.103.162.29])
>> by mx1.riseup.net (Postfix) with ESMTP id AA4465701D1
>> for <[EMAIL PROTECTED]>; Fri, 31 Oct 2008 20:00:39 -0700 (PDT)
>
> The mail is coming through debian.org.  Do you want to blacklist that host?

No, I do not.

Re: Phishing rules?

2008-11-02 Thread Micah Anderson

Karsten Bräckelmann <[EMAIL PROTECTED]> writes:

> On Sat, 2008-11-01 at 11:30 -0400, Micah Anderson wrote:
>> Joseph Brennan <[EMAIL PROTECTED]> writes:
>
>> > Do you mean attempts to get your users to send their passwords,
>> > or fake mail pretending to be from banks?
>> 
>> I mean attempts to get my users to send their passwords, are these not
>> called phishing?
>
> An important bit of information, missing from the OP. :)  Targeted
> attacks at your users, so the general phishing BLs don't really apply.
>
> Anyway, can't you educate your users, that
>
> (a) Any administrative email will be sent from an official, well known,
> internal address? That means *not* an arbitrary address. Yes, sorry,
> the obvious...
> (b) They will *never* ever be asked for a password by mail. Period.
> Again, obvious...

We've been telling our users this for years, but there is always someone
who doesn't listen, or forgets, or something. I don't know. I find it
absolutely incredible that anyone would fall for any of these, yet I am
the one who has to clean up the mess :P

> Then block internal / administrative From addresses coming from any
> external SMTP.

Yeah, that's done. They don't get by faking our From, but the body is
constructed in a way to mislead and impersonate our "staff" or whatever,
usually by threatening people that their account will be closed unless
they reply.

> This is not a technical way to stopping these, but an educational
> approach to prevent the most dumb and gross social engineering. At least
> the second one actually should be well-known, and I've seen ISPs
> pointing it out frequently...

Thanks, but we've done all these, and continue to do them, they are
another plank in the various mechanisms that we must employ.

micah

Re: Phishing rules?

2008-11-09 Thread Micah Anderson

Sahil Tandon <[EMAIL PROTECTED]> writes:

> Joseph Brennan <[EMAIL PROTECTED]> wrote:
>
>>> We get some legitimate email from @live.com users.
>>
>> But they don't set a Reply-to header.  That's the test.
>
> But that wasn't his question; he asked whether any legitimate mail flows
> from live.com.  That was my answer. :)

You are technically correct, but Joseph's message made clear the
information that I was not aware of, which was quite helpful and
technically better.

Micah

Re: Checking for SPF & DKIM Checks

2008-11-09 Thread Micah Anderson

Byung-Hee HWANG <[EMAIL PROTECTED]> writes:

> mouss wrote:
> [...]
>> let's start with DKIM.
>> 
>> do you have
>> loadplugin Mail::SpamAssassin::Plugin::DKIM
>
> + i'm use with following rule ;;
> score DKIM_VERIFIED   -45.3

Even with the default DKIM scores, I'm finding that I get spam that is
DKIM_VERIFIED, which causes the score to dip below zero and lets the
message through, for example:

http://micah.riseup.net/1

I am thinking of actually increasing the score because of this.

micah

Re: Phishing rules?

2008-11-09 Thread Micah Anderson

Joseph Brennan <[EMAIL PROTECTED]> writes:

> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/

I created a meta-rule out of these (with a score of 8), and then ran
spamassassin -D < phish to see how it worked. It matched the meta-rule
flawlessly, but the phish ended up with only a 5.4 score due to BAYES_00
dragging it down. That was surprising to me, so I started to wonder if
my bayes DB was poisoned.
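
A rough sketch of a meta-rule of that shape, built from the patterns
quoted above. The rule names are made up for illustration, and the
Reply-To pattern is a placeholder because the original address is
redacted in the archive.

```
# Unscored sub-rules: names starting with __ do not score on their own
body   __LOCAL_PH_DEAR   /Dear .{0,12}(web ?mail|columbia\.edu)/i
body   __LOCAL_PH_PASS   /Password.{0,10}\([\s\.\*\_]+\)/
body   __LOCAL_PH_REPLY  /you must reply to this email/i
# Placeholder pattern: the real address is redacted in the archive
header __LOCAL_PH_RT     Reply-To =~ /\@freemail\.example/i

# Boolean meta-rule: fires only when every sub-rule hits
meta     LOCAL_PHISH_WEBMAIL  (__LOCAL_PH_DEAR && __LOCAL_PH_PASS && __LOCAL_PH_REPLY && __LOCAL_PH_RT)
describe LOCAL_PHISH_WEBMAIL  Looks like a webmail password phish
score    LOCAL_PHISH_WEBMAIL  8.0
```

The sample posted earlier in this thread spells it "e-mail", so the third
pattern may need /e-?mail/ to hit that particular message.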

I ran some stats, and the results seem to indicate a healthy bayes
database (unless I am reading this wrong)... A side note: it's
interesting that only 9% of our email is spam, which seems low,
but maybe clamav-milter+rbls are blocking the remaining 40%?

Email:  2379392  Autolearn: 1075396  AvgScore:  -6.32  AvgScanTime:  5.96 sec
Spam:    227816  Autolearn:  114079  AvgScore:  14.75  AvgScanTime:  4.23 sec
Ham:    2151576  Autolearn:  961317  AvgScore:  -8.56  AvgScanTime:  6.15 sec

Time Spent Running SA:  3941.26 hours
Time Spent Processing Spam:  267.76 hours
Time Spent Processing Ham:  3673.50 hours

TOP SPAM RULES FIRED
----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------
   1  HTML_MESSAGE              154522    54.03    67.83   52.57
   2  BAYES_99                  134531     6.09    59.05    0.48
   3  BOTNET                    133687     8.90    58.68    3.63
   4  RDNS_NONE                 102255    10.19    44.88    6.51
   5  URIBL_JP_SURBL             98879     4.94    43.40    0.87
   6  MIME_HTML_ONLY             87518     7.62    38.42    4.36
   7  URIBL_OB_SURBL             76624     3.98    33.63    0.84
   8  DCC_CHECK                  74600     8.51    32.75    5.94
   9  URIBL_AB_SURBL             59890     2.72    26.29    0.23
  10  URIBL_SC_SURBL             53911     2.51    23.66    0.27
  11  RCVD_IN_BL_SPAMCOP_NET     43120     2.43    18.93    0.68
  12  URIBL_WS_SURBL             38251     1.79    16.79    0.21
  13  URIBL_RHS_DOB              36565     2.17    16.05    0.70
  14  BAYES_50                   35322     3.93    15.50    2.71
  15  HTML_IMAGE_ONLY_16         33887     1.68    14.87    0.28
  16  HTML_SHORT_LINK_IMG_2      33118     1.56    14.54    0.19
  17  HTML_IMAGE_RATIO_02        32757     2.93    14.38    1.72
  18  URIBL_SBL                  30456     1.80    13.37    0.57
  19  RAZOR2_CHECK               27722     2.55    12.17    1.53
  20  RAZOR2_CF_RANGE_51_100     26856     2.41    11.79    1.41
----------------------------------------------------------------

TOP HAM RULES FIRED
----------------------------------------------------------------
RANK  RULE NAME                  COUNT  %OFMAIL  %OFSPAM  %OFHAM
----------------------------------------------------------------
   1  BAYES_00                 2002969    84.67     5.15   93.09
   2  HTML_MESSAGE             1131073    54.03    67.83   52.57
   3  UNPARSEABLE_RELAY         760567    32.93    10.12   35.35
   4  DKIM_SIGNED               693328    29.74     6.26   32.22
   5  DKIM_VERIFIED             531590    22.67     3.38   24.71
   6  ALL_TRUSTED               173612     7.30     0.05    8.07
   7  USER_IN_WHITELIST         155704     6.54     0.00    7.24
   8  RDNS_NONE                 140127    10.19    44.88    6.51
   9  DCC_CHECK                 127844     8.51    32.75    5.94
  10  RCVD_IN_DNSWL_LOW         101863     4.31     0.34    4.73
  11  MIME_HTML_ONLY             93817     7.62    38.42    4.36
  12  RCVD_IN_DNSWL_MED          90038     3.81     0.31    4.18
  13  WHOIS_NETSOLPR             87575     3.72     0.38    4.07
  14  MIME_QP_LONG_LINE          82804     4.49    10.52    3.85
  15  BOTNET                     78052     8.90    58.68    3.63
  16  BAYES_50                   58286     3.93    15.50    2.71
  17  FUZZY_AMBIEN               53284     2.28     0.38    2.48
  18  SARE_SUB_ENC_UTF8          50533     2.14     0.17    2.35
  19  SARE_MILLIONSOF            42268     1.84     0.67    1.96
  20  FORGED_YAHOO_RCVD          38762     1.74     1.16    1.80
----------------------------------------------------------------


Then I looked to see what bayes did with the message, but I do not
understand how to read the output, can someone explain this to me and
give me an idea why BAYES_00 fired when we've been feeding every one of
these spams to bayes to train on it?

$ spamassassin -D bayes < phish 
[9595] dbg: bayes: using username: @GLOBAL
[9595] dbg: bayes: database connection established
[9595] dbg: bayes: found ba

Re: Funds / Award release scams poor scoring

2008-11-10 Thread Micah Anderson

* Justin Mason <[EMAIL PROTECTED]> [2008-11-10 05:30-0500]:
> 
> John Hardin writes:
> > On Sun, 9 Nov 2008, Micah Anderson wrote:
> > > Does anyone have any rules to catch these, or suggestions of scores to
> > > tweak to make these hit better?  I am running clamav-milter with the
> > > sanesecurity add-ons, but these are still making it through.
> > 
> > Check out the sought-fraud ruleset.
> > 
> > http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf
> > 
> > (I don't know if it's in sa-update yet - Justin?)
> 
> I thought it was, but it seems I never made that part of the publishing
> process active ;)  I'll do that.

Does this mean it will show up in the regular updates.spamassassin.org
channel? Or is there another that I should follow?

Thanks!
micah



Re: Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson

John Hardin <[EMAIL PROTECTED]> writes:

> On Sun, 9 Nov 2008, Micah Anderson wrote:
>
>> Does anyone have any rules to catch these, or suggestions of scores to
>> tweak to make these hit better?  I am running clamav-milter with the
>> sanesecurity add-ons, but these are still making it through.
>
> Check out the sought-fraud ruleset.
>
> http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf

I am pulling the sought.rules.yerp.org channel, I thought that this was
the same, but diff'ing these shows a lot of differences.

> (I don't know if it's in sa-update yet - Justin?)

Would be nice if I could pull these in via sa-update!

micah

Re: Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson

Chris <[EMAIL PROTECTED]> writes:

> On Sunday 09 November 2008 2:33 pm, Micah Anderson wrote:

>  2.5 CTYME_IXHASH   BODY: iXhash found @ ixhash.junkemailfilter.com

This one is interesting to me, when I pump these messages through spamc
-R I get:

-5.0 RCVD_IN_JMF_W  RBL: Sender listed in JMF-WHITE
   [70.103.162.29 listed in hostkarma.junkemailfilter.com]

Because I added the hostkarma.junkemailfilter RBLs, as described here:
http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625

Getting -5 on these kind of sucks, but yours doesn't look like a RBL
check, and is scoring it up. What test is that?

> Above are how these scored on my stand-alone box. You may want to run the 
> Freemail plugin, SA-Grey plugin. Are you running Razor? 

The rest of my tests were the same as yours, with the exception of the
Freemail and SA-Grey plugins, which I do not have. I'll track those
down. I am running razor, the first message gets a + .5 from
RAZOR2_CHECK, the 4th message gets 0.5 RAZOR2_CHECK + 1.5
RAZOR2_CF_RANGE_E4_51_100 + 0.5 RAZOR2_CF_RANGE_51_100

Micah

Funds / Award release scams poor scoring

2008-11-09 Thread Micah Anderson


I'm getting a number of these types of emails through SA with
either negative scores or very low scores. This is surprising to me, as
these are pretty classic spams. I suspect that some of the low scores
are due to the messages being DKIM signed.

Does anyone have any rules to catch these, or suggestions of scores to
tweak to make these hit better?  I am running clamav-milter with the
sanesecurity add-ons, but these are still making it through.

Here are 5 different ones, all of which got through in the last 24
hours:

http://micah.riseup.net/1
http://micah.riseup.net/2
http://micah.riseup.net/3
http://micah.riseup.net/4
http://micah.riseup.net/5

Thanks

Re: Phishing rules?

2008-11-09 Thread Micah Anderson

Joseph Brennan <[EMAIL PROTECTED]> writes:

> /Dear .{0,12}(web ?mail|columbia\.edu)/i
>
> /Password.{0,10}\([\s\.\*\_]+\)/
>
> /you must reply to this email/i
>
> Reply-to =~ /[EMAIL PROTECTED]/

I'm new at writing custom rules, so I am trying to figure out the best
way to do this. Would it be better to make a different rule for each one
of these, or would it be better to bmake a meta-rule? My guess is its
better to make a meta-rule, but that means that each rule must hit in
order to get the larger score, versus some of the individual rules
hitting and adding up to the larger score. The meta-rule seems good
because it describes a full profile phishing email that must be met, but
it seems bad because one tweak of the phish would result in the
meta-rule not matching overall. I suppose this is the point of the
arithmetic meta-rule possibility, however I'm unsure which
mechanism to choose. Any advice would be appreciated.
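
A rough sketch of the trade-off described above, written in Python rather than SpamAssassin syntax. The sub-rule names, the per-rule score of 1.0, and the "at least 3 of 4" threshold are all made up for illustration; they are not taken from any real ruleset.

```python
# Hypothetical sub-rule hits for one message (names are illustrative only).
hits = {
    "DEAR_WEBMAIL": True,
    "PASSWORD_BLANK": True,
    "MUST_REPLY": False,   # the phisher tweaked this phrase
    "REPLYTO_ODD": True,
}


def strict_meta(hits):
    """Boolean meta-rule: fires only if every sub-rule hits."""
    return all(hits.values())


def additive(hits, per_rule_score=1.0):
    """Independent rules: each hit adds its own score to the total."""
    return per_rule_score * sum(hits.values())


def threshold_meta(hits, minimum=3):
    """Arithmetic meta-rule: fires if at least `minimum` sub-rules hit."""
    return sum(hits.values()) >= minimum


print(strict_meta(hits))     # False
print(additive(hits))        # 3.0
print(threshold_meta(hits))  # True
```

With one phrase changed, the strict meta-rule misses entirely, while the additive and threshold variants still register three of the four indicators.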

Once I figure out the best way to match these, I need a good way to
determine what I should score these, the rule-writing documentation
suggests starting at 0.1 and then moving it up as you test it, and
suggests extreme caution scoring a custom rule over 1, however it seems
like these would be better scored higher than that.

> The first of course is partly local to us.  Another useful local rule
> is to check for the uri of your own webmail.

Yeah, I'll make a uri rule for that and probably add that to the
meta-rule.

Thanks for any advice,
micah

Hard money conference spam

2008-11-11 Thread Micah Anderson


I'm getting probably 4-5 of these a day, the messages vary, so they
aren't the same, but they aren't firing on any specific rules related to
their 'hard money conference/webinar/seminar' etc. Does anyone have any
customized rules for these? I've been training my bayes on them, and it's
starting to pick them up (at BAYES_40 now), but it could use some more
specific rules:


Content analysis details:   (5.1 points, 8.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.0 FH_XMAIL_RND_833       Special X-Mailer Version
-0.2 BAYES_40               BODY: Bayesian spam probability is 20 to 40%
                            [score: 0.2305]
 2.2 DCC_CHECK              Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
 1.0 RCVD_IN_BRBL           RBL: Received via relay listed in Barracuda RBL
                            [66.29.0.197 listed in b.barracudacentral.org]
 1.0 RCVD_IN_JMF_BR         RBL: Sender listed in JMF-BROWN
                            [66.29.0.197 listed in hostkarma.junkemailfilter.com]
 1.1 URIBL_RHS_DOB          Contains an URI of a new domain (Day Old Bread)
                            [URIs: hardmoney-event.com]

Return-Path: <[EMAIL PROTECTED]>
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on spamd1.riseup.net
X-Spam-Level: ***
X-Spam-Status: No, score=3.9 required=5.0 tests=FH_XMAIL_RND_833,
RCVD_IN_JMF_BR,URIBL_BLACK,URIBL_RHS_DOB autolearn=no version=3.2.5
Delivered-To: [EMAIL PROTECTED]
Received: from mx1.riseup.net (egret-vpn.riseup.net [10.8.0.3])
    by cormorant.riseup.net (Postfix) with ESMTP id 602201C38CA8
    for <[EMAIL PROTECTED]>; Mon, 10 Nov 2008 23:23:26 -0800 (PST)
Received: from ip197.rutcommercial.com (ip197.rutcommercial.com [66.29.0.197])
    by mx1.riseup.net (Postfix) with SMTP id 10F4757002B
    for <[EMAIL PROTECTED]>; Mon, 10 Nov 2008 23:23:10 -0800 (PST)
Date: Tue, 11 Nov 2008 02:10:03 -0500
From: "Larry Rivera" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: thursday's hard money 
MIME-Version: 1.0
X-Mailer: oer v8.3.3.1000.10001079
Reply-To: [EMAIL PROTECTED]
Message-Id: <[EMAIL PROTECTED]>
Content-Type: text/plain;
    charset="iso-8859-1"
X-Virus-Scanned: ClamAV 0.94/8607/Mon Nov 10 21:55:28 2008 on mx1.riseup.net
X-Virus-Status: Clean
Content-Length: 528

Hard Money National Event takes place on November 13th.

follow the following steps to register:

1. Visit our website  http://hardmoney-event.com
2. click "attend a seminar" and register for the event.
3. We will confirm your registration the same day.
4. call us at 858-736-7788 for additional information.

If you wish to opt out of future messages, please go to
http://hardmoney-event.com/uns/ or, send us a letter to PBMSII, 5580 la jolla 
blvd #153 La Jolla, Ca 92037



.

Re: SURBL Usage Policy change

2008-11-11 Thread Micah Anderson

"Jeff Chan" <[EMAIL PROTECTED]> writes:

I think that SURBL is a valuable service, and I understand how it is
difficult to maintain such a service without resources.

> The funding is, by design, very moderate and will provide much needed
> support to sustain this initiative.

However, I believe that for non-profit organizations the funding model
is not moderate at all. Perhaps this is because of the unfortunate
decision to put non-profits into the same category as governments, which
typically are able to bring in much larger amounts of money. Or perhaps
it is a short-sighted view that non-profits all fall into the same
category of large, well-funded non-profits. While there are some that do
have resources available to them, a large majority of non-profits are
deeply struggling with resources and honestly I cannot imagine any being
able to afford the subscription rates that are listed for
non-profits/governments. I'm on the board of directors and am an
executive for three different non-profit organizations, and although
they all would be eager to contribute to SURBL, none of them could
possibly meet the funding bar that has been set.

The SURBL FQS is great, and it is appreciated that you have thought of
small charitable/non-profits with low email volume. However, I think you
are missing that there are small charitable/non-profits that can do this
volume on an extremely tight budget.

Micah

Re: Hard money conference spam

2008-11-11 Thread Micah Anderson

Rob McEwen <[EMAIL PROTECTED]> writes:

> Micah,
>
> In addition to the barracuda RBL, this IP is also listed on ivmSIP
> (since 10/21/08) and ivmSIP/24

Can you provide me with the local.cf details to be able to add the
ivm RBLs?

> Additionally, the domain "hardmoney-event DOT com" is blacklisted on
> both ivmURI and URIBL.COM
>
> At the very least, you should add uribl.com to your filtering since that
> list is free. Scoring with URIBL for this would have easily put that
> message "over the top" for you.

I understood URIBL to be enabled by default in SA, and updated via
sa-update, in fact I've got:

/var/lib/spamassassin/3.002005/updates_spamassassin_org/25_uribl.cf

> SHORT ANSWER: Start using uribl.com's URI blacklist

Am I not using it already? Maybe I'm not, and the 25_uribl.cf doesn't
include it? If so, I would really like to know about this.

Thanks!
Micah

Freemail config: dup unknown type freemail_re, Regexp

2008-11-11 Thread Micah Anderson


I recently added the FreeMail plugin, and although it appears to be
working, when I start SpamAssassin, I receive this message in my log:

Nov 11 06:45:48 spamd2 spamd[29934]: config: dup unknown type freemail_re, Regexp

I've put the FreeMail.pm in /etc/spamassassin, and created FreeMail.cf
as described, and it appears like it is working, as I am seeing some
messages get tagged with it. 

Can the regexps in plugins installed this way be compiled with
sa-compile, or are they handled separately?

Thanks,
micah

Re: Checking for SPF & DKIM Checks

2008-11-11 Thread Micah Anderson

mouss <[EMAIL PROTECTED]> writes:

> Francis Russell wrote:
>>  >> Even with the default DKIM scores, I finding I am getting spam that are
>>  >> DKIM_VERIFIED causing the score to dip below zero and let the message
>>  >> through, for example:
>>  >>
>>  >> http://micah.riseup.net/1
>>  >
>>  > that's spam relayed by a debian list. definitely a different beast...
>>
>> I interpret those headers as spam being sent to a Debian e-mail
>> address, then forwarded to a personal address.

That is a correct interpretation. I get most of my spam this way.

> That's what I meant. Maybe I use the term "relay" too "liberally"?
> anyway, such spam is harder to stop unless you add the list relays to
> your trusted_networks.

This is something in SA that I have the hardest time understanding, the
trusted_networks and internal_networks settings. I've read all the posts
that try to clarify it and I still can't keep it straight :) 

How would adding a list relay to my trusted_networks actually make
stopping spam easier? Doesn't that make it a network whose mail should get
less SA processing, because I 'trust' it?

micah

Re: Barracuda RBL

2008-11-11 Thread Micah Anderson

"Sujit Acharyya-Choudhury" <[EMAIL PROTECTED]> writes:

> Thanks Henrik.  However, I am not using SVN 3.3 so the rule on its own
> will be useful.

I'm using:

# Add a rule to give Barracuda RBL a +1 score, this is a really good
# RBL, but we were having false-positives when using it to block at
# the SMTP level, so using it in a weighted spamassassin rule is
# better because we can benefit from it without being strict
header RCVD_IN_BRBL eval:check_rbl('brbl-lastexternal', 'b.barracudacentral.org.', '127.0.0.2')
describe RCVD_IN_BRBL   Received via relay listed in Barracuda RBL
score RCVD_IN_BRBL  1.0
tflags RCVD_IN_BRBL net

micah

Overriding user prefs in local.cf

2008-11-11 Thread Micah Anderson


I set some 'add_header' options in my global local.cf and could not
figure out why they were not being applied. It turns out that because I
am using SQL user_prefs, any add_header lines I put in local.cf are just
ignored (even though I have no global or individual add_header lines
configured in my sql table).

Is there any documentation that details which options that I might
configure in local.cf that are overridden by user prefs simply existing?

I know I can set a @GLOBAL pref with these add_header lines if I wish,
and I can set them for my user, but I thought that by setting them in my
local.cf they would be honored globally as well, as certain other things
that are set there are honored globally. I'm not sure which are and
which are not.

micah

Re: Funds / Award release scams poor scoring

2008-11-12 Thread Micah Anderson

* Justin Mason <[EMAIL PROTECTED]> [2008-11-12 05:20-0500]:
> 
> John Hardin writes:
> > On Sun, 9 Nov 2008, Micah Anderson wrote:
> > 
> > > Does anyone have any rules to catch these, or suggestions of scores to
> > > tweak to make these hit better?  I am running clamav-milter with the
> > > sanesecurity add-ons, but these are still making it through.
> > 
> > Check out the sought-fraud ruleset.
> > 
> > http://svn.apache.org/viewvc/spamassassin/rules/trunk/sandbox/jm/20_sought_fraud.cf
> > 
> > (I don't know if it's in sa-update yet - Justin?)
> 
> That's in sa-update since last night; it's now bundled in the main
> "sought" ruleset channel, as well.

Which channels specifically? Do you mean to say that it is in both:

updates.spamassassin.org
sought.rules.yerp.org

now?

Thanks!
Micah



Re: Overriding user prefs in local.cf

2008-11-12 Thread Micah Anderson

Matt Kettler <[EMAIL PROTECTED]> writes:

> Micah Anderson wrote:
>> I set some 'add_header' options in my global local.cf and could not
>> figure out why they were not being applied. It turns out that because I
>> am using SQL user_prefs, any add_header lines I put in local.cf are just
>> ignored (even though I have no global or individual add_header lines
>> configured in my sql table).
>>   
> That's strange. They should only be ignored if the user prefs contains a
> clear_headers, or if it has an add_header for the exact same header.
>
> Does your user_prefs or global contain a clear_headers command?

No, that's why I was confused as well. My global prefs don't exist in SQL
at all, and my user prefs do not contain either an add_header or
clear_headers command.

>> Is there any documentation that details which options that I might
>> configure in local.cf that are overridden by user prefs simply existing?
>>   
> There are none that are cleared simply by the merits of user_prefs
> existing. An empty prefs is the same as no prefs.

Ok, that's how I expected things to work, clearly something else is going
on then.

thanks,
micah

hostkarma junkemailfilter

2008-11-16 Thread Micah Anderson


Over at another post about Phishing[0], Brent suggested adding
hostkarma.junkemailfilter to my RBL list, which I have done... However
it seems to hit a lot of spams giving them a -5 scoring. I've either got
this configured backwards, or this isn't working very well because it
whitelists too much actual spam. I copied the examples[1] directly from
their wiki...

Does anyone have any experience with these? I'm removing the JMF-WHITE
because it's not helping at all, but I wonder if others have experience?

header __RCVD_IN_JMF eval:check_rbl('JMF-lastexternal','hostkarma.junkemailfilter.com.')
describe __RCVD_IN_JMF Sender listed in JunkEmailFilter
tflags __RCVD_IN_JMF net
 
header RCVD_IN_JMF_W eval:check_rbl_sub('JMF-lastexternal', '127.0.0.1')
describe RCVD_IN_JMF_W Sender listed in JMF-WHITE
tflags RCVD_IN_JMF_W net nice
score RCVD_IN_JMF_W -5
 
header RCVD_IN_JMF_BL eval:check_rbl_sub('JMF-lastexternal', '127.0.0.2')
describe RCVD_IN_JMF_BL Sender listed in JMF-BLACK
tflags RCVD_IN_JMF_BL net
score RCVD_IN_JMF_BL 3.0
 
header RCVD_IN_JMF_BR eval:check_rbl_sub('JMF-lastexternal', '127.0.0.4')
describe RCVD_IN_JMF_BR Sender listed in JMF-BROWN
tflags RCVD_IN_JMF_BR net
score RCVD_IN_JMF_BR 1.0

0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/113625
1. http://wiki.junkemailfilter.com/index.php/Spam_DNS_Lists

micah

Re: Funds / Award release scams poor scoring

2008-11-18 Thread Micah Anderson

mouss <[EMAIL PROTECTED]> writes:

> Henrik K wrote:
>> On Mon, Nov 10, 2008 at 08:49:00AM +0100, mouss wrote:
>>> Henrik K wrote:
>>>> On Mon, Nov 10, 2008 at 12:25:42PM +0530, ram wrote:
>>>>> The number of DNSWL_LOW and DNSWL_MED misfires have gone up especially
>>>>> in last two days. Even Marc's JMF_W misfires. 
>>>>>
>>>>> What it means is these are "good" mailservers who normally relay ham and
>>>>> have some weak links ( weak password etc ) that just got exposed
>>>> What method are they using to relay through master.debian.org? I can't
>>>> figure out how these mail from yahoo etc can end up relaying through there
>>>> in this case.
>>>>
>>> they simply post to the list. if the list is not open, they
>>> susbcribe  first.
>>
>> Ah right, I was looking it a bit wrong.. it's silly that the original
>> recipient is nowhere to be found in headers.
>>
>
> Now that you say it, I don't see any list headers! so it looks like a
> bug somewhere...

No, I receive email at [EMAIL PROTECTED], so it doesn't need to go
through a debian list to get to me.

micah

Distributing the processing load

2008-11-18 Thread Micah Anderson


Our poor spamassassin machine is not able to keep up with the mail
load. We are constantly getting "prefork: server reached --max-children
setting, consider raising it" errors, and our max-children are already
set at the max that this machine can handle (50). 

Since we are using spamc/spamd I figured that it would be trivial to
setup a second spamd on another machine and then the load could be
split. I accomplished this by setting my mailfilter to use '-d spamd'
and configured the spamd host in my DNS to be a round-robin between the
two participating IPs. However, this seems to only work as a
'fail-over', and not a load-balancer, as the spamc man page says:

   If host resolves to multiple addresses, then spamc will
   fail-over to the other addresses, if the first one cannot be
   connected to.  It will first try all addresses of one host
   before it tries the next one in the list.  

In fact, looking at my logs, one of the spamd machines is only
processing requests for one of the three mail servers, the other
requests are going to the other spamd. Likely this is because they all
looked up the address, and then have it cached?

I am using -x, and the man page says that the fail-over behaviour is
incompatible with -x; if that switch is used, fail-over will not occur.
That's fine, I'm not particularly interested in fail-over, but rather
load-balancing, is there any way to do this without having to setup my
different mail servers to query different spamds?
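
To make the difference between fail-over and load-balancing concrete, here is a minimal simulation in Python. The two addresses are placeholders from the documentation range, not the real spamd hosts, and the functions only model the connection order a client might use; they are not how spamc is implemented.

```python
import random
from collections import Counter

# Hypothetical addresses standing in for the two spamd hosts behind one DNS name.
SPAMD_ADDRESSES = ["192.0.2.10", "192.0.2.11"]


def failover_order(addresses):
    """Always try addresses in the order given; later ones are only backups."""
    return list(addresses)


def randomized_order(addresses, rng):
    """Shuffle the addresses per request so first choices spread across hosts."""
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    return shuffled


def first_choice_counts(order_fn, requests=1000):
    """Count which address each simulated request would connect to first."""
    return Counter(order_fn()[0] for _ in range(requests))


rng = random.Random(42)
failover = first_choice_counts(lambda: failover_order(SPAMD_ADDRESSES))
balanced = first_choice_counts(lambda: randomized_order(SPAMD_ADDRESSES, rng))
print(failover)  # every request lands on the first address
print(balanced)  # requests are split roughly evenly
```

As far as I recall, the spamc man page also describes a -H (--randomize) switch that shuffles the looked-up addresses in a similar way, but that is worth verifying against the installed version before relying on it.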

Thanks for any ideas,
micah

Re: hostkarma junkemailfilter

2008-11-20 Thread Micah Anderson

"Benny Pedersen" <[EMAIL PROTECTED]> writes:

> On Tue, November 18, 2008 22:16, Henrik K wrote:
>
> postfwd and trusted_networks msa_networks is what i do use here, then minimal
> dns lookups is needed olso, facebook have random helo so need to be
> whitelisted hard in postfwd and in spamassassin, i have contacted facebook
> about it, but the problem might still be there
>
> i like your postfwd config

Where is this postfwd config you refer to? I would like to see this.

micah

Local rules math problem

2009-05-02 Thread Micah Anderson


I've got a couple custom meta rules, that don't seem to be applying how
I expected them to.

When I run a message that should hit on these rules I get:

[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_USERNAME ==> got hit: "Username:"
[14109] dbg: rules: ran one_line_body rule __LOCAL_PHISHER_PASSWORD ==> got hit: "Password:"
[14109] dbg: rules: ran header rule __LOCAL_REPLYTO_NOTUS ==> got hit: "negative match"

Which results in the rule: LOCAL_PHISH_FROMREPLY getting set with score
0.1, which is great, that is what I expect. However there is a rule that
builds on that which doesn't fire, specifically the
LOCAL_PHISHER_USERPASS rule which does the math to add the
LOCAL_PHISH_FROMREPLY to the __LOCAL_PHISHER_PASSWORD and
__LOCAL_PHISHER_USERNAME to get over a score of 1, but even though those
rules fire, the math addition doesn't seem to get over 1 and thus the
meta rule doesn't fire...

what am I missing here?

body __LOCAL_PHISHER_PASSWORD   /Password(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i

header __LOCAL_RETURN_PATH_ISUS Return-Path =~ /\...@ourdomain\.net/
header __LOCAL_FROM_ISUS        From =~ /\...@ourdomain\.net/
header __LOCAL_REPLYTO_EXISTS   exists:Reply-To
header __LOCAL_REPLYTO_NOTUS    Reply-to !~ /\...@ourdomain\.net/
meta LOCAL_PHISH_FROMREPLY      (( __LOCAL_RETURN_PATH_ISUS || __LOCAL_FROM_ISUS ) && ( __LOCAL_REPLYTO_EXISTS && __LOCAL_REPLYTO_NOTUS ))
score LOCAL_PHISH_FROMREPLY     0.1

body __LOCAL_PHISHER_USERNAME   /User(\s)?(n|N)ame(.{0,10}\([\s\.\*\_]+\)|( .{0,4})?:)/i
meta LOCAL_PHISHER_USERPASS     ((( 0.2 * __LOCAL_PHISHER_USERNAME ) + ( 0.4 * __LOCAL_PHISHER_PASSWORD ) + ( 0.4 * LOCAL_PHISH_FROMREPLY)) > 1)
describe LOCAL_PHISHER_USERPASS Typical phish: asks for username and password, we dont do that
score LOCAL_PHISHER_USERPASS    10.5
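
One possible reading of the arithmetic, sketched in Python. The key assumption, which should be checked against the SpamAssassin configuration documentation, is that a sub-rule that hits contributes 1 to a meta expression (not its score). Under that assumption the three weights sum to exactly 1, and a strict "> 1" comparison can never be true.

```python
from fractions import Fraction

# Assumption: each sub-rule that hits contributes 1 to the expression,
# regardless of the score assigned to it elsewhere.
username_hit = 1
password_hit = 1
fromreply_hit = 1

# Exact fractions avoid any floating-point rounding questions.
total = (Fraction("0.2") * username_hit
         + Fraction("0.4") * password_hit
         + Fraction("0.4") * fromreply_hit)

print(total)       # 1
print(total > 1)   # False
print(total >= 1)  # True
```

If that reading is right, comparing with ">= 1" (or against a slightly lower threshold such as "> 0.9") would let the meta rule fire when all three sub-rules hit.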

thanks,
micah

bayes training doesn't seem to have any affect

2009-05-02 Thread Micah Anderson


I got a phish message that was understood by bayes as:

-2.6 BAYES_00   BODY: Bayesian spam probability is 0 to 1%
[score: 0.]

So I trained with spamc -L spam but even after that I am still getting
BAYES_00. Shouldn't the training have bumped that score up?

Thanks for any info,
micah

Re: bayes training doesn't seem to have any affect

2009-05-03 Thread Micah Anderson

Dave Walker  writes:

> Micah Anderson wrote:
>> I got a phish message that was understood by bayes as:
>>
>> -2.6 BAYES_00   BODY: Bayesian spam probability is 0 to 1%
>> [score: 0.]
>>
>> So I trained with spamc -L spam but even after that I am still getting
>> BAYES_00. Shouldn't the training have bumped that score up?
>>
>> Thanks for any info,
>>
> In order for Bayes to actually make a difference, it needs plenty of
> training.  It's disabled by default in most installs - unless you have
> at least 200 of both spam and ham taught.  This needs to be done
> manually, unless you have autolearn enabled.

Yeah, I've been running this bayes db for a couple years now, so I am
sure I've passed the 200 mark :)

I'm wondering if my bayes DB is too poisoned now and maybe needs to be
reset?

> To see what is really going on run "$ spamassassin -D <
> /path/to/the/email > /dev/null", and see if you can learn anything as to
> why it's not working as expected.

Indeed, when I do this, I find these bayes related log entries:

[13244] dbg: bayes: corpus size: nspam = 6798614, nham = 19136735
[13244] dbg: bayes: tok_get_all: token count: 175
[13244] dbg: bayes: score = 0

> Also, to see how experienced your Bayes knowledge is - use "$ sa-learn
> --dump magic"

This shows me that I have no idea what these magic things are :) Does
this tell you anything useful? 

0.000          0          3          0  non-token data: bayes db version
0.000          0    6798614          0  non-token data: nspam
0.000          0   19136753          0  non-token data: nham
0.000          0 1063157695          0  non-token data: ntokens
0.000          0 1241301616          0  non-token data: oldest atime
0.000          0 1241416889          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1241344830          0  non-token data: last expiry atime
0.000          0      43200          0  non-token data: last expire atime delta
0.000          0     496607          0  non-token data: last expire reduction count

micah

Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Micah Anderson

Adam Katz  writes:

> Micah Anderson wrote:
>>> Also, to see how experienced your Bayes knowledge is - use "$ sa-learn
>>> --dump magic"
>> 
>> This shows me that I have no idea what these magic things are :) Does
>> this tell you anything useful? 
>> 
>> 0.000          0          3          0  non-token data: bayes db version
>> 0.000          0    6798614          0  non-token data: nspam
>> 0.000          0   19136753          0  non-token data: nham
>> 0.000          0 1063157695          0  non-token data: ntokens
>> 0.000          0 1241301616          0  non-token data: oldest atime
>> 0.000          0 1241416889          0  non-token data: newest atime
>> 0.000          0          0          0  non-token data: last journal sync atime
>> 0.000          0 1241344830          0  non-token data: last expiry atime
>> 0.000          0      43200          0  non-token data: last expire atime delta
>> 0.000          0     496607          0  non-token data: last expire reduction count
>
> Eh?  Last journal sync atime is Jan 1 1970?
> Try running:   sa-learn --sync

Doesn't seem to change the 'last journal sync atime' from 0.

> If that helps, put it in your nightly SpamAssassin cron job
> (and/or revisit your custom teaching scripts).

In fact, I've been running that from cron every night. 

I'm using a mysql DB and I've got the following set in my local.cf:

# We want to expire via cronjob, rather than having one of our spamd
# children do it. 
bayes_auto_expire  0

# no affect
bayes_learn_to_journal 0

> A quick primer (since this doesn't really exist anywhere...):  The
> three zeroed columns are always zero.
>
> bayes db version is self-explanatory.
> nspam is the number of spam messages on record.  bayes needs >200.

Should be fine: 6798649

> nham is the number of ham messages on record.  bayes needs >200.

Also should be fine: 19160960

> ntokens is the number of 'words' noted in the system.

lots of tokens: 1065483803

> oldest atime is the oldest access time of the oldest token (I think).

I've got 1241474416 which would be Mon May  4 15:00:16 PDT 2009
which is just yesterday... that doesn't seem right that this would be
the oldest access time, especially for 1065483803 tokens!
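
For anyone wanting to double-check the atime values, this is a small Python sketch of the conversion. It assumes the atime fields are plain Unix epoch seconds and uses a fixed UTC-7 offset for PDT so that no timezone database is needed; the helper name is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; a fixed offset avoids depending on a timezone database.
PDT = timezone(timedelta(hours=-7), "PDT")


def atime_to_local(atime, tz=PDT):
    """Convert an epoch-seconds atime from the dump into an aware datetime."""
    return datetime.fromtimestamp(atime, tz)


# The "oldest atime" value quoted above.
print(atime_to_local(1241474416).isoformat())       # 2009-05-04T15:00:16-07:00
# A journal sync atime of 0 is the Unix epoch itself.
print(atime_to_local(0, timezone.utc).isoformat())  # 1970-01-01T00:00:00+00:00
```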

> the rest of the times should be self-explanatory.
> last expire reduction count is the number of tokens removed from the
> last expiration run (I think).

Ok, that seems to be counting, so something is being expired:

0.000          0     840628          0  non-token data: last expire reduction count

This is all very interesting info, I appreciate the
explanation. However, my original question still stands.

micah

Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Micah Anderson

Karsten Bräckelmann  writes:

>> This shows me that I have no idea what these magic things are :) Does
>> this tell you anything useful? 
>
>> 0.000          0    6798614          0  non-token data: nspam
>> 0.000          0   19136753          0  non-token data: nham
>
> That's quite a lot of ham compared to the spam... Does that really
> reflect your mail instream?

I would suspect not, since we probably get more spam than
non-spam. However, perhaps the spamassassin autolearning caused this?
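
To put numbers on the imbalance, a quick Python calculation using the nspam and nham counts from the dump quoted above. It only describes what has been learned, not the actual mail stream.

```python
# Counts taken from the "sa-learn --dump magic" output quoted in this thread.
nspam = 6798614
nham = 19136753

total = nspam + nham
print(f"spam share of learned messages: {nspam / total:.1%}")  # 26.2%
print(f"ham learned per spam learned:   {nham / nspam:.2f}")    # 2.81
```

So roughly three quarters of everything learned was ham, which would be the opposite of a stream that carries more spam than non-spam.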

Perhaps the DB is so out of whack, I should just reset it from scratch
and try it again. It's a lot of data to lose and I am not sure exactly
the right way to do that... so I'd be somewhat reluctant to do so. Might
be better if I could clean it out some.

> 19 M hams learned and an SQL Bayes storage backend. Site wide. Do you
> trust your users? Any chance some of them are training badly? At worst

No, I don't trust my users. In fact because of that we moved from doing
site-wide training to selected users who can demonstrate that they
understand how to train. Perhaps these numbers are legacy from before we
switched to this method.

thanks,
micah

Re: Low scores

2010-03-11 Thread micah anderson

On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap  wrote:
> Just wanted to add that this particular line is incorrect:
> meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||
> USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||
> USER_IN_BLACKLIST)
> 
> That will have Blacklisted email filters classified as ham.

Interesting, thanks for the reply from an old thread. 

I got this list from:
http://wiki.apache.org/spamassassin/ShortcircuitingRuleset which seems
to be something that Justin Mason put together. I have CC'd Justin on
this email.

This list specifies that this was a good shortcircuit rule to have first
because these are non-network-based whitelists, locally-generated
messages, messages via a trusted relay chain, simple non-network based
blacklists.

Mine now reads:

meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||SUBJECT_IN_WHITELIST||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST||SUBJECT_IN_BLACKLIST)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

Which has the difference of also including "SUBJECT_IN_WHITELIST", and
"SUBJECT_IN_BLACKLIST"... but now I am wondering if this is the right
thing to do.

I'm very curious about resolving this, it does seem like a bad setup and
it is being taken as gospel from the spamassassin wiki, but perhaps
there is something that we are not understanding here that Justin can
clarify?

micah


Re: Low scores

2010-03-17 Thread micah anderson

On Fri, 12 Mar 2010 15:44:21 -1000, Julian Yap  wrote:
> On Thu, Mar 11, 2010 at 7:58 AM, micah anderson  wrote:
> 
> > On Tue, 9 Mar 2010 11:56:56 -1000, Julian Yap 
> > wrote:
> > > Just wanted to add that this particular line is incorrect:
> > > meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||
> > > USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED||USER_IN_BLACKLIST_TO||
> > > USER_IN_BLACKLIST)
> > >
> > > That will have Blacklisted email filters classified as ham.
> >
> > Interesting, thanks for the reply from an old thread.
> >
> > I got this list from:
> > http://wiki.apache.org/spamassassin/ShortcircuitingRuleset which seems
> > to be something that Justin Mason put together. I have CC'd Justin on
> > this email.


> > Which has the difference of also including "SUBJECT_IN_WHITELIST", and
> > "SUBJECT_IN_BLACKLIST"... but now I am wondering if this is the right
> > thing to do.

I actually removed the SUBJECT_IN rules as this makes it so any
individual user who can whitelist/blacklist a subject can shortcircuit
for everyone.

> > I'm very curious about resolving this, it does seem like a bad setup and
> > it is being taken as gospel from the spamassassin wiki, but perhaps
> > there is something that we are not understanding here that Justin can
> > clarify?
> >
> 
> I'm pretty sure yours is wrong.  You need to take out the rules which
> apply to Spam in spam short circuiting.

I agree with you, it's amazing that this has been wrong on the wiki since
2007! I went to go update the wiki today, and found that you had just
done it. Thanks for doing that!
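
The problem with the old expression can be shown with a small Python model of the OR logic. The test names are the ones from the SC_HAM line quoted in this thread; the two functions are only an illustration of the boolean expression, not SpamAssassin code.

```python
# Tests that indicate the message should be treated as ham.
HAM_TESTS = {"USER_IN_WHITELIST", "USER_IN_DEF_WHITELIST",
             "USER_IN_ALL_SPAM_TO", "NO_RELAYS", "ALL_TRUSTED"}
# Tests that indicate spam; these were part of the old SC_HAM expression.
BLACKLIST_TESTS = {"USER_IN_BLACKLIST_TO", "USER_IN_BLACKLIST"}


def sc_ham_original(hit_tests):
    """Old expression: any whitelist OR blacklist hit short-circuits as ham."""
    return bool(hit_tests & (HAM_TESTS | BLACKLIST_TESTS))


def sc_ham_fixed(hit_tests):
    """Corrected expression: only the ham-indicating tests count."""
    return bool(hit_tests & HAM_TESTS)


blacklisted = {"USER_IN_BLACKLIST"}
print(sc_ham_original(blacklisted))  # True: blacklisted mail treated as ham
print(sc_ham_fixed(blacklisted))     # False
```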

Micah



Botnet plugin still relevant?

2010-03-17 Thread Micah Anderson


Hi,

I've been using the Botnet plugin version 0.8 for some time now, and the
plugin itself has been around since 2003 or so. I'm just curious to test
the waters and see what others think about the relevance in 2010 of
this plugin. Does it still contribute in positive ways to your setup? I
do not see a newer version of the plugin since 2007, is there a newer
version than 0.8?

Did you do any configuration of it beyond its defaults? Does the
proliferation of individuals on dynamically assigned cable/dsl modems
cause the plugin to misfire too often?

I've had a number of complaints somewhat recently about the last point,
and I don't have much of a solution to the situation where a user is
stuck with the dynamically assigned IP that previously a spammer was
occupying, except to explain that is the situation and eventually it
will change.

thanks for any thoughts or experiences with this plugin!

micah

ps. I notice it is not listed on
http://wiki.apache.org/spamassassin/CustomPlugins and I wonder the
reason why?

sa-update channels

2010-03-17 Thread Micah Anderson


I'm trying to find out what the current state of the art is for plugins
and channel updates.

What are people using nowadays? I just reviewed my plugins and ended up
deleting Freemail because it has been pulled into Spamassassin core;
removed the postcards plugin because the original source is now 404 and
it is a very old rule; removed the iXhash plugin because it was spewing
a lot of perl errors and I was not seeing a lot of hits.

I've still got 20_sought_fraud, Botnet, and PDFinfo... but nothing
beyond that. 

For channels I've been using:

updates.spamassassin.org
sought.rules.yerp.org
saupdates.openprotect.com 

But I wonder if the last two are still relevant, or if there are other
lists to use instead?

Thanks for any advice,
micah

Re: Botnet plugin still relevant?

2010-03-22 Thread micah anderson

On Wed, 17 Mar 2010 14:45:53 -0700, John Rudd  wrote:
> Some people need to put in some alternate values for DNS timeouts, but
> if you've got a local caching name server, you typically don't need
> that.
> 
> There aren't any actual bugs in it that I'm aware of, so I haven't
> released a new version.  As I see it, there isn't a need (and that is
> a somewhat controversial statement with some of the more opinionated
> people around here).
> 
> I do still see some things that get nailed by it ... but there's lots
> of those same hosts that get caught by the Spamhaus PBL.  So, it kind
> of depends on what you're doing with PBL and/or Zen, as to whether or
> not you need Botnet.   But, there are still plenty of things coming
> from that class of hosts, so if you don't use one, I'd definitely
> recommend using the other.

Yeah, I've been having problems recently which I think are related to me
using both Zen/PBL along with the Botnet plugin weighted to score level
5, even if I were to have it lower at 3 it would still be too much.

Many users are complaining and when I finally get some useful messages
with headers to analyze I am finding something like the following:

X-Spam-Report: 
*  3.3 RCVD_IN_PBL RBL: Received via a relay in Spamhaus PBL
*  [213.6.61.151 listed in zen.dnsbl]
*  1.0 RCVD_IN_BRBL RBL: Received via relay listed in Barracuda RBL
*  [213.6.61.151 listed in b.barracudacentral.org]
*  1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
*  [213.6.61.151 listed in bb.barracudacentral.org]
*  0.0 RCVD_IN_SORBS_DUL RBL: SORBS: sent directly from dynamic IP address
*  [213.6.61.151 listed in dnsbl.sorbs.net]
*  0.8 SPF_NEUTRAL SPF: sender does not match SPF record (neutral)
*  5.0 BOTNET Relay might be a spambot or virusbot
*  [botnet0.8,ip=213.6.61.151,rdns=a61-151.adsl.paltel.net,maildomain=palnet.com,client,ipinhostname,clientwords]
*  1.0 RDNS_DYNAMIC Delivered to internal network by host with
*  dynamic-looking rDNS

This brings it over the 8 threshold, although it is a legitimate email
from a user who has unfortunately been saddled with a dynamic IP that
previously was used by a spammer. No amount of explanation to these
users about this is going to assuage their feelings, and there isn't
really anything that can be done by them. They can complain to their ISP
I guess, they could also find another ISP, but these are not
particularly productive steps towards resolving this problem.

I'm interested in other suggestions that I can offer people as alternatives,
but until then I think I may need to remove Botnet from the equation.
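
Adding up the report above shows how much of the total comes from Botnet. This Python sketch uses the scores exactly as listed in the X-Spam-Report and the threshold of 8 mentioned in this message; Decimal is used only to keep the sums exact.

```python
from decimal import Decimal

REQUIRED = Decimal("8.0")  # threshold mentioned in this message

# Scores copied from the X-Spam-Report above.
report = {
    "RCVD_IN_PBL": Decimal("3.3"),
    "RCVD_IN_BRBL": Decimal("1.0"),
    "RCVD_IN_BRBL_LASTEXT": Decimal("1.4"),
    "RCVD_IN_SORBS_DUL": Decimal("0.0"),
    "SPF_NEUTRAL": Decimal("0.8"),
    "BOTNET": Decimal("5.0"),
    "RDNS_DYNAMIC": Decimal("1.0"),
}


def total(scores, skip=()):
    """Sum the scores, optionally leaving out some rules."""
    return sum((v for k, v in scores.items() if k not in skip), Decimal("0"))


print(total(report))                              # 12.5
print(total(report, skip={"BOTNET"}))             # 7.5
print(total(report, skip={"BOTNET"}) < REQUIRED)  # True
```

For this particular message, dropping BOTNET alone brings the total from 12.5 to 7.5, just under the threshold, even though the other dynamic-IP rules still fire.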

micah


meaning of child cleanup

2010-04-01 Thread Micah Anderson


Since upgrading to the new spamassassin, I'm seeing the following two
log entries related to cleanup of child PIDs:

1. Apr  1 08:26:38 spamd2 spamd[396]: spamd: handled cleanup of child
pid [31720] due to SIGCHLD: INTERRUPTED, signal 2 (0002)

2. Mar 28 18:00:15 spamd2 spamd[17562]: spamd: handled cleanup of child
pid [391] due to SIGCHLD: exit 0

If I were to guess, the second one seems to be when things are acting
right, the first one seems problematic, and I'm trying to determine what
causes it. The logs for that process aren't particularly interesting,
they are just like any others, with various prefork childstate entries:

Mar 28 06:25:35 spamd2 spamd[396]: prefork: child states: II
Mar 28 06:25:36 spamd2 spamd[396]: prefork: child states: IB

but nothing particularly egregious looking. 

Can someone help me clarify what causes an INTERRUPTED signal? Should I
worry about it? Should I ignore it in logcheck?
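
For sorting these entries before deciding what to ignore in logcheck, a small
sketch along these lines may help. It assumes the wrapped log lines have been
rejoined, and the helper name describe_cleanup is made up for illustration.
Signal 2 is SIGINT on Linux; the sketch only names the signal, it does not
say who sent it.

```python
import re
import signal

# Pattern for the two cleanup messages quoted above (line wraps removed).
CLEANUP_RE = re.compile(
    r"handled cleanup of child pid \[(?P<pid>\d+)\] due to SIGCHLD: "
    r"(?:exit (?P<exit>\d+)|INTERRUPTED, signal (?P<sig>\d+))"
)


def describe_cleanup(line):
    """Hypothetical helper: summarize how a spamd child ended."""
    match = CLEANUP_RE.search(line)
    if match is None:
        return None
    pid = match.group("pid")
    if match.group("exit") is not None:
        return f"pid {pid} exited with status {match.group('exit')}"
    # Translate the numeric signal into its symbolic name.
    name = signal.Signals(int(match.group("sig"))).name
    return f"pid {pid} was terminated by {name}"


lines = [
    "Apr  1 08:26:38 spamd2 spamd[396]: spamd: handled cleanup of child "
    "pid [31720] due to SIGCHLD: INTERRUPTED, signal 2 (0002)",
    "Mar 28 18:00:15 spamd2 spamd[17562]: spamd: handled cleanup of child "
    "pid [391] due to SIGCHLD: exit 0",
]
for line in lines:
    print(describe_cleanup(line))
```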

thanks!
micah



-- 
"It is no measure of health to be well adjusted to a profoundly sick society." 
- J Krishnamurti

dcc: [26896] terminated: exit 241

2010-04-12 Thread Micah Anderson


I'm getting a lot of these log entries ever since I've upgraded:

Apr  9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241

Obviously this is related to dcc, but I am not finding anything about
what 'exit 241' is, and how I can adjust things so I no longer get them
(or maybe they are normal and I need to start ignoring them?)

Does anyone have a clue about these? thanks!
micah


-- 
"It is no measure of health to be well adjusted to a profoundly sick society." 
- J Krishnamurti

New log errors on upgrading

2010-04-12 Thread Micah Anderson


More new errors that I am getting from an upgrade to spamassassin 3.3:

Use of uninitialized value $start_time in addition (+) at
/usr/sbin/spamd line 1382, 

and also the following:

spf: lookup failed: Can't locate object method "new_from_string" via
package "Mail::SPF::Mech::All" at /usr/share/perl5/Mail/SPF/Record.pm
line 227.

I'm using libmail-spf-perl version: 2.005-1

Might this be fixed in a newer perl version?

Micah

Re: dcc: [26896] terminated: exit 241

2010-04-15 Thread Micah Anderson

Michael Scheidell  writes:

> On 4/12/10 4:55 PM, Micah Anderson wrote:
>> I'm getting a lot of these log entries ever since I've upgraded:
>>
>> Apr  9 22:31:14 spamd2 spamd[2774]: dcc: [26896] terminated: exit 241
>>
>>
> what version of dcc are you running?

This is version '1.2.74-4' from Debian... but now looking closer, it
seems as if dcc was removed after Debian Etch. It seems that it was
removed because the upstream authors changed its license to non-free
(according to Debian's DFSG) in version 1.30. This also means that it
has not been available in Ubuntu either since Dapper.

"The Distributed Checksum Clearinghouse source carries a license that is
free to organizations that do not sell filtering devices or services
except to their own users and that participate in the global DCC
network. . . you may not redistribute modified, "fixed," or "improved"
versions of the source or binaries. You also can't call it your own or
blame anyone for the results of using it."

So I guess I just will remove dcc, that is a shame, it seems like a good
service.

> what did you upgrade?

Sorry, I upgraded from Debian etch to Debian Lenny, along with that came
an upgrade to spamassassin.

micah

-- 
"It is no measure of health to be well adjusted to a profoundly sick society." 
- J Krishnamurti

Re: New log errors on upgrading

2010-04-15 Thread Micah Anderson

Mark Martinec  writes:

>> More new errors that I am getting from an upgrade to spamassassin 3.3:
>
> 3.3.0 ?

Good question... indeed the version is 3.3.0.

>> Use of uninitialized value $start_time in addition (+) at
>> /usr/sbin/spamd line 1382, 
>
> That was fixed in 3.3.1 .

Great, I didn't see that in the changelog, but I'm sure it was. I will
update before I bug you further about these! :)

>> and also the following:
>> 
>> spf: lookup failed: Can't locate object method "new_from_string" via
>> package "Mail::SPF::Mech::All" at /usr/share/perl5/Mail/SPF/Record.pm
>> line 227.
>> 
>> I'm using libmail-spf-perl version: 2.005-1
>> 
>> Might this be fixed in a newer perl version?
>
> No idea. Try Mail-SPF-v2.007, the 2.005 is three years old.

I am now running v2.007 to see if that fixes it, I suspect it will. If
it does I will make sure the debian package gets that noted so others
won't run into this.

thanks for your answers,
micah

spamc randomization

2010-04-21 Thread Micah Anderson


I'm using the --randomize option to spamc, along with the -d switch that
has a hostname which resolves to multiple IP addresses. 

Does the --randomize get passed the full set of IPs that are resolved
from the -d hostname and then it randomizes those IPs? In other words,
you can have one host name (say 'spamd') which resolves to multiple IPs
and then passed to the --randomize to be picked from? That seems to be
how it is described, but I could be misinterpreting it.

The description of the --randomize option in the man page says,
'the IP addresses returned for the hosts given by the -d switch', and
the -d switch says you can do this:

   If host resolves to multiple addresses, then spamc will
   fail-over to the other addresses, if the first one cannot be
   connected to.  It will first try all addresses of one host
   before it tries the next one in the list.  


I'm also a little unclear what the --randomize man section means when it
says, "it will try only three times though." Say the hostname 'spamd'
resolves to four IP addresses: 192.168.1.2, 192.168.1.3, 192.168.1.4,
192.168.1.5. After -d resolves that hostname into those IPs, they are
passed to the --randomize function, and one of those four is picked. The
first one doesn't respond, so then it tries another one, that fails, it
then tries a final one and then gives up (not trying all four)?

Did I read this right? I appreciate any second eyes on my interpretation
here. 
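
To make the interpretation above concrete, here is a minimal sketch of it.
This models only the reading described in this message (shuffle the resolved
addresses, try at most three), not spamc's actual implementation; the names
pick_server and always_down are made up for illustration.

```python
import random


def pick_server(addresses, can_connect, max_attempts=3, rng=random):
    """Model of the reading above: shuffle the resolved addresses, then
    try at most max_attempts of them before giving up."""
    candidates = list(addresses)
    rng.shuffle(candidates)
    for address in candidates[:max_attempts]:
        if can_connect(address):
            return address
    return None  # gave up, possibly without trying every address


# The four addresses from the example; here every connection attempt fails.
ips = ["192.168.1.2", "192.168.1.3", "192.168.1.4", "192.168.1.5"]
tried = []


def always_down(address):
    tried.append(address)
    return False


result = pick_server(ips, always_down)
print(result, len(tried))
```

Under this reading, one of the four addresses is never tried when the first
three picks all fail, which is exactly the behaviour being asked about.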

thanks,
micah

Re: sa-update channels

2010-04-21 Thread Micah Anderson

Kai Schaetzl  writes:

> Micah Anderson wrote on Wed, 17 Mar 2010 18:20:40 -0400:
>
>> saupdates.openprotect.com
>
> It's been said repeatedly on this list: don't use it.

Thanks, should I be using the sought.rules.yerp.org channel instead, or
some of the dostech ones?

micah

Re: How do I filter out phishing email?

2010-04-21 Thread Micah Anderson

Jari Fredriksson  writes:

> On 14.4.2010 18:57, yongke wrote:
>> 
>> Well, we send emails on behalf of clients, and so we are trying catch
>> phishing spam before they are sent out.  Since the email aren't sent yet, we
>> had to generate a mock email for SA.  The header in the example is what we
>> THINK the headers will be when they are actually sent out.
>> 
>> When you tried it with your SA, I assume you didn't change any headers?  If
>> that's the case, then it should still work.  I guess I didn't setup SA
>> correctly? 
>> 
>
> I did not change anything. And I think I have pretty default scores on
> the rules.
>
> I have following rule sets in my channels:
>

> 90_2tld.cf.sare.sa-update.dostech.net

In a previous thread[0], it was mentioned that you should not be using the
above channel (or 90_3tld.cf) because these files have been merged into
3.3.1 and are released as 20_aux_tlds.cf

micah


0. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/127703

Re: dcc: [26896] terminated: exit 241

2010-04-21 Thread Micah Anderson

Michael Scheidell  writes:

> On 4/15/10 5:35 PM, Micah Anderson wrote:
>> M
>> "The Distributed Checksum Clearinghouse source carries a license that is
>> free to organizations that do not sell filtering devices or services
>> except to their own users and that participate in the global DCC
>> network. . . you may not redistribute modified, "fixed," or "improved"
>> versions of the source or binaries. You also can't call it your own or
>> blame anyone for the results of using it."
>>
> Which seems silly for debian to remove it, since many of the
> blacklists in SA are by default, licensed similar (free for non
> commercial use, paid if > xxx queries).  maybe debian should look
> through and remove ALL 'dual licensed' software, and when you install
> SA from the RPM's, disable the dual licensed RBL's.

You misunderstand Debian's role and license guidelines. Debian is a
software distributor, and as such it is not silly for Debian to stop
distributing software (ie. dcc) when distributing that software violates
its rules. The blacklists enabled in SA by default are not software,
they are simply hostnames that the Spamassassin software
uses. Configured hostnames are not distribution restricted, and arguably
not even 'software'. There is no software distribution restriction
involved in having those blacklists enabled in SA that violates Debian's
software distribution terms. The software that is distributed is
Spamassassin, which has a fully compliant Debian software distribution
license, not the blacklists that are enabled by default in Spamassassin.

The blacklists do have a restricted use license, but that is something
else altogether.

The software 'dcc', is software, and with it carries a license which
restricts its distribution, and thus Debian, as a software distributor,
has to make decisions based on its own policy, if it is willing to
accept such a distribution restriction. Debian has the DFSG, which is
its guidelines for what is acceptable for distribution, and the license
that the software 'dcc' carries does not satisfy those criteria.

> Or, hey, lets pretend the people installing debian are smart enough to
> be able to make up their own mind if they fit the free license model.

People are free to do that, Debian won't distribute it for those people,
but people are free to put whatever they like on their systems.

> it IS a good service, and SA 3.3x supports the reputation query
> directly now in the commercial license.
> Some things to understand,  (normal language vs legal talk)

I believe it is a good service. If I could get updated software, with
security upgrades, from Debian, I would use it.

micah

Re: dcc: [26896] terminated: exit 241

2010-04-22 Thread Micah Anderson

Michael Scheidell  writes:

> On 4/21/10 1:25 PM, Ted Mittelstaedt wrote:
>>
>>
>> Distributed Checksum Clearinghouse quite obviously feels that they have
>> captured enough fishes in the ocean and are making plenty of money now
>> and so do not require all of the free advertising that inclusion of
>> their source in Debian gives them.  Quite obviously they complained
>> and
>> their stuff was withdrawn as a result.
>
> The DCC author  would welcome Debian replacing the old, broken code
> with something new.

That will only be accepted by Debian if the license were changed to be
DFSG compliant[0], at which point it would be gladly re-introduced into
Debian. I would even be happy to facilitate that process as a Debian
Developer.

> Or is it your debian folks just forgot to update it?

My previous message detailed why it wasn't updated[1], a message that
you replied to, more than once. Debian did not 'just forget to update
it', rather it seems that you were the one who forgot something (the
reason why it was not updated).

In fact the whole thread here has continued on as a result of that very
reason why Debian did not update it. I'll cite it again for you[2]

 "The Distributed Checksum Clearinghouse source carries a license that is
 free to organizations that do not sell filtering devices or services
 except to their own users and that participate in the global DCC
 network." 

This specifically violates DFSG #6.

It's also worth noting here that the original Debian maintainer expressed
frustration about the communication with upstream because, "he seemed to
blacklist several ip ranges, including master's main mail server and
murphy's [ed. note: these are Debian's mail servers] ip-range as well as
the ip-range i ussualy [sic] used for mailing. So neither mailing him
directly nor mailing to the mailing list was possible." [editor notes
mine]

> As was previously posted (by someone else) DCC is free for most
> everyone, including ISP's who use it in their mail servers to protect
> their own clients.

There is free as in money, and then there is free as in freedom (libre),
these are different things.

> So, put your money where your mouth is.  

So the money is there, now what?

> Why won't debian fix their broken RPM?  

Probably because Debian doesn't use RPMs... sorry I couldn't resist. The
real reason is the one cited here, and in previous messages.

> someone official from debian want to chime in?

Since I am a Debian Developer, I may count as 'official' here.

micah

0. http://www.debian.org/social_contract#guidelines
1. http://permalink.gmane.org/gmane.mail.spam.spamassassin.general/128332
2. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=380542

Re: dcc: [26896] terminated: exit 241

2010-04-22 Thread Micah Anderson

Ted Mittelstaedt  writes:

> Actually it's not even that.  The notion that Debian spent effort
> detecting and removing DCC source is rather farfetched.

Sorry, but you are pretty off here. Debian does this all the time. I'm
an official Debian Developer and I have personally been involved in
doing this a few times.

> Because Linux distros are so large, many freely available
> commercially-licensed apps - such as device drivers - some of which
> also do not carry "your allowed to distribute this" licenses, get
> "sucked up" into the distributions.

Unless you can find an example, you are making a specious argument. Do
you know the process to get software into Debian?

> Some of this happens by users contributing them and not reading the
> licensing closely enough, but quite a lot of it happens by commercial
> companies deliberately inserting their stuff in the distros.

First, 'users' do not contribute applications to Debian, that isn't how
it works. Secondly, even if an official Debian Developer (who actually
is the only person permitted to contribute things to the Debian archive)
happens to do as you assert and not read the licensing, then the Debian
FTP-masters, whose role it is to specifically determine if the Debian
Developer did their due diligence in checking the license restrictions,
would reject that package.

I guess the fact that I had to explain this answers my previous
question, you do not understand how software gets into Debian. I would
advise you to educate yourself before making arguments that by their
very nature demonstrate your misunderstanding, it weakens your argument.

[snip]

> It's also generally understood that if a commercial app seller
> doesen't like it they have the right to complain and get an immediate
> cessation of inclusion of their apps in a distro.  That is why I
> suspect happened
> here.

Sorry, but if a DFSG-licensed application is put in Debian, no
commercial app seller has any right to "complain and get an immediate
cessation of inclusion of their apps in a distro". It doesn't work that
way.

> Distributed Checksum Clearinghouse quite obviously feels that they have
> captured enough fishes in the ocean and are making plenty of money now
> and so do not require all of the free advertising that inclusion of
> their source in Debian gives them.  Quite obviously they complained
> and
> their stuff was withdrawn as a result.

Your conclusions are amazing, but that does not make them any more
right.

micah

Bayes timeouts and database handle being DESTROY'd without explicit disconnect

2010-10-19 Thread Micah Anderson


Hello,

I'm running a busy mail server. We've got a bayes database on its own
server, with InnoDB tables. 

I'm seeing a number of these entries in my log files and am struggling
to determine what could be causing them and how to fix them:

Oct 19 07:02:10 spamd3 spamd[27474]: learn: exceeded time limit in pms learn
Oct 17 06:30:12 spamd3 spamd[25651]: plugin: eval failed: bayes: (in learn) 
__alarm__ignore__(15190)
Oct 17 06:30:42 spamd3 spamd[25598]: plugin: eval failed: bayes: (in learn) 
child processing timeout at /usr/sbin/spamd line 1283,  line 185.

I get quite a few of these:

Oct 19 07:02:19 spamd3 spamd[18746]: Issuing rollback() for database handle 
being DESTROY'd without explicit disconnect() at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1516,  line 2.

and a few of these, although not that many:

Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from 
bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060)still Active 
at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722

Oct 19 05:33:13 spamd3 spamd[1630]: bayes: db_seen corrupt: value='1287482415' 
for 5d6fb52248450ee7528848c3a78b5a0650a24...@sa_generated, ignored at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 397,  line 
112.

thanks for any insights!
micah



sa-learn --force-expire taking hours

2010-10-26 Thread Micah Anderson


I was investigating this morning why a number of spam messages were
coming through and found that they weren't scoring on bayes, because it
was unavailable. The database connection was working fine, but I noticed
that the nightly sa-learn --sync --force-expire had been running since
3am, which was 4 and a half hours ago:

root 26302  0.0  0.0   2440   892 ?  Ss   03:00   0:00 /bin/sh -c sa-learn --sync --force-expire >/dev/null 2>&1
root 26305  0.0  0.0  35492  2528 ?  S    03:00   0:04 /usr/bin/perl -T -w /usr/bin/sa-learn --sync --force-expire

I connected to the database and did a 'show processlist\g' and found a
number of really long running processes:

| Id     | User    | Host            | db    | Command | Time   | State        | Info
|  66652 | spamass | 127.0.0.1:55248 | bayes | Query   | 355113 | Sending data | SELECT count(*) FROM bayes_token WHERE id = '5' AND ati |

a bunch of NULL processes (what are these?):

| 463898 | spamass | 127.0.0.1:41393 | bayes | Sleep   |  10592 |              | NULL |

and a handful of 'rollback' processes:

| 474169 | spamass | 127.0.0.1:35973 | bayes | Query   |   1078 | NULL         | rollback

Plus the various bayes processes that I expect, a sampling of which is below:

| 474756 | spamass | 127.0.0.1:34141 | bayes | Query   |    472 | end          | UPDATE bayes_token SET atime = '1288102083' WHERE id = '5' AND token IN ('???-6','??,'R???','Xt |
| 475050 | spamass | 127.0.0.1:48442 | bayes | Query   |      5 | Updating     | UPDATE bayes_vars SET spam_count = spam_count + '1' WHERE id = '5' |
| 475089 | spamass | 127.0.0.1:48669 | bayes | Query   |      0 | statistics   | SELECT RPAD(token, 5, ' '), spam_count, ham_count, atime FROM bayes_token

Any ideas what could be going on, or steps I could take to troubleshoot
this?
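
As a first troubleshooting step, the processlist rows can be sorted by how
long each statement has been running. The sketch below is only an
illustration: the rows are transcribed (with the queries abbreviated) from
the listing above, the 600 second cut-off is an arbitrary assumption, and
long_running is a made-up helper name. In MySQL the Time column is the number
of seconds the thread has been in its current state, and a Sleep row with a
NULL Info is an idle connection waiting for its client to send a statement.

```python
# Rows transcribed from the processlist above: (Id, Command, Time, Info).
# The Info strings are abbreviated here for readability.
ROWS = [
    (66652, "Query", 355113, "SELECT count(*) FROM bayes_token (truncated)"),
    (463898, "Sleep", 10592, None),
    (474169, "Query", 1078, "rollback"),
    (474756, "Query", 472, "UPDATE bayes_token SET atime (truncated)"),
    (475050, "Query", 5, "UPDATE bayes_vars (truncated)"),
]


def long_running(rows, min_seconds=600):
    """Hypothetical helper: active statements running min_seconds or longer."""
    return [
        (thread_id, round(seconds / 3600, 1), info)
        for thread_id, command, seconds, info in rows
        if command != "Sleep" and seconds >= min_seconds
    ]


for thread_id, hours, info in long_running(ROWS):
    print(f"{thread_id}: running for {hours} h: {info}")
```

By this measure thread 66652 has been running for roughly 98.6 hours, so it
started days before that morning's 03:00 sa-learn run.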

Thanks!
micah

-- 




Re: Bayes timeouts and database handle being DESTROY'd without explicit disconnect

2010-10-26 Thread Micah Anderson

Dominic Benson  writes:

> On 19 Oct 2010, at 17:05, Micah Anderson wrote:
>
>> 
>> Hello,
>> 
>> I'm running a busy mail server. We've got a bayes database on its own
>> server, with InnoDB tables. 
>
> What is your total DB size / server RAM? Could you include a snapshot of the 
> output of top from the DB server? I would guess that your problem is 
> indexing/tuning or server capacity MySQL side rather than in SA, but without 
> more data it is just a guess.

The database size is 2.74gig.

$ free
             total       used       free     shared    buffers     cached
Mem:       8055876    6872740    1183136          0     584032    5403916
-/+ buffers/cache:     884792    7171084
Swap:      1959912     569432    1390480

top - 07:26:39 up 10 days, 20:37,  1 user,  load average: 9.24, 6.80, 6.15
Tasks:  24 total,   2 running,  22 sleeping,   0 stopped,   0 zombie
Cpu(s): 83.3%us, 16.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.5%si,  0.0%st
Mem:   8055876k total,  6890032k used,  1165844k free,   584364k buffers
Swap:  1959912k total,   569432k used,  1390480k free,  5405264k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
10744 mysql     20   0  655m 110m 5500 S  190  1.4   9296:14 mysqld
10765 stunnel4  20   0  123m 109m 1416 S    2  1.4 179:38.73 stunnel4
    1 root      20   0  1984  636  548 S    0  0.0   2:40.15 init
  397 bind      20   0 82856  23m 2632 S    0  0.3   0:46.72 named
 1812 root      20   0  3120 1176  772 S    0  0.0   0:15.04 syslog-ng
 3551 messageb  20   0  2488  648  488 S    0  0.0   0:00.00 dbus-daemon
 3610 nobody    20   0  6368 2668  888 S    0  0.0   0:11.94 nagios-statd
 4828 root      20   0  5484 1824 1476 S    0  0.0   0:09.44 master
10707 root      20   0  3784 1276 1076 S    0  0.0   0:00.02 mysqld_safe
10745 root      20   0  2892  608  532 S    0  0.0   0:00.00 logger
10760 stunnel4  20   0  3836  688  348 S    0  0.0   1:25.14 stunnel4
10761 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.94 stunnel4
10762 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.24 stunnel4
10763 stunnel4  20   0  3836  692  352 S    0  0.0   1:16.45 stunnel4
10764 stunnel4  20   0  3836  692  352 S    0  0.0   1:20.77 stunnel4
11311 root      20   0  2044  888  704 S    0  0.0   0:09.02 cron
15444 postfix   20   0  5496 1788 1452 S    0  0.0   0:00.00 pickup

I'm averaging around 150 mysql threads, with peaks during peak mail
times. 

>> and a few of these, although not that many:
>> 
>> Oct 17 12:02:29 spamd3 spamd[6367]: prepare_cached(SELECT max(runtime) from 
>> bayes_expire WHERE id = ?) statement handle DBI::st=HASH(0xadbb060)still 
>> Active at /usr/share/perl5/Mail/SpamAssassin/BayesStore/SQL.pm line 722
>
>
> Try an EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = ; 
> as you know it to be slow it might give a clue where to look to improve 
> performance. Or try turning the general query log on for a while and see what 
> queries are taking up time. MonYog is quite a nice frontend to this, but you 
> can do it by hand fairly simply.

mysql> EXPLAIN SELECT max(runtime) from bayes_expire WHERE id = 5;
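+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
| id | select_type | table        | type | possible_keys     | key               | key_len | ref   | rows | Extra |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+
|  1 | SIMPLE      | bayes_expire | ref  | bayes_expire_idx1 | bayes_expire_idx1 | 2       | const |  198 |       |
+----+-------------+--------------+------+-------------------+-------------------+---------+-------+------+-------+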
1 row in set (0.00 sec)

Note, this might be related to the post I made today about sa-learn
--expire taking hours... 

micah

update channel list

2012-01-18 Thread Micah Anderson


I've had the following channel list for a while:

updates.spamassassin.org
sought.rules.yerp.org
khop-bl.sa.khopesh.com
khop-blessed.sa.khopesh.com
khop-general.sa.khopesh.com
khop-sc-neighbors.sa.khopesh.com

but I suspect that some of these are no longer good. I was hoping folks
out there might be able to make some suggestions for improvements?

thanks,
micah

-- 




Re: update channel list

2012-01-19 Thread Micah Anderson

dar...@chaosreigns.com writes:

> On 01/18, Micah Anderson wrote:
>> updates.spamassassin.org
>> sought.rules.yerp.org
>> khop-bl.sa.khopesh.com
>> khop-blessed.sa.khopesh.com
>> khop-general.sa.khopesh.com
>> khop-sc-neighbors.sa.khopesh.com
>> 
>> but I suspect that some of these are no longer good. I was hoping folks
>> out there might be able to make some suggestions for improvements?
>
> All of those are currently listed by Adam Katz on
> http://khopesh.com/wiki/Anti-spam
> I expect that list to be up to date.  
> He's an active spamassassin developer.  
>
> That page also lists 90_2tld.cf.sare.sa-update.dostech.net.  I doubt there
> are any others worth using.  If there are, they should probably get added
> to http://wiki.apache.org/spamassassin/CustomRulesets
> If there were more sa-update channels that were useful, I'd recommend
> breaking that page up a little more to put the rule sets with update
> channels at the top.  
>
> If you're looking to improve SA accuracy in general, I've tried to make a
> thorough checklist here:
> http://wiki.apache.org/spamassassin/ImproveAccuracy

Thanks, I'm going through that list to find anything that I don't have.

I noticed that pyzor is recommended there. I had disabled it because it
seemed like it was no longer being developed. 

I am trying to get it enabled, but I am running into the issue reported
here: https://sourceforge.net/apps/trac/pyzor/ticket/163

I've requested a masscheck account, but am still waiting on that. 

I also noticed I didn't have these perl modules:

Jan 19 11:05:06.710 [17267] dbg: diag: [...] module not installed: 
IP::Country::Fast ('require' failed)
Jan 19 11:05:06.710 [17267] dbg: diag: [...] module not installed: 
IO::Socket::INET6 ('require' failed)

The INET6 one probably isn't necessary because I don't have ipv6 yet. I
couldn't find the IP::Country::Fast module as a debian package,
although there is libgeo-ipfree-perl, it's unclear if that can be used
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=374879)... 

Everything else on that page, except for your mention of
90_2tld.cf.sare.sa-update.dostech.net I've done. 

thanks,
micah

--

trusted networks getting marked as spam

2014-10-24 Thread micah anderson


Hi,

I've got some machines that are running logcheck, they periodically send
mail to us with reports. Sometimes those mails have some spammy stuff in
them, because they are mail server logs, or web logs with some spammy
stuff in them. 

I don't want spamassassin to deal with these messages, I want them to
come through no matter what. I don't want them to contribute to bayes
scoring and I don't want them ever to end up as Spam.

Unfortunately, they are, it seems mostly because URIBL scores are
hitting before the SHORTCIRCUIT/ALL_TRUSTED stuff fires, so for example:

X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07)
X-Spam-Flag: YES
X-Spam-Status: Yes, score=8.1 required=6.0 tests=ALL_TRUSTED,SHORTCIRCUIT,
URIBL_AB_SURBL,URIBL_BLACK,URIBL_JP_SURBL,URIBL_WS_SURBL shortcircuit=ham
autolearn=disabled version=3.4.0

I've got the IP in trusted_networks, and internal_networks and I've got
a couple of shortcircuit rules, as follows:

# simple, non-network-based whitelists, locally-generated messages,
# messages via a trusted relay chain, simple
meta SC_HAM (USER_IN_WHITELIST||USER_IN_DEF_WHITELIST||USER_IN_ALL_SPAM_TO||NO_RELAYS||ALL_TRUSTED)
priority SC_HAM -1000
shortcircuit SC_HAM ham
score SC_HAM -20

meta SC_SPAM (USER_IN_BLACKLIST_TO||USER_IN_BLACKLIST)
priority SC_SPAM -950
shortcircuit SC_SPAM spam
score SC_SPAM 20

shortcircuit ALL_TRUSTED on

yet, the high scoring due to the URIBLs caused this to get classified as
Spam.

How can I get around that?

Thanks!
micah

Issuing rollback() due to DESTROY without explicit disconnect() of DBD::mysql::db handle bayes

2015-09-23 Thread micah anderson


Hi,

I'm getting these errors in my log files, quite regularly:

Sep 23 21:58:16 towhee spamd[25561]: Issuing rollback() due to DESTROY without 
explicit disconnect() of DBD::mysql::db handle bayes:0.0.0.0 at 
/usr/share/perl5/Mail/SpamAssassin/Plugin/Bayes.pm line 1590,  line 2.

It appears that bayes is working, because I see logs like this:

Sep 23 22:02:19 towhee spamd[10768]: spamd: result: . -1 - 
AM_TRUNCATED,BAYES_00,CK_419SIZE,ENV_FROM_DIFF0,FORWARD_RELAY,HAS_REPLY_TO,HTML_MESSAGE,IP_REPEATING,MISSING_MID,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SUBJ_DATE
 
scantime=0.7,size=11555,uid=65534,required_score=5.0,rhost=0.0.0.0,raddr=0.0.0.0,rport=37464,mid=(unknown),bayes=0.000147,autolearn=disabled,shortcircuit=no

line 1590 is in the sub learner_new, but i have set in local.cf:

local.cf:bayes_auto_learn 0
local.cf:bayes_learn_to_journal 0

It seems like the database is working fine...

any ideas?

thanks!
micah

MISSING_SUBJECT

2018-06-12 Thread micah anderson




I had a message marked with:

2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
Subject:

It did not have a subject, but it did have content (although only
encrypted) it also hit:

*  1.8 MISSING_SUBJECT Missing Subject: header

which makes sense, because the mail did not have one, but have you
looked in your Spam folder lately? All spam has a subject, pretty much
always. An informal survey of my trash heap showed that 4 messages out of
400 did not have a Subject, and two of them were repeats.

-- 
micah

Re: MISSING_SUBJECT

2018-06-12 Thread micah anderson

Reindl Harald  writes:

> Am 13.06.2018 um 01:37 schrieb micah anderson:
>> I had a message marked with:
>> 
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>> 
>> It did not have a subject, but it did have content (although only
>> encrypted) it also hit:
>> 
>> *  1.8 MISSING_SUBJECT Missing Subject: header
>> 
>> which makes sense, because the mail did not have one, but have you
>> looked in your Spam folder lately? All spam has a subject, pretty much
>> always
>
> no - there is ton of junk without a subject and sometimes even floods
> with no subject and no body at all

I believe you, however the message was not empty, it had encrypted
contents (and in fact was scored -1 because of that).

Re: MISSING_SUBJECT

2018-06-13 Thread micah anderson

Matus UHLAR - fantomas  writes:

> On 12.06.18 19:37, micah anderson wrote:
>>2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>>Subject:
>>
>>It did not have a subject, but it did have content (although only
>>encrypted) it also hit:
>>
>>*  1.8 MISSING_SUBJECT Missing Subject: header
>>
>>which makes sense, because the mail did not have one, but have you
>>looked in your Spam folder lately? All spam has a subject, pretty much
>>always an informal survey of my trash heap showed 4 messages out of
>>400 did not have a Subject, and two of them were repeats.
>
> and what is your point?

The point is EMPTY_MESSAGE scores even though it did have content. But I
guess the point is that it had no 'text' parts, because the content was
only pgp/mime?

-- 
micah

Re: MISSING_SUBJECT

2018-06-14 Thread micah anderson

John Hardin  writes:

> On Tue, 12 Jun 2018, micah anderson wrote:
>
>> I had a message marked with:
>>
>> 2.3 EMPTY_MESSAGE Message appears to have no textual parts and no
>> Subject:
>>
>> It did not have a subject, but it did have content (although only
>> encrypted)
>
> It may not be considering an encrypted message part to be a text body 
> part. What was the MIME type of that part?

pgp/mime

-- 
micah

Re: SA MySQL DB maintenance

2018-07-17 Thread micah anderson

"Kevin A. McGrail"  writes:

> I think Bayes should be in redis though not SQL.

Curious to know why you think that?

Understanding ruleQA results

2018-08-14 Thread micah anderson



Hi,

I'm trying to understand the ruleQA results because I'm trying to track
down how spammy the rule FRNAME_IN_MSG_NO_SUBJ really is.

I load the latest rules: 
http://ruleqa.spamassassin.org/20180813-r1837926-n/FRNAME_IN_MSG_NO_SUBJ/detail?s_corpus=1&s_g_over_time=1#overtime

and I see the S/O value is 1.0, which is a rule that hits only on spam
(a rule that only hits on ham is 0.0, a rule that doesn't tell you anything is
0.5)... but how can I tell how many messages are part of the corpus?

Also, the percentages seem very low: 1.5192% Spam, and .0005%
Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
what do I know... which is why I'm asking.
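
To see how the S/O value relates to the two percentages, here is a small
sketch. It assumes S/O is the spam hit rate divided by the sum of the spam
and ham hit rates; that formula is my assumption about how ruleQA computes
it, and spam_over_total is a made-up helper name.

```python
def spam_over_total(spam_pct, ham_pct):
    """S/O under the stated assumption: spam rate / (spam rate + ham rate)."""
    return spam_pct / (spam_pct + ham_pct)


# Percentages quoted above for FRNAME_IN_MSG_NO_SUBJ.
s_o = spam_over_total(1.5192, 0.0005)
print(round(s_o, 4))
```

With these inputs the ratio is just under 1 and rounds to 1.0 at three
decimals, while a rule that hits ham and spam equally often gives 0.5.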

thanks!


-- 
micah

Re: Understanding ruleQA results

2018-08-14 Thread micah anderson

John Hardin  writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> but how can I tell how many messages are part of the corpus?
>
> As RW said, hover over the percentages.

Thanks.

>> Also, the percentages seem very low: 1.5192% Spam, and .0005%
>> Ham... 1.5% seems low to me to be adding 3.5 score to this rule, but
>> what do I know... which is why I'm asking.
>
> It's not so much the raw amount of spam it hits, it's that it hits spam 
> that few other rules hit, or that it hits spam that other rules hit but 
> that doesn't score high enough with those other rules.
>
> You also want to look at the score-map section when evaluating a rule.

Is there an explanation of the score-map section somewhere?

For this one it says:

  scoremap  ham:  0  33.33%      1 *
  scoremap  ham:  1  66.67%      2 **
  scoremap spam:  1   0.08%     15
  scoremap spam:  3   0.61%    121
  scoremap spam:  4  90.24%  17791
  scoremap spam:  5   2.69%    531 *
  scoremap spam:  6   4.54%    896 *
  scoremap spam:  7   1.10%    217
  scoremap spam:  8   0.26%     52
  scoremap spam:  9   0.40%     79
  scoremap spam: 10   0.01%      2
  scoremap spam: 11   0.05%      9
  scoremap spam: 14   0.01%      2

What are these columns and how can I interpret them?
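
One plausible reading, not confirmed in this thread: the columns are the class (ham or spam), a score bucket (presumably the total score of the messages that hit the rule), the share of that class's hits in the bucket, and the raw message count. The sketch below parses the lines under that assumption and checks that each percentage equals count divided by the class total:

```python
import re
from collections import defaultdict

# Scoremap lines as posted, with the fused columns separated.
SCOREMAP = """\
scoremap  ham:  0  33.33%      1 *
scoremap  ham:  1  66.67%      2 **
scoremap spam:  1   0.08%     15
scoremap spam:  3   0.61%    121
scoremap spam:  4  90.24%  17791
scoremap spam:  5   2.69%    531 *
scoremap spam:  6   4.54%    896 *
scoremap spam:  7   1.10%    217
scoremap spam:  8   0.26%     52
scoremap spam:  9   0.40%     79
scoremap spam: 10   0.01%      2
scoremap spam: 11   0.05%      9
scoremap spam: 14   0.01%      2
"""

# Assumed layout: class, score bucket, percentage, count (bar is ignored).
LINE = re.compile(r"scoremap\s+(ham|spam):\s+(\d+)\s+([\d.]+)%\s*(\d+)")

rows = []
for line in SCOREMAP.splitlines():
    m = LINE.search(line)
    if m:
        rows.append((m.group(1), int(m.group(2)), float(m.group(3)), int(m.group(4))))

# Total number of hits per class.
totals = defaultdict(int)
for cls, _bucket, _pct, count in rows:
    totals[cls] += count

# Does every reported percentage match count / class total?
consistent = all(
    abs(100 * count / totals[cls] - pct) < 0.006
    for cls, _bucket, pct, count in rows
)

print(dict(totals), "percentages consistent:", consistent)
```

If that reading is right, about 90% of the spam hitting this rule sits in bucket 4, which would fit the earlier point that the rule matters for spam that other rules score too low.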

> It's not so much the raw amount of spam it hits, it's that it hits spam 
> that few other rules hit, or that it hits spam that other rules hit but 
> that doesn't score high enough with those other rules.

I searched my pile of mail that I have from two ice ages ago, and I did
find 6 messages that were hits of this rule, one of them was spam, five
of them were this person trying to contact me. 

> Do you happen to be seeing FPs with this rule?

Yes, it's why I am investigating it. I think it is common for people who
are sending mail from their mobiles, where they use it more like a quick
chat instead of a 'regular mail'.

In fact, this person used:
X-Mailer: iPad Mail (15F79)


-- 
micah

Re: Understanding ruleQA results

2018-08-14 Thread micah anderson

John Hardin  writes:

> On Tue, 14 Aug 2018, RW wrote:
>
>> On Tue, 14 Aug 2018 13:24:47 -0700 (PDT)
>> John Hardin wrote:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>>>
>>
>>>> I searched my pile of mail that I have from two ice ages ago, and I
>>>> did find 6 messages that were hits of this rule, one of them was
>>>> spam, five of them were this person trying to contact me.
>>>
>>> ...without a subject?
>>>
>>>>> Do you happen to be seeing FPs with this rule?
>>>>
>>>> Yes, its why I am investigating it. I think it is common for people
>>>> who are sending mail from their mobiles, where they use it more
>>>> like a quick chat instead of a 'regular mail'
>>>>
>>>> In fact, this person used:
>>>> X-Mailer: iPad Mail (15F79)
>>>
>>> OK, I can see about adding some mobile MUA exclusions. Any FP headers
>>> you can provide (directly) will be helpful. Go ahead and sanitize the
>>> recipient info, I don't think that would be relevant to tuning this
>>> one.

I'll provide some pastebin links in a separate email.

>> I don't know that this is particularly specific to mobile, lots of
>> people send emails with an empty subject.
>>
>> It sounds like the main cause would be a signature that contains the
>> senders name as the only thing in a line. That'll be why all the
>> FPs mentioned above came from the same person.

Yes, this person has as their signature their name on one line, and
their From: has that same name listed.

> Question: were those messages scored as spam?

Yes, they were; I will include the reports in the off-list email.

-- 
micah

Re: Understanding ruleQA results

2018-08-14 Thread micah anderson

John Hardin  writes:

> On Tue, 14 Aug 2018, micah anderson wrote:
>
>> John Hardin  writes:
>>
>>> On Tue, 14 Aug 2018, micah anderson wrote:
>
> OK, I can see about adding some mobile MUA exclusions. Any FP headers you 
> can provide (directly) will be helpful. Go ahead and sanitize the 
> recipient info, I don't think that would be relevant to tuning this one.

I put 4 of the messages here:

https://pastebin.com/YuPtBQXN

thanks for your help!

micah

Re: Current update channels

2018-09-20 Thread micah anderson

"Kevin A. McGrail"  writes:

> There are people asking me to put KAM.cf under the default sa-update
> crypto signature.  Technically, it's easy.  But it would have to be
> carefully considered as it's not a project ruleset.  Thoughts on that?

I would be interested in KAM as part of an update channel, it would make
updates more frequent. The only thing is I have to adjust KAM each time
I update it. For example, the political spam section is a bit dated and
has caused some frustrations for people.

-- 
micah

multiplying in rules

2018-11-20 Thread micah anderson



I was doing multiplication in rules to add scores, like this:

meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 * __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT)) > 1)

but now when I run spamassassin --lint, I'm told things like this:

Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4

What should I do to fix that?

Thanks!

-- 
micah

Re: multiplying in rules

2018-11-20 Thread micah anderson

RW  writes:

> On Tue, 20 Nov 2018 12:38:24 -0500
> micah anderson wrote:
>
>> I was doing multiplication in rules to add scores, like this:
>> 
>> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 * __LOCAL_LIMIT))
>> > 1)
>> 
>> but now when I run spamassassin --lint, I'm told things like this:
>> 
>> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4
>
> It's the decimal fractions. 
>  
>> What should I do to fix that?
>
> It should be fixed in the next release.

ok, but until then, is the only option for me to disable these rules?
These are particularly important rules for stopping phishing attacks, so
I'd like to not disable them, but find some other kind of workaround!


-- 
micah

Re: multiplying in rules

2018-11-20 Thread micah anderson

RW  writes:

> On Tue, 20 Nov 2018 12:53:18 -0500
> micah anderson wrote:
>
>> RW  writes:
>> 
>> > On Tue, 20 Nov 2018 12:38:24 -0500
>> > micah anderson wrote:
>> >  
>> >> I was doing multiplication in rules to add scores, like this:
>> >> 
>> >> meta LOCAL_EXCEEDED_PHISH (((0.4 * __MAILBOX) + (0.4 *
>> >> __LOCAL_EXCEEDED) + (0.4 * __LOCAL_STORAGE) + (0.4 *
>> >> __LOCAL_LIMIT))  
>> >> > 1)  
>> >> 
>> >> but now when I run spamassassin --lint, I'm told things like this:
>> >> 
>> >> Nov 20 09:34:42.096 [11146] warn: config: Strange rule token: 0.4  
>> >
>> > It's the decimal fractions. 
>> >
>> >> What should I do to fix that?  
>> >
>> > It should be fixed in the next release.  
>> 
>> ok, but until then, is the only option for me to disable these rules?
>> These are particularly important rules for stopping phishing attacks,
>> so I'd like to not disable them, but find some other kind of work
>> around!
>
> I don't believe it prevents the rule from working.

It prevents sa-compile from running because spamassassin --lint fails.

> What it does do is prevent compiled rules from being installed. But as I
> said it's the decimal fractions that cause it to fail and the above
> rule doesn't need to contain decimal fractions.

How can I do it without the fractions?

I've applied the patch from the repo to make it work.
-- 
micah

Re: multiplying in rules

2018-11-20 Thread micah anderson

"Bill Cole"  writes:

> On 20 Nov 2018, at 13:53, John Hardin wrote:
>
>> On Tue, 20 Nov 2018, micah anderson wrote:
> [...]
>>>> What it does do is prevent compiled rules from being installed. But 
>>>> as I
>>>> said it's the decimal fractions that cause it to fail and the above
>>>> rule doesn't need to contain decimal fractions.
>>>
>>> How can I do it without the fractions?
>>
>> Multiply everything by 10: (__rulename * 4) ...etc... > 10
>
> Or replace every decimal fraction with an integer division, so '0.4' 
> becomes '(4 / 10)'

oh, of course. I was thinking that these amounts contributed to the
score, but they do not. Thanks for wiping away the grime from my brain.
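
Both suggestions can be checked by brute force. A minimal sketch, assuming each sub-rule evaluates to 0 (no hit) or 1 (hit); the Python function names are made up, only the weights and thresholds come from the rule above:

```python
from itertools import product


def decimal_form(mailbox, exceeded, storage, limit):
    # Original expression with decimal fractions.
    return ((0.4 * mailbox) + (0.4 * exceeded) + (0.4 * storage) + (0.4 * limit)) > 1


def times_ten_form(mailbox, exceeded, storage, limit):
    # Everything multiplied by 10: weights of 4, threshold of 10.
    return ((4 * mailbox) + (4 * exceeded) + (4 * storage) + (4 * limit)) > 10


def division_form(mailbox, exceeded, storage, limit):
    # Each 0.4 replaced by the integer division (4 / 10).
    return (((4 / 10) * mailbox) + ((4 / 10) * exceeded)
            + ((4 / 10) * storage) + ((4 / 10) * limit)) > 1


# All 16 hit/no-hit combinations of the four sub-rules.
combos = list(product((0, 1), repeat=4))
agree = all(
    decimal_form(*c) == times_ten_form(*c) == division_form(*c) for c in combos
)
firing = [c for c in combos if times_ten_form(*c)]

print("all forms agree:", agree)
print("combinations that fire:", firing)
```

Under that assumption the three forms agree on every combination, and the meta rule simply fires when at least three of the four sub-rules hit. The weights only shape that threshold; the score added to a message is whatever is assigned to LOCAL_EXCEEDED_PHISH itself.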


-- 
micah

Re: Scoring by registrar?

2019-07-01 Thread micah anderson

Grant Taylor  writes:

>> A very large number (nearly all, in fact) of the spams I receive these 
>> days involve domains registered with Namecheap. I've received hundreds 
>> of spams involving .icu domains from what appear to be the same spammer. 
>> I also receive a large number of scams impersonating Bitmain, again 
>> using domains involving Namecheap.
>
> Is Namecheap just the registrar?  Or are they also hosting the DNS service?

As a Namecheap customer, you are making me want to move. That is good,
but it's also something you should consider before you block the entire
registrar: there are a significant number of non-spamming Namecheap
customers that you would be cutting off if you did this. I understand
you want to put pressure on Namecheap, but the flip side of that is you
will be cutting yourself off from those domains in the process.

>> While Namecheap does suspend at least some domains within days of their 
>> being used in a campaign, it's clear that these are being treated as 
>> single-use domains, so this has very little impact on the spammers.

This sounds like Fast Flux - and it is not something that happens only
on Namecheap.

> I think there are also lists of domains that have been recently 
> registered.  Which might help if the single use domains were recently 
> registered.

Having such a list would be very helpful for dealing with fast flux.

-- 
micah

Re: Scoring by registrar?

2019-07-01 Thread micah anderson

Sean Lynch  writes:

>>Having such a list would be very helpful for dealing with fast flux.
>
> SA already has this. It used fresh.fmb.la to detect domains registered within 
> the past couple of weeks.

It does? Do I need to enable something to get that?
-- 
micah

Re: Spamhaus Technology contributions to SpamAssassin

2019-07-03 Thread micah anderson

Giovanni Bechis  writes:

> On 7/3/19 7:11 PM, Riccardo Alfieri wrote:
>> On 03/07/19 17:59, atat wrote:
>> 
>>> You say in documentation:
>>>
>>>  You should also drop, by default, all Office documents with macros.
>>>
>>> What plugin / method do you recommend for that?
>> 
>> I'm no expert in detecting macros, but there are at least two ways of
>> doing that that come to mind:
>> 
>> - Clamav with the option OLE2BlockMacros

Reading up on OLE2BlockMacros in clamav, I'm very confused by
https://www.mail-archive.com/clamav-users@lists.clamav.net/msg42671.html

Specifically:

Setting 'OLE2BlockMacros Yes' effectively causes
'Heuristics.OLE2.ContainsMacros' to be returned, and disables all
official and unofficial signatures.

When 'OLE2BlockMacros Yes' this causes 'Heuristics.OLE2.ContainsMacros'
to be returned first and all other signatures that are not against
uncompressed macros are ignored. You only get one signature back and
that is the first one hit, which may be a 'soft' signature ie one you
mightn't discard an email on, such as Heuristics.OLE2.ContainsMacros,
even though 'hard' signatures official or unofficial might also have hit
if they had been run later.

> This has been superseded by 
> https://svn.apache.org/repos/asf/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/OLEMacro.pm
> the plugin is for trunk but it works out of the box in 3.4.3rc3 as well (some 
> work is needed to let it work on 3.4.2)

Can't these be blocked at the MTA level to be much more CPU friendly?

-- 
micah

Spoofed From: names

2020-04-09 Thread micah anderson



Hi,

What is the current state of the art for dealing with senders who trick
people using the "Name" part of the From: header? For example:

From: "supp...@example.com"

The "Real Name" part is used to display a fake email address at a
trusted domain (example.com would be my domain, or gmail.com, or
something other than air-compressor.ml).

This has come up before[0], but at the time generic solutions seemed
problematic due to various false positives, or missing features in
spamassassin itself. I'm wondering what the current state is now.

I can do a relatively easy meta-rule for my domain, something like this,
but I'm not sure how well this would work, or if there are better
methods now:

header __LOCAL_FROM_QUOTE_ISUS      From =~ /\".*\@example\.com\"/
header __LOCAL_FROM_CONTAIN_NOTUS   From !~ /<.*\@example\.com>/
meta TRICKY_FROM                    ((( __LOCAL_FROM_QUOTE_ISUS ) + ( __LOCAL_FROM_CONTAIN_NOTUS )) > 1)
describe TRICKY_FROM                From has example.com in quotes, but not in path
score TRICKY_FROM                   5
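
The logic of the two header tests can be tried outside SpamAssassin first. A minimal sketch in Python, where the patterns are a transcription of the two rules (the second written as `<.*@example\.com>`), the function name is made up, and every sample From: value and address is hypothetical:

```python
import re

# Display name in quotes that looks like an address at our domain.
QUOTE_ISUS = re.compile(r'".*@example\.com"')
# Address in angle brackets that really is at our domain.
PATH_ISUS = re.compile(r"<.*@example\.com>")


def tricky_from(from_header: str) -> bool:
    """True when the quoted name claims example.com but the real address does not."""
    quoted_is_us = QUOTE_ISUS.search(from_header) is not None
    path_not_us = PATH_ISUS.search(from_header) is None
    return quoted_is_us and path_not_us


samples = [
    '"support@example.com" <someone@air-compressor.ml>',  # spoofed display name
    '"support@example.com" <support@example.com>',         # genuine, quoted address
    'Support Team <support@example.com>',                  # genuine, plain name
    '"Some Person" <person@elsewhere.example>',            # unrelated sender
]

results = [tricky_from(s) for s in samples]
for sample, hit in zip(samples, results):
    print(hit, sample)
```

Running saved false positives through a check like this before giving the rule 5 points would show whether legitimate senders who put their own address in the display name get caught.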



0. https://www.mail-archive.com/users@spamassassin.apache.org/msg100800.html
-- 
micah
