A few rules to catch current gmail spam

2008-06-01 Thread OliverScott

I have seen a few posts from people complaining about spam from gmail (often
linking to blogspot pages) which no existing rules catch, and have had a
number of these myself. This is only a small fraction of the spam I am
seeing, but it is annoying nonetheless!

NOTE: I am not a particularly good rule writer and there are probably a lot
more elegant ways of doing this! Feel free to make suggestions and
improvements, and to use these rules however you like.

The easiest way that I can see to catch these emails is to combine a number
of existing rules and to add a couple of new rules which look for specific
things:

Existing rules used:
FreeMail.pm Plugin
ChickenPox.cf

New Rule 1 - Find all emails which link to a free blog site:

uri FHS_FREEBLOG /(?:spaces\.msn\.com|blogeasy\.com|easyjournal\.com|multiply\.com|blog-city\.com|blogharbor\.com|bloghi\.com|bloghorn\.com|blogspirit\.com|blogsource\.com|ebloggy\.com|pitas\.com|blogger\.de|blogsome\.com|weblogs\.us|wordpress\.com|wpblogs\.com|blogthing\.com|globbo\.org|theblog\.cc|learnerblogs\.org|uniblogs\.org|edublogs\.org|hrblogs\.org|beblogger\.com|evilsupergenius\.net|blogcafe\.com|blogspot\.com|weblogs\.hu|weblogs\.cz|blogs\.ro|weblogs\.pl|blogs\.fi|blogs\.no|blogs\.dk|blogs\.se|blog\.com|blog\.de|blog\.co\.uk|blog\.ca|freewebs\.com|livejournal\.com|20six\.co\.uk|xanga\.com|aeonity\.com|bloggercrab\.com|upsaid\.com|diaryland\.com|blogs\.ie|modblog\.com|efx2\.com|blogdrive\.com|tblog\.com|blogcult\.com|seo-blog\.com|quickblog\.org|diary-x\.com|blurty\.com|blogghost\.com)/i
describe FHS_FREEBLOG   Contains a link to a free blog.
score FHS_FREEBLOG  0.001

New Rule 2 - Look for a proper html link in the email (i.e. long url and
short description):

rawbody FHS_LINK /\]{20,50}\>[^<]{6,15}\<\/a/i
describe FHS_LINK   Contains a long URL with a short description - a well written link
score FHS_LINK  0.001


Now consider that people who send messages from a free email address are
very unlikely to go to the trouble of using a properly formatted link in
their email (they will just copy and paste the url):

meta FREEMAIL_LINK_BLOG (FREEMAIL_FROM && FHS_LINK && FHS_FREEBLOG)
describe FREEMAIL_LINK_BLOG From a freemail address and includes a well written link to a blog
score FREEMAIL_LINK_BLOG 2.0


The next thing I noticed was that most of these emails hit various bits of
the chickenpox.cf ruleset so I created a set of meta rules to count how many
of these were hit, and then combined this with the freemail rules:

meta FHS_COUNT_CHICKENPOX_3 (( J_CHICKENPOX_12 + J_CHICKENPOX_13 +
J_CHICKENPOX_14 + J_CHICKENPOX_15 + J_CHICKENPOX_16 + J_CHICKENPOX_17 +
J_CHICKENPOX_18 + J_CHICKENPOX_19 + J_CHICKENPOX_110 + J_CHICKENPOX_111 +
J_CHICKENPOX_21 + J_CHICKENPOX_22 + J_CHICKENPOX_23 + J_CHICKENPOX_24 +
J_CHICKENPOX_25 + J_CHICKENPOX_26 + J_CHICKENPOX_27 + J_CHICKENPOX_28 +
J_CHICKENPOX_29 + J_CHICKENPOX_210 + J_CHICKENPOX_31 + J_CHICKENPOX_32 +
J_CHICKENPOX_33 + J_CHICKENPOX_34 + J_CHICKENPOX_35 + J_CHICKENPOX_36 +
J_CHICKENPOX_37 + J_CHICKENPOX_38 + J_CHICKENPOX_39 + J_CHICKENPOX_41 +
J_CHICKENPOX_42 + J_CHICKENPOX_43 + J_CHICKENPOX_44 + J_CHICKENPOX_45 +
J_CHICKENPOX_46 + J_CHICKENPOX_47 + J_CHICKENPOX_48 + J_CHICKENPOX_51 +
J_CHICKENPOX_52 + J_CHICKENPOX_53 + J_CHICKENPOX_54 + J_CHICKENPOX_55 +
J_CHICKENPOX_56 + J_CHICKENPOX_57 + J_CHICKENPOX_61 + J_CHICKENPOX_62 +
J_CHICKENPOX_63 + J_CHICKENPOX_64 + J_CHICKENPOX_65 + J_CHICKENPOX_66 +
J_CHICKENPOX_71 + J_CHICKENPOX_72 + J_CHICKENPOX_73 + J_CHICKENPOX_74 +
J_CHICKENPOX_75 + J_CHICKENPOX_81 + J_CHICKENPOX_82 + J_CHICKENPOX_83 +
J_CHICKENPOX_84 + J_CHICKENPOX_91 + J_CHICKENPOX_92 + J_CHICKENPOX_93 +
J_CHICKENPOX_101 + J_CHICKENPOX_102 ) > 2)
describe FHS_COUNT_CHICKENPOX_3 Three or more odd character combinations
score FHS_COUNT_CHICKENPOX_3 0.1
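
The same long sum is needed once per threshold, so the meta line can be generated instead of typed out twice. What follows is only a minimal shell sketch: count_meta is a hypothetical helper name, only three of the J_CHICKENPOX names are used to keep it short, and the "> 4" threshold for the five-or-more rule is an assumption made by analogy with the "> 2" used above.

```shell
#!/bin/sh
# Hypothetical helper: read rule names (one per line) on stdin and print
# "meta NAME ((A + B + ...) > THRESHOLD)" on a single line.
count_meta() {
    name=$1
    threshold=$2
    # Join the names with "+", then pad each "+" with spaces.
    sum=$(paste -sd+ - | sed 's/+/ + /g')
    printf 'meta %s ((%s) > %s)\n' "$name" "$sum" "$threshold"
}

# Illustrative subset only; the real input would be the full list of
# J_CHICKENPOX_* names used in the rule above.
rules='J_CHICKENPOX_12
J_CHICKENPOX_13
J_CHICKENPOX_14'

printf '%s\n' "$rules" | count_meta FHS_COUNT_CHICKENPOX_3 2
# Threshold 4 (five or more hits) is an assumption, see the note above.
printf '%s\n' "$rules" | count_meta FHS_COUNT_CHICKENPOX_5 4
```

Each call prints one complete meta line, so the output can be redirected straight into a .cf file next to the describe and score lines.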

meta FHS_COUNT_CHICKENPOX_5 (( J_CHICKENPOX_12 + J_CHICKENPOX_13 +
J_CHICKENPOX_14 + J_CHICKENPOX_15 + J_CHICKENPOX_16 + J_CHICKENPOX_17 +
J_CHICKENPOX_18 + J_CHICKENPOX_19 + J_CHICKENPOX_110 + J_CHICKENPOX_111 +
J_CHICKENPOX_21 + J_CHICKENPOX_22 + J_CHICKENPOX_23 + J_CHICKENPOX_24 +
J_CHICKENPOX_25 + J_CHICKENPOX_26 + J_CHICKENPOX_27 + J_CHICKENPOX_28 +
J_CHICKENPOX_29 + J_CHICKENPOX_210 + J_CHICKENPOX_31 + J_CHICKENPOX_32 +
J_CHICKENPOX_33 + J_CHICKENPOX_34 + J_CHICKENPOX_35 + J_CHICKENPOX_36 +
J_CHICKENPOX_37 + J_CHICKENPOX_38 + J_CHICKENPOX_39 + J_CHICKENPOX_41 +
J_CHICKENPOX_42 + J_CHICKENPOX_43 + J_CHICKENPOX_44 + J_CHICKENPOX_45 +
J_CHICKENPOX_46 + J_CHICKENPOX_47 + J_CHICKENPOX_48 + J_CHICKENPOX_51 +
J_CHICKENPOX_52 + J_CHICKENPOX_53 + J_CHICKENPOX_54 + J_CHICKENPOX_55 +
J_CHICKENPOX_56 + J_CHICKENPOX_57 + J_CHICKENPOX_61 + J_CHICKENPOX_62 +
J_CHICKENPOX_63 + J_CHICKENPOX_64 + J_CHICKENPOX_65 + J_CHICKENPOX_66 +
J_CHICKENPOX_71 + J_CHICKENPOX_72 + J_CHICKENPOX_73 + J_CHICKENPOX_74 +
J_CHICKENPOX_75 + J_CHICKENPOX_81 + J_CHICKENPOX_82 + J_CHICKENPOX_83 +
J_CHICKENPOX_84 + J_CHICKENPOX_91 + J_CHICKEN

Script to generate whitelist based on outgoing email

2008-02-03 Thread OliverScott

Not sure if this will be of any use to anyone else, or if it can be made to
work with anything other than Exim, but here is the first draft of a script
to generate a whitelist based on outgoing email! I have had it running on a
server (for the last 2 months) handling 20,000 emails a week for a variety
of end users and as yet it hasn't caused any problems, and has helped to
reduce the chances of false positives...

I got the idea as a lot of desktop antispam solutions will automatically add
the addresses of people you send email to, to a whitelist. Usually this
feature is called something like AutoWhiteList (not to be confused with the
spamassassin AWL which does something else entirely).

The following script (which I hope comes through successfully) looks through
the last 4 weeks of Exim maillogs and can be used to generate a spamassassin
rule file to down score incoming emails (or as part of a shortcircuit rule).
I admit to having very little knowledge of linux utilities and scripts
having only started messing with them a few months ago, so I am sure someone
with better skills than mine will have a good laugh at what I have done, but
the idea is there and though the code is not elegant it does work!

I would appreciate any suggestions or comments you have :D


## The Script - out.sh ##

# Script to create a spamassassin ruleset to down-score emails from
# addresses which have previously had email SENT to them.
# This is designed to work with exim logs and will need to be customised to
# fit your system!

# This script looks at the current mail log and the ones from the previous
# four weeks and is designed to be run once per day (probably at night).
# NOTE: Email addresses which have repeatedly been sent to over this period
# are given a better score than ones which appear in only one log file.

# This script is in no way optimised or designed for use on a production
# mail server - it is very much a proof of concept!

# Version 0.1 Alpha - Updated 09-12-2007 (D-M-Y)

# Bugs / ToDo's:
# Currently if a log file does not include any outgoing email then the
# generated rule will match EVERY incoming email. Make sure you don't
# schedule it directly after a log-rotate!

# Usage:
# ./out.sh > out.cf


# The process:
# AWK the current email log for lines which relate to outgoing email sent by
# local users
# Sort it alphabetically
# Remove any duplicates
# NOTE: the next few steps can probably be done with one command if you have
# been using TR and SED for more than the 10 minutes I have!
# Remove line breaks - replace them with commas
# Remove the final comma
# Replace the commas with |
# Escape the .'s using SED
# Escape the @'s using SED
# Create the text of a spamassassin rule which matches any email addresses
# that have been sent to in the mail log file
# Remove line breaks created by AWK

awk '/T=remote_smtp/ && /[Cc]="250 [Oo][Kk]/ && !/F=<>/ {print $5}' /var/log/exim/mainlog \
  | sort | uniq | tr "\r\n" "," | sed '$s/,$//' | tr "," "|" \
  | sed 's/[.]/\\./g' | sed 's/@/\\@/g' \
  | awk 'BEGIN {print "header __MAIL_SENT_TO_0 FROM =~ /("} {print $0} END {print ")/i\n"}' \
  | tr -d "\r\n"
echo
echo describe __MAIL_SENT_TO_0 From address which had been sent to during the last week

echo

awk '/T=remote_smtp/ && /[Cc]="250 [Oo][Kk]/ && !/F=<>/ {print $5}' /var/log/exim/mainlog.1 \
  | sort | uniq | tr "\r\n" "," | sed '$s/,$//' | tr "," "|" \
  | sed 's/[.]/\\./g' | sed 's/@/\\@/g' \
  | awk 'BEGIN {print "header __MAIL_SENT_TO_1 FROM =~ /("} {print $0} END {print ")/i\n"}' \
  | tr -d "\r\n"
echo
echo describe __MAIL_SENT_TO_1 From address which had been sent to one week ago

echo

awk '/T=remote_smtp/ && /[Cc]="250 [Oo][Kk]/ && !/F=<>/ {print $5}' /var/log/exim/mainlog.2 \
  | sort | uniq | tr "\r\n" "," | sed '$s/,$//' | tr "," "|" \
  | sed 's/[.]/\\./g' | sed 's/@/\\@/g' \
  | awk 'BEGIN {print "header __MAIL_SENT_TO_2 FROM =~ /("} {print $0} END {print ")/i\n"}' \
  | tr -d "\r\n"
echo
echo describe __MAIL_SENT_TO_2 From address which had been sent to two weeks ago

echo

awk '/T=remote_smtp/ && /[Cc]="250 [Oo][Kk]/ && !/F=<>/ {print $5}' /var/log/exim/mainlog.3 \
  | sort | uniq | tr "\r\n" "," | sed '$s/,$//' | tr "," "|" \
  | sed 's/[.]/\\./g' | sed 's/@/\\@/g' \
  | awk 'BEGIN {print "header __MAIL_SENT_TO_3 FROM =~ /("} {print $0} END {print ")/i\n"}' \
  | tr -d "\r\n"
echo
echo describe __MAIL_SENT_TO_3 From address which had been sent to three weeks ago

echo

awk '/T=remote_smtp/ && /[Cc]="250 [Oo][Kk]/ && !/F=<>/ {print $5}' /var/log/exim/mainlog.4 \
  | sort | uniq | tr "\r\n" "," | sed '$s/,$//' | tr "," "|" \
  | sed 's/[.]/\\./g' | sed 's/@/\\@/g' \
  | awk 'BEGIN {print "header __MAIL_SENT_TO_4 FROM =~ /("} {print $0} END {print ")/i\n"}' \
  | tr -d "\r\n"
echo
echo describe __MAIL_SENT_TO_4 From address which had been sent to four weeks ago

echo
echo

echo meta MAIL_SENT_TO \(\(__MAIL_SENT_TO_0 + __MAIL_SENT_TO_1 +
__MAIL_SENT_TO_2 + __MAIL_SENT_TO_3 + __MAIL_SENT_TO_

Re: Stop tests when score is high

2007-12-20 Thread OliverScott

Not that I am aware of...

The complication with this would be the order in which tests are carried
out - you might have a genuine email which hits some good and some bad
tests, and if the bad tests are hit first then you might have a problem!

However it is a feature I would like to see as it could be used in
conjunction with the Short Circuit plugin.

I am currently using short circuit to improve spam processing speed. I have
set fast tests and rules with a high accuracy to run first (using a low,
negative, priority), and when specific combinations of rules fire which
should never cause false positives, I then break out of further testing and
classify the email as spam.
-- 
View this message in context: 
http://www.nabble.com/Stop-tests-when-score-is-high-tp14432409p14437413.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



syswrite() to parent failed: Broken pipe at /usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line 570

2007-11-05 Thread OliverScott

SpamD seems to die every now and again (every couple of days) and though I
have a script which checks regularly for various key services and restarts
them if they are missing, it is letting a couple of spam through each
time...

The error message I am getting in my maillog when this happens is:

server spamd[9522]: syswrite() to parent failed: Broken pipe at
/usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm line
570.

I have installed the logging plugin and will grab a copy of the next message
to cause this to see if that sheds any light on the problem, but I was
wondering if anyone had seen this problem before?

This is running on a CentOS 4.4 (Red Hat) VPS with Exim 4.67 (not that this
is probably relevant) and is running SpamAssassin 3.2.3 with all the normal
additions (Razor, DCC, iXhash, BotNet, SARE, PDFInfo, ClamAV Plugin, Extra
DNSBLs, and a few custom ShortCircuits).

Thanks!
-- 
View this message in context: 
http://www.nabble.com/syswrite%28%29-to-parent-failed%3A-Broken-pipe-at--usr-lib-perl5-site_perl-5.8.5-Mail-SpamAssassin-SpamdForkScaling.pm-line-570-tf4751769.html#a13587308



Re: How to block the bat!

2007-10-23 Thread OliverScott

If you want to reduce the spam you get which claims to be from The Bat! then
do the following:

Create a rule which looks for The Bat! in the headers, with a 0.001 score.

Create a meta rule which fires on email which is caught by the above rule
AND hits BAYES_99 AND/OR (you choose based on how worried you are about FPs)
hits BOTNET. Give this meta rule a score of 5 or more.
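
As a rough sketch, the two rules could be written to their own .cf file like this. The rule names LOCAL_THE_BAT and LOCAL_THE_BAT_SPAM and the file name thebat.cf are made up for illustration, and it is an assumption that the claim shows up in the X-Mailer header. BOTNET only exists if the BotNet plugin is loaded.

```shell
#!/bin/sh
# Write the marker rule and the meta rule to a .cf file.
# The default path and the LOCAL_* rule names are illustrative only.
cf_file=${1:-./thebat.cf}

cat > "$cf_file" <<'EOF'
# Marker rule: message claims to have been sent with The Bat!
# (assumes the claim appears in the X-Mailer header)
header   LOCAL_THE_BAT  X-Mailer =~ /\bThe Bat!/i
describe LOCAL_THE_BAT  X-Mailer claims The Bat!
score    LOCAL_THE_BAT  0.001

# Strict "AND" variant. For the looser "OR" variant use:
#   (LOCAL_THE_BAT && (BAYES_99 || BOTNET))
meta     LOCAL_THE_BAT_SPAM  (LOCAL_THE_BAT && BAYES_99 && BOTNET)
describe LOCAL_THE_BAT_SPAM  Claims The Bat! and hits BAYES_99 and BOTNET
score    LOCAL_THE_BAT_SPAM  5.0
EOF

echo "wrote $cf_file"
```

Once the file has been copied into the site rules directory (probably /etc/mail/spamassassin), running spamassassin --lint should report any syntax errors before spamd is restarted.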

That's how I would handle it (if my current config weren't already catching
all these emails).
-- 
View this message in context: 
http://www.nabble.com/How-to-block-the-bat%21-tf4644470.html#a13362545



Re: Manual sorting based on score count

2007-09-04 Thread OliverScott

You already can - try this in your local.cf:

rewrite_header Subject SPAM [_STARS(X)_]

This will give you something which looks like (one X per whole point of
score):

SPAM [XXXXXXXXX] Some Dodgy Subject

You can also put in the actual numeric score (rather than a number of X's
which equals the whole number part of the score) but I find it easier to
create rules in email clients which count whole numbers of X's.

Note: You can use any other character rather than X if you want.

To include the actual score use:

rewrite_header Subject *SPAM* (_SCORE_)

This will give you something which looks like:

*SPAM* (9.7) Some Dodgy Subject

Hope this helps!



Jesse Molina wrote:
> 
> 
> Hi
> 
> I admin my personal mail system with SpamAssassin.  I use maildrop as my 
> MDA to process mail through SpamAssassin and then deliver it to the 
> proper new-spam folder based on the spam's score.
> 
> However, I then need to manually go through my new-spam folder from time 
> to time and find the false-positives and train the Bayes system as 
> appropriate.
> 
> I use Seamonkey (Mozilla) and mutt as my MUAs.  I'm usually using 
> Seamonkey when I'm doing my manual sorting and processing of my new-spam 
> folder.
> 
> Today I was thinking about adding a feature to rewrite the Subject field 
> of spam-tagged messages with the numerical value of the score.  For
> example;
> 
> Subject: *SPAM:Score=24* old-subject-goes-here
> 
> or maybe
> 
> Subject: *SPAM:24* old-subject-goes-here
> 
> This would make sorting of my new-spam folder easy, based on the 
> alphabetical/numerical ordering of the subjects.  Lower scored mails are 
> more likely to be false positives, so I can go through them first and 
> then forget about anything with a score over 15 or 20.
> 
> This is pretty easy to do, but I wanted to ask if anyone else is doing 
> this, and if they have any superior methodologies that they have
> discovered.
> 
> Comments would be appreciated
> 
> 
> 
> -- 
> # Jesse Molina
> # Mail = [EMAIL PROTECTED]
> # Page = [EMAIL PROTECTED]
> # Cell = 1.602.323.7608
> # Web  = http://www.opendreams.net/jesse/
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Manual-sorting-based-on-score-count-tf4376119.html#a12477936



Re: False negative

2007-08-27 Thread OliverScott

You need to either get him to change the way he sends his emails or adjust
your scores!

If he is sending directly from a dynamic IP address then he will be blocked
by a lot of people's filters - for instance there is no chance of his emails
being accepted by AOL!

The way round this is for him to relay through his ISP's outgoing mail server
if at all possible, i.e. put smtp.ispname.com (or something like that) in the
outgoing server address of his email client.

If you want to accept emails from people with a similar setup to his without
adding them manually to a whitelist, then you will have to reduce the scores
for the rules which fire on these mails.

Edit your local.cf file (probably in /etc/mail/spamassassin) to include
something like:
score FH_HOST_ALMOST_IP 1.0 
score FH_HOST_EQ_DYNAMICIP 1.0 
score RCVD_IN_SORBS_DUL 0.5 

This will still help to catch some spam (though it will reduce the amount
you catch) but will hopefully be enough to let emails like this through
as long as they don't hit any other rules.

I would suggest NOT using the BOTNET plugin as it will probably make the
problem worse!
-- 
View this message in context: 
http://www.nabble.com/False-negative-tf4335349.html#a12347708



Some thoughts on Baysian Setup...

2007-08-27 Thread OliverScott

Site Wide Bayes or Per User Bayes?

This is something I have been thinking about and thought I would share to see
what other people think...

Site wide bayes has one database. Per User bayes has one per user or domain
(depending on how your server is configured). For example if you have 40
users with a 10 MB bayes database each then you either have to read and write
these to and from disk when an email comes in, or load all 400 MB of data
into memory.

1. Most users don't know how, aren't allowed, or can't be bothered to train
Bayes. In most cases spamassassin is left to auto-train bayes.

2. Most people would consider the same emails to be SPAM. 90% of what I
think is spam would also be what you think is spam, with only a small
percentage of emails that we disagree on.

3. The emails which we would disagree on would probably be newsletters and
advertising emails from legitimate companies. Unwanted newsletters and
advertising emails which people have deliberately (possibly due to
stupidity) signed up to should not be trained as SPAM, but should be
manually blacklisted if necessary.

4. Site wide bayes saves disk space and more importantly it saves
significantly on disk IO or memory requirements.

5. A larger database leads to more accurate bayesian identification - I am
guessing this is right?

Do you agree or disagree with the five above statements?

Based on the five above statements I would suggest that:
Site wide bayes is as good as if not slightly better (due to a potentially
larger single database) than per user bayes when it comes to identifying
SPAM emails.

1. What I think of as HAM emails could be widely different from what you
think of as HAM emails - if I were to sort your inbox by hand (without
knowing you personally) I would probably delete some good emails by mistake
while getting rid of the spam.

2. If a server has one customer who is a plumber and one who is an artist,
site wide bayes would learn that emails containing the words pipes or canvas
are good. The plumber will get emails with the word canvas in them tagged as
bayes_00 and vice versa.

3. If per user bayes is chosen then bayes_00 will only fire on emails
containing words which have occurred in emails which YOU have received in
the past and which scored low enough to be autolearned. 

4. If a HAM email is misclassified as SPAM then users are more likely to
report this to their admin or to train the filter themselves, than for SPAM
emails which are not tagged. People will ignore a few spam slipping through
but not false positives!

Do you agree or disagree with the four above statements?

Based on the four above statements I would suggest that:
Per User bayes is better than Site Wide bayes when it comes to correctly
identifying HAM emails.


If my various assumptions are correct then perhaps there should be a third
type of bayes to choose from in spamassassin? Namely one where:
SPAM tokens are stored on a server wide basis - can be a LARGE database if
this helps
HAM tokens are stored on a per user basis - probably only needs a 1-2 MB file
per user.

Any comments?

PS. I am not up to coding anything like this myself so don't bother
suggesting that I try it and report back!
-- 
View this message in context: 
http://www.nabble.com/Some-thoughts-on-Baysian-Setup...-tf4335489.html#a12347630



How to query the AWL at an earlier stage for Short Circuit?

2007-08-26 Thread OliverScott

I am playing with the Short Circuit plugin to speed up scanning (by skipping
Network Tests on obviously good emails) and wanted to be able to query the
AWL as part of this as I don't want to Short Circuit on BAYES_00 alone.

i.e.

Short Circuit as HAM if both BAYES_00 & AWL fire.

I tried this:

priority USER_IN_WHITELIST -1000
priority ALL_TRUSTED -950
priority BAYES_00 -400

shortcircuit USER_IN_WHITELIST on
shortcircuit ALL_TRUSTED on


# Add a high priority rule to check if the sender is in the AWL
header __MY_AWL eval:check_from_in_auto_whitelist()
describe __MY_AWL Sender has been seen before.
priority __MY_AWL -300

meta MY_HAM_SC (( BAYES_00 + __MY_AWL ) > 1)
describe MY_HAM_SC Clearly not SPAM.
priority MY_HAM_SC -200
tflags MY_HAM_SC nice
score MY_HAM_SC -50
shortcircuit MY_HAM_SC on


However this does not work: messages which hit both BAYES_00 and AWL do not
get Short Circuited...

I presume that this is because the AWL, which normally runs at a priority of
1000, can't be accessed at an earlier stage?

I still want the AWL to do its normal job once the other scoring has
finished, so don't want to make its priority less than 1000, but was hoping
that there was a way to query its information earlier in the SpamAssassin
process.

Any ideas?

-- 
View this message in context: 
http://www.nabble.com/How-to-query-the-AWL-at-an-earlier-stage-for-Short-Circuit--tf4332696.html#a12339661



Re: Problem with clamav plugin

2007-07-24 Thread OliverScott

You need to set a high priority for the meta rules as otherwise they are
evaluated BEFORE the ClamAV plugin is used (I think?). I am not an expert in
how SA works, but I eventually came up with the following solution (for
using several different 3rd party clamav signatures):

This is my clamav.cf file:

loadplugin ClamAV clamav.pm 
full CLAMAV eval:check_clamav() 
describe CLAMAV Clam AntiVirus detected something... 
score CLAMAV 0.001 

# Look for specific types of ClamAV detections 
header __CLAMAV_PHISH X-Spam-Virus =~ /Yes.{1,20}Phishing/i 
header __CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,20}Sanesecurity/i 
header __CLAMAV_MBL X-Spam-Virus =~ /Yes.{1,20}MBL/ 
header __CLAMAV_MSRBL X-Spam-Virus =~ /Yes.{1,20}MSRBL/ 

# Give the above rules a very late priority so that they can see the output 
# of previous rules - otherwise they don't work! Not sure what the correct
# priority should be but this seems to work...
priority __CLAMAV_PHISH  
priority __CLAMAV_SANE  
priority __CLAMAV_MBL  
priority __CLAMAV_MSRBL  

# Work out what ClamAV detected and score accordingly 
meta CLAMAV_VIRUS (CLAMAV && !__CLAMAV_PHISH && !__CLAMAV_SANE && !__CLAMAV_MBL && !__CLAMAV_MSRBL)
describe CLAMAV_VIRUS Virus found by ClamAV default signatures 
score CLAMAV_VIRUS 20.0 

meta CLAMAV_PHISH (CLAMAV && __CLAMAV_PHISH && !__CLAMAV_SANE) 
describe CLAMAV_PHISH Phishing email found by ClamAV default signatures 
score CLAMAV_PHISH 10.0 

meta CLAMAV_SANE (CLAMAV && __CLAMAV_SANE) 
describe CLAMAV_SANE SPAM found by ClamAV SaneSecurity signatures 
score CLAMAV_SANE 7.5 

meta CLAMAV_MBL (CLAMAV && __CLAMAV_MBL) 
describe CLAMAV_MBL Malware found by ClamAV MBL signatures 
score CLAMAV_MBL 7.5 

meta CLAMAV_MSRBL (CLAMAV && __CLAMAV_MSRBL) 
describe CLAMAV_MSRBL SPAM found by ClamAV MSRBL signatures
score CLAMAV_MSRBL 2.0 



In your case you could fix what you have done (which looks to be taken from
one of my previous messages while trying to get this to work myself?) by
making it:

header __MY_CLAMAV X-Spam-Virus =~ /Yes/i
priority __MY_CLAMAV 
header __MY_CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,50}Sanesecurity/i
priority __MY_CLAMAV_SANE 
meta MY_CLAMAV_SANE (__MY_CLAMAV && __MY_CLAMAV_SANE) 
score MY_CLAMAV_SANE 5 


Hope this helps!
-- 
View this message in context: 
http://www.nabble.com/Problem-with-clamav-plugin-tf4135813.html#a11763227



Re: My bash script to upload PDFinfo daily, safely

2007-07-22 Thread OliverScott

I have found SaneSecurity definitions to be VERY good - they hit about 60% of
my SPAM which is incredible given that they only match exact results (they
are not fuzzy). However this high percentage may be because I am based in
the UK as is the author of the sanesecurity definitions. Also they tend to
hit already high scoring spam so they aren't a miracle spam fighting measure
though they are good.

My biggest concern was over possible false positives given that there is
only one person working on these definitions unlike the official ClamAV
signatures...

However I have yet to have any problems with them in the month that I have
been using them.

There are also two other sets of ClamAV signatures which I am now testing
(though these are not as good IMHO):

http://www.malware.com.br/ (various formats including ClamAV)
http://www.msrbl.com/site/ (ClamAV as well as RBLs)

As a solution to my own concerns over false positives I have changed from
virus scanning at SMTP time and have moved to using the ClamAV SpamAssassin
plugin:

http://wiki.apache.org/spamassassin/ClamAVPlugin

Rather than using the standard clamav.cf I have written my own which gives
different scores depending on which ClamAV signature found something:

loadplugin ClamAV clamav.pm
full CLAMAV eval:check_clamav()
describe CLAMAV Clam AntiVirus detected something...
score CLAMAV 0.001

# Look for specific types of ClamAV detections
header __CLAMAV_PHISH X-Spam-Virus =~ /Yes.{1,20}Phishing/i
header __CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,20}Sanesecurity/i
header __CLAMAV_MBL X-Spam-Virus =~ /Yes.{1,20}MBL/
header __CLAMAV_MSRBL X-Spam-Virus =~ /Yes.{1,20}MSRBL/

# Give the above rules a very late priority so that they can see the output
# of previous rules - otherwise they don't work!
priority __CLAMAV_PHISH 
priority __CLAMAV_SANE 
priority __CLAMAV_MBL 
priority __CLAMAV_MSRBL 

# Work out what ClamAV detected and score accordingly
meta CLAMAV_VIRUS (CLAMAV && !__CLAMAV_PHISH && !__CLAMAV_SANE && !__CLAMAV_MBL && !__CLAMAV_MSRBL)
describe CLAMAV_VIRUS Virus found by ClamAV default signatures
score CLAMAV_VIRUS 20.0

meta CLAMAV_PHISH (CLAMAV && __CLAMAV_PHISH && !__CLAMAV_SANE)
describe CLAMAV_PHISH Phishing email found by ClamAV default signatures
score CLAMAV_PHISH 10.0

meta CLAMAV_SANE (CLAMAV && __CLAMAV_SANE)
describe CLAMAV_SANE SPAM found by ClamAV SaneSecurity signatures
score CLAMAV_SANE 7.5

meta CLAMAV_MBL (CLAMAV && __CLAMAV_MBL)
describe CLAMAV_MBL Malware found by ClamAV MBL signatures
score CLAMAV_MBL 7.5

meta CLAMAV_MSRBL (CLAMAV && __CLAMAV_MSRBL)
describe CLAMAV_MSRBL SPAM found by ClamAV MSRBL signatures
score CLAMAV_MSRBL 2.0


Hope this is of some help to someone...
-- 
View this message in context: 
http://www.nabble.com/My-bash-script-to-upload-PDFinfo-daily%2C-safely-tf4115144.html#a11732078



Re: is there a whitelist rhswl available

2007-07-18 Thread OliverScott

http://www.dnswl.org/
http://wiki.ctyme.com/index.php/Spam_DNS_Lists

Both work well IMHO



Ramprasad wrote:
> 
> There are quite a few domain you can trust not to send spam. 
> For example the airlines, the banks , and a lot others like
> spamassassin.apache.org :-) 
> 
> If mails from these domains gets an SPF/DK pass we can simply pass the
> mails. Today I manually maintain a list of whitelist_from_auth 
> 
> Is there a global DNS WL available somewhere. So that I dont have to
> keep tracking myself for maintaining which new bank has put up SPF
> records 
> 
> 
> Thanks
> Ram
> 
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/is-there-a-whitelist-rhswl-available-tf4102536.html#a11668610



Re: ClamAV in SA( was: SaneSecurity)

2007-07-02 Thread OliverScott

Is the following easy to do? I am a bit of a Linux novice I'm afraid...

I have tried discarding at SMTP with ClamAV and Exim, and scanning in SA
using the ClamAV plugin, but wasn't 100% happy with either solution (for the
reasons you give).

Any pointers would be gratefully accepted!

>We do, an I think they are. Currently I run two instances of 
>clamd in our mail gateway.

>One instance has only the official ClamAV databases with phishing 
>signatures turned off. This instance is used by MIMEDefang (a 
>milter) for discarding infected mail.

>The second instance has the official databases with phishing 
>signatures (and some other stuff) turned on as well as the 
>SaneSecurity*, MSRBL* and Malware* signatures. This instance is 
>used by SpamAssassin for scoring mail.
-- 
View this message in context: 
http://www.nabble.com/SaneSecurity-tf3989268.html#a11400255



Writing a rule to access SA ClamAV Plugin Header

2007-07-01 Thread OliverScott

There is a SpamAssassin plugin which checks messages with ClamAV, which adds
the following header to emails it processes:

X-Spam-Virus: Yes ($VirusName)

http://wiki.apache.org/spamassassin/ClamAVPlugin

By default you can set a score in its clamav.cf file:

score CLAMAV 10

I am currently testing a 3rd party set of ClamAV definitions from a website
called www.sanesecurity.co.uk which look to be very effective against some
phishing and image spam emails. When it fires on an email the headers the
ClamAV plugin adds are as follows:

X-Spam-Virus: Yes ($Name.Sanesecurity)

What I would like to do would be to score the ClamAV detection differently
depending on whether it was detected by the ClamAV default signatures
(virus) or the Sanesecurity signatures (spam). I have tried adding the
following to local.cf but it doesn't seem to be working:

header __MY_CLAMAV X-Spam-Virus =~ /Yes/i
header __MY_CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,50}Sanesecurity/i
meta MY_CLAMAV (__MY_CLAMAV && !__MY_CLAMAV_SANE)
meta MY_CLAMAV_SANE (__MY_CLAMAV && __MY_CLAMAV_SANE)
score MY_CLAMAV 10
score MY_CLAMAV_SANE 5

Any suggestions?
-- 
View this message in context: 
http://www.nabble.com/Writing-a-rule-to-access-SA-ClamAV-Plugin-Header-tf4007944.html#a11382177



Re: exposing rules

2007-06-26 Thread OliverScott

Assuming that you have managed to get SA to add headers to messages which it
thinks are spam, and are looking to add a header to ALL messages so you can
see what rules are firing on your HAM, then you can do the following. This
may not be what you are after, but may be of some use!

Edit your local.cf file and add:

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTSSCORES(,)_ _DCCR_ _PYZOR_ _RBL_ autolearn=_AUTOLEARN_ languages=_LANGUAGES_

Note: this must all be added as ONE long line!
-- 
View this message in context: 
http://www.nabble.com/exposing-rules-tf3979477.html#a11314268



Re: Botnet Score

2007-06-24 Thread OliverScott

Though BotNet is VERY effective in catching SPAM, the default score of 5 is
way too high IMHO.

With a well trained BAYES, using a selected list of RBLs and URIBLs for
scoring, the SARE rules, and some custom rules of my own I am confident that
I am catching well over 90% of the SPAM hitting my server (about 5000 emails
received a week), with almost no false positives.

Based on this I set BotNet to score 0.001 for all its rules (so as not to
confuse the issue), and after a week examined its effectiveness using
sa-stats.pl...

It detected 91.7% of SPAM which is FANTASTIC!

But it also fired on 9.6% of my HAM emails which is not so good :(

Normally if a rule had this high an FP rate I would discard it, but given
the amount of SPAM it catches I have left it running but set to only add 1
to the scores of the emails it detects (as this will not be enough to
greatly affect the scores of the false positive ham emails it hits) and in
this fashion it helps to up-score my SPAM enough to push it over my BAYES
training threshold and my Delete threshold.

One other benefit of BotNet is that it includes some rules which can be used
to down-score some genuine commercial emails and emails sent through an ISP's
mail servers.

My scores for BotNet are as follows:
score BOTNET 1.000
score BOTNET_CLIENT 0.100
score BOTNET_CLIENTWORDS 0.100
score BOTNET_IPINHOSTNAME 0.500
score BOTNET_SOHO -0.100
score BOTNET_SERVERWORDS -0.500

Other things you should look at are upgrading to SA 3.2.1 as this includes
URIBL_BLACK by default (another very effective rule), and possibly using the
SAGREY plugin (which uses the auto white list feature to see if an email is
the first one you have had from an address, and in this case if it looks to
be SPAM it adds a bit more to its score!).

Obviously your mileage may vary!

Oliver


Matt-123 wrote:
> 
> I have added botnet to my Spamassassin install.  It seems to have
> helped quite a bit so far.  I am just wandering about the 5 points it
> gives for a hit.  Is that too much?  Does it have alot of false
> positives or not?
> 
> Matt
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Botnet-Score-tf3971206.html#a11276655



Changes to SURBL in SA 3.2.1?

2007-06-23 Thread OliverScott

EDIT: My mistake - the URIBLs are listed in two different places in the 3.2.1
rules table! However URIBL_BLACK does seem to be listed twice with different
names and scores...

I have just been picking through some of the changes in 3.2.1 (having just
installed it) to see what impact this will have on my custom rules and RBLs
etc and have noticed something strange!

3.2.0 checked the following URIBLs
http://spamassassin.apache.org/tests_3_0_x.html:
URIBL_SBL
URIBL_SC_SURBL
URIBL_WS_SURBL  
URIBL_PH_SURBL 
URIBL_OB_SURBL
URIBL_AB_SURBL

3.2.1 checks the following URIBLs
http://spamassassin.apache.org/tests_3_2_x.html:
URIBL_COMPLETEWHOIS
URIBL_RHS_ABUSE 
URIBL_RHS_AHBL  
URIBL_RHS_BOGUSMX
URIBL_RHS_DOB  
URIBL_RHS_DSN 
URIBL_RHS_POST
URIBL_RHS_TLD_WHOIS 
URIBL_RHS_URIBL_BLACK
URIBL_RHS_URIBL_GREY
URIBL_RHS_WHOIS
URIBL_XS_SURBL  (URL listed in XS SURBL - Testing)

My question is: Does URIBL_XS_SURBL replace all the previous SURBL black
lists? Is it in effect multi.surbl.org? I can't find any details on XS_SURBL
on the surbl.org website...

If it is a multiple check then this will reduce the scoring of some SPAM as
it is scored at 1, when some of the old SURBL rules were scored at 2, 3,
and even 4! Not a problem IMHO as SA now includes several other good URI BLs
including the excellent URIBL_BLACK.


-- 
View this message in context: 
http://www.nabble.com/Changes-to-SURBL-in-SA-3.2.1--tf3969802.html#a11267936