Requesting help, sa-update, cron, gpg, unsafe ownership on homedir

2024-07-12 Thread Steve Charmer

I have a cron job running as root, which calls sa-update

it warns about unsafe ownership


gpg: WARNING: unsafe ownership on homedir
`/var/lib/spamassassin/sa-update-keys'


this is my current ownership

ls -la /var/lib/spamassassin/sa-update-keys
total 16
drwx------ 2 spamd root  4096 Jun 20  2017 .
drwxr-xr-x 7 spamd spamd 4096 Nov 22  2018 ..
-rwx------ 1 spamd root  2783 Jun 20  2017 pubring.gpg
-rwx------ 1 spamd root     0 Jun 20  2017 pubring.gpg~
-rwx------ 1 spamd root     0 Jun 20  2017 secring.gpg
-rwx------ 1 spamd root  1200 Jun 20  2017 trustdb.gpg



I've read that the ownership should be root, so is having owner = spamd
and group = root what causes that warning? I thought having group = root
would fix any ownership issues. I cannot recall now why I set the owner
to spamd. Maybe spamd could not read the gpg keys when trying an update
before?


Should I chown the folders and files to be root : root ?
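
A minimal sketch of the check that seems to be behind this warning, assuming (this is my reading of gpg's behaviour, not something stated in the thread) that gpg compares the owner of its homedir with the user running it and ignores the group. The helper name `owned_by` is hypothetical.

```python
import os

def owned_by(path: str, uid: int) -> bool:
    """Return True if the file or directory at `path` is owned by `uid`."""
    return os.stat(path).st_uid == uid

# Hypothetical usage: the cron job runs as root (uid 0), so the question
# is whether the key directory is owned by uid 0:
#   owned_by("/var/lib/spamassassin/sa-update-keys", 0)
```

With the listing above (owner spamd, group root) such a check would presumably return False for root, which would be consistent with the warning appearing even though the group is root.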

TYPO USER_IN_DKIM_WHITELSIT in 60_whitelist_dkim.cf

2021-12-01 Thread Steve Charmer

Hi, I am running version 3.4.2

/usr/bin/spamassassin -V
SpamAssassin version 3.4.2
  running on Perl version 5.22.1
spamd --version
SpamAssassin Server version 3.4.2
  running on Perl 5.22.1
  with SSL support (IO::Socket::SSL 2.024)
  with zlib support (Compress::Zlib 2.068)
which spamd
/usr/sbin/spamd

==
SA was originally installed using apt-get (Ubuntu-16)

==
/var/lib/spamassassin/3.004002/updates_spamassassin_org/60_whitelist_dkim.cf

Line 47
http://svn.apache.org/viewvc/spamassassin/trunk/rules/60_whitelist_dkim.cf?revision=1892060=markup#l47

HAS TYPO  "WHITELSIT"
reuse USER_IN_DKIM_WHITELSIT

which causes warning during startup & lint

spamd[17043]: config: warning: no description set for USER_IN_DKIM_WHITELSIT

I don't have the ability to comment on or modify SVN,
so could someone with that ability please fix it?

thank you

Re: How do I search and capture text for use in a rule?

2021-05-07 Thread Steve Dondley


On 2021-05-07 10:33 AM, Henrik K wrote:

On Fri, May 07, 2021 at 10:19:49AM -0400, Steve Dondley wrote:
I want to extract the first part of an email address from the
"Delivered-To" header and use it within a custom rule.

Example pseudo code:

my ($first_part) = $email_file =~ /^Delivered-To: (.*)/;

body __LOCAL_AWKWARD_INTRO /hi $first_part/i


How can I do this in my .cf file?


With a silly kludge, a full rule that matches the complete raw email with a
single regex.  Example in stock rules:

full __FROM_NAME_IN_MSG /^From:\s+([^<]\S+\s\S+)\s(?=.{1,2048}^\1\r?$)/sm


So something like (untested)

full __LOCAL_AWKWARD_INTRO /^Delivered-To:\s+<([^@>]+)(?=.{1,2048}\bHi\s+\1\b)/sm



Thanks. I don't quite understand the {1,2048} bit. That looks like a 
look ahead assertion up to 2048 characters? What is magical about 2048? 
What if the "Delivered-To" header is more than 2048 characters away from 
the salutation, which doesn't seem unlikely.
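
The behaviour of the bounded lookahead can be tried outside SpamAssassin. The sketch below ports the untested pattern to Python's `re` module, which also supports a backreference inside a lookahead. The sample message and the helper names `greets_recipient` and `make_message` are made up for illustration. The `{1,2048}` limit presumably exists to bound how far the regex engine scans; a greeting that starts further away than that is simply not found.

```python
import re

# Same pattern as the untested suggestion: capture the local part of
# Delivered-To, then look ahead at most 2048 characters for "Hi <local part>".
PATTERN = re.compile(
    r"^Delivered-To:\s+<([^@>]+)(?=.{1,2048}\bHi\s+\1\b)",
    re.S | re.M,
)

def greets_recipient(raw_message: str) -> bool:
    """Return True if the raw message greets the Delivered-To local part."""
    return PATTERN.search(raw_message) is not None

def make_message(padding: int) -> str:
    """Build a hypothetical raw message with `padding` filler characters
    between the Delivered-To header and the salutation."""
    return (
        "Delivered-To: <steve@example.com>\n"
        "X-Padding: " + "a" * padding + "\n"
        "\n"
        "Hi steve, your order has shipped.\n"
    )
```

Here `greets_recipient(make_message(10))` matches, while `greets_recipient(make_message(3000))` does not, because the salutation then lies beyond the 2048-character window.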

How do I search and capture text for use in a rule?

2021-05-07 Thread Steve Dondley

I want to extract the first part of an email address from the 
"Delivered-To" header and use it within a custom rule.


Example pseudo code:

my ($first_part) = $email_file =~ /^Delivered-To: (.*)/;

body __LOCAL_AWKWARD_INTRO /hi $first_part/i


How can I do this in my .cf file?

Re: More fake order spam

2021-04-27 Thread Steve Dondley


On 2021-04-27 03:03 PM, Dave Wreski wrote:
Invalid List-ID. You can then use that with other weirdness in a meta.
header    __LIST_ID_DOMAIN_IN_BRACKETS List-id =~ /<([\w-]+)(\.[\w-]+)+>/
meta   LIST_ID_IMPROPER_FORMAT __HAS_LIST_ID && !__LIST_ID_DOMAIN_IN_BRACKETS

score  LIST_ID_IMPROPER_FORMAT 0.001
describe LIST_ID_IMPROPER_FORMAT List-id has improper format


You lost me here. The spam has this:

List-Id: MzY3NDAxMi01Nzg2LTU= 



That's not legit? It's in brackets.


It's matching on the text before the brackets.


I meant to say that it's not matching the __LIST_ID_DOMAIN_IN_BRACKETS
because of the text before the brackets, so the rule
matches/triggered.


OK, gotcha. But now I gotta ask: I see the host tacked onto the random 
bit of text in the brackets, but why is it significant that the part 
outside the brackets doesn't exactly match the part inside? How does 
that let us know the email is bogus?
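
For reference, the header/meta pair quoted above can be mimicked in a few lines. This is only a sketch: it assumes `__HAS_LIST_ID` means "a List-Id header is present", the function name is hypothetical, and the first sample value is invented.

```python
import re

# Same regex as __LIST_ID_DOMAIN_IN_BRACKETS (unanchored search).
DOMAIN_IN_BRACKETS = re.compile(r"<([\w-]+)(\.[\w-]+)+>")

def list_id_improper_format(list_id):
    """Mimic the meta: a List-Id header exists but contains no
    <dotted.name> made only of word characters and hyphens."""
    has_list_id = list_id is not None
    return has_list_id and DOMAIN_IN_BRACKETS.search(list_id) is None
```

A value such as `Example list <list.example.com>` does not trigger the check, while a value with no bracketed dotted name, or one whose bracketed part contains other characters such as `=`, does.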

Re: More fake order spam

2021-04-27 Thread Steve Dondley


On 2021-04-27 02:23 PM, Reindl Harald wrote:

Am 27.04.21 um 19:57 schrieb Steve Dondley:

On 2021-04-27 01:19 PM, Dave Wreski wrote:

Investigate adding the SEM_FRESH rules - this domain was created less
than five days ago.
https://spameatingmonkey.com/services


OK, how do I get those rules installed?


why don't you just click on the link? there is a sample to copy and
paste, and anyone running a public mailserver is expected to know how
local .cf files work


I did. That's why I wrote: "I don't see anything similar for SEM rules. 
I see the page you linked to says to drop this into the config:"

Re: More fake order spam

2021-04-27 Thread Steve Dondley


On 2021-04-27 01:19 PM, Dave Wreski wrote:

-2.5 RCVD_IN_HOSTKARMA_W    RBL: Sender listed in HOSTKARMA-WHITE
  [185.41.28.7 listed in hostkarma.junkemailfilter.com]


We've reduced this score to -1 locally.


-1.0 BAYES_00   BODY: Bayes spam probability is 0 to 1%


Needs to be trained, obviously. Bayes is best for this body content.

Looks like it's coming from some kind of bulk mail service which is 
whitelisted. Even after training with bayes, it will still be a false 
negative.


Any ideas on the best way to tackle these kinds of fake order spam?


Investigate adding the SEM_FRESH rules - this domain was created less
than five days ago.
https://spameatingmonkey.com/services


OK, how do I get those rules installed? I've only installed KAM rules 
using a channel. I don't see anything similar for SEM rules. I see the 
page you linked to says to drop this into the config:


# SEM-FRESH
urirhssub SEM_FRESH fresh.spameatingmonkey.net. A 2
body SEM_FRESH eval:check_uridnsbl('SEM_FRESH')
describe SEM_FRESH Contains a domain registered less than 5 days ago
tflags SEM_FRESH net
score SEM_FRESH 0.5

I've never seen anything like this before. Looks like this is the 
documentation for that: 
https://spamassassin.apache.org/full/3.2.x/doc/Mail_SpamAssassin_Plugin_URIDNSBL.html


Should I be adding other services besides this one for urirhssub lookups?
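
A rough sketch of what a `urirhssub` line like the one above appears to describe, assuming (from my reading of the URIDNSBL plugin documentation) that the domain found in a URI is prepended to the zone, an A record is looked up, and the trailing `2` is a bitmask applied to the answer. No DNS query is made here; both function names are hypothetical.

```python
def uribl_query_name(domain: str, zone: str) -> str:
    """Build the name that would be looked up for `domain` in `zone`."""
    return domain.rstrip(".") + "." + zone.rstrip(".")

def bitmask_hit(a_record: str, mask: int) -> bool:
    """Apply a numeric subtest as a bitmask to the last octet of the answer."""
    last_octet = int(a_record.split(".")[-1])
    return (last_octet & mask) != 0
```

Under these assumptions, a URI pointing at `example.com` would be checked as `example.com.fresh.spameatingmonkey.net`, and an answer of `127.0.0.2` would count as a hit for the subtest `2`.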



Invalid List-ID. You can then use that with other weirdness in a meta.
header    __LIST_ID_DOMAIN_IN_BRACKETS List-id =~ /<([\w-]+)(\.[\w-]+)+>/
meta   LIST_ID_IMPROPER_FORMAT __HAS_LIST_ID && !__LIST_ID_DOMAIN_IN_BRACKETS

score  LIST_ID_IMPROPER_FORMAT 0.001
describe LIST_ID_IMPROPER_FORMAT List-id has improper format


You lost me here. The spam has this:

List-Id: MzY3NDAxMi01Nzg2LTU= 

That's not legit? It's in brackets.



Investigate configuring dcc. We also created a meta that matches DCC 
and URIBLs.


Yes, on my todo list.



I believe the new Esp module that works to identify bad sendgrid
accounts also has support for sendinblue accounts, but to what extent?
X-Mailer: Sendinblue


To start, I wrote this rule, which I think will probably work well because 
it doesn't make sense for order information to come from a mailing list.


# fake order spam
header __LOCAL_FAKE_ORDER_SUBJ   Subject =~ /your.order/i
header __LOCAL_FAKE_ORDER_1   X-Mailer =~ /Sendinblue/i
header __LOCAL_FAKE_ORDER_2   List-Id =~ /./

# order-like subject combined with at least one bulk-mail indicator
meta  LOCAL_FAKE_ORDER  __LOCAL_FAKE_ORDER_SUBJ && (__LOCAL_FAKE_ORDER_1 + __LOCAL_FAKE_ORDER_2 >= 1)

score LOCAL_FAKE_ORDER 3.0
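
The intent described above (an order-like subject that also carries a bulk-mail indicator) can be sketched outside SpamAssassin to sanity-check the logic. The function name and the sample header dictionaries are hypothetical; the three regexes are the ones from the rule.

```python
import re

def looks_like_fake_order(headers: dict) -> bool:
    """Order-like Subject AND (Sendinblue X-Mailer OR any List-Id)."""
    subject_hit = re.search(r"your.order", headers.get("Subject", ""), re.I) is not None
    mailer_hit = re.search(r"Sendinblue", headers.get("X-Mailer", ""), re.I) is not None
    list_id_hit = re.search(r".", headers.get("List-Id", "")) is not None
    return subject_hit and (mailer_hit or list_id_hit)
```

A subject match alone is not enough in this sketch; without the conjunction, every legitimate "your order" confirmation would be scored.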





I believe later versions of SA also have more geolocation support - do
you have a need to receive mail from France?
$ whois 185.41.28.7
...
route:  185.41.28.0/22
descr:  SENDINBLUE-185-41-28-0-22
origin: AS200484

Regards,
Dave

Re: More fake order spam

2021-04-27 Thread Steve Dondley


On 2021-04-27 01:12 PM, Greg Troxel wrote:

As always, if you have a problem stemming from a dns-based or similar
reputation list, you need to report problems to those lists.

If you aren't running greylisting with aggressive delays for SBL/XBL and
moderate for dialup, do that too.


What does "aggressive delays for SBL/XBL and moderate for dialup" mean, 
exactly? Do you mean greylist long enough to give the blocklists time to 
label the spam as spam?


And what does "moderate for dialup" mean?

More fake order spam

2021-04-27 Thread Steve Dondley


Got this: https://pastebin.com/Gfz951dh

Spam report:

Content analysis details:   (-2.3 points, 5.0 required)

 pts rule name                     description
---- ----------------------------- ------------------------------------------
-2.5 RCVD_IN_HOSTKARMA_W           RBL: Sender listed in HOSTKARMA-WHITE
                                   [185.41.28.7 listed in
                                   hostkarma.junkemailfilter.com]
-1.0 BAYES_00                      BODY: Bayes spam probability is 0 to 1%
                                   [score: 0.]
-0.0 SPF_HELO_PASS                 SPF: HELO matches SPF record
 0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                                   mail domains are different
-0.0 SPF_PASS                      SPF: sender matches SPF record
 0.1 HTML_MESSAGE                  BODY: HTML included in message
-0.1 DKIM_VALID                    Message has at least one valid DKIM or DK
                                   signature
 0.1 DKIM_SIGNED                   Message has a DKIM or DK signature, not
                                   necessarily valid
-0.1 DKIM_VALID_AU                 Message has a valid DKIM or DK signature
                                   from author\'s domain
-1.0 MAILING_LIST_MULTI            Multiple indicators imply a widely-seen
                                   list manager
 2.0 LOCAL_SPAM_TLD                Domain originates a lot of spam


Looks like it's coming from some kind of bulk mail service which is 
whitelisted. Even after training with bayes, it will still be a false 
negative.


Any ideas on the best way to tackle these kinds of fake order spam?
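
To see how much of the -2.3 total comes from the list-related rules, the points from the report above can be summed directly. This is just arithmetic on the displayed (rounded) values; the `total` helper is hypothetical.

```python
# (points, rule name) pairs copied from the report above.
REPORT = [
    (-2.5, "RCVD_IN_HOSTKARMA_W"),
    (-1.0, "BAYES_00"),
    (-0.0, "SPF_HELO_PASS"),
    (0.2, "HEADER_FROM_DIFFERENT_DOMAINS"),
    (-0.0, "SPF_PASS"),
    (0.1, "HTML_MESSAGE"),
    (-0.1, "DKIM_VALID"),
    (0.1, "DKIM_SIGNED"),
    (-0.1, "DKIM_VALID_AU"),
    (-1.0, "MAILING_LIST_MULTI"),
    (2.0, "LOCAL_SPAM_TLD"),
]

def total(report, skip=()):
    """Sum the points, optionally leaving out the named rules."""
    return round(sum(pts for pts, name in report if name not in skip), 1)
```

`total(REPORT)` reproduces the -2.3 in the report. Leaving out `RCVD_IN_HOSTKARMA_W` and `MAILING_LIST_MULTI` gives 1.2, still well short of the 5.0 threshold, so neutralizing those two rules alone would not be enough for this message.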

Re: Getting "config: registryboundaries: no tlds defined, need to run sa-update" message when running mass-check

2021-04-25 Thread Steve Dondley

On 2021-04-25 01:47 PM, Henrik K wrote:

On Sun, Apr 25, 2021 at 01:28:31PM -0400, Steve Dondley wrote:

> mass-check -c parameter expects to find every config file in that single
> directory.  Now it's missing spamassassin updates and specifically
> 20_aux_tlds.cf from there.  You could copy it to /etc/spamassassin
> temporarily, but I'd rather make a completely separate directory that
> should
> include only the relevant *.pre and *.cf files you need for the scan.

OK, thanks. So I created a directory: /root/spam_rules

I copied over every .cf and .pre file from /etc/spamassassin into that dir
as well as every .cf and .pre file inside /var/lib/spamassassin/3.004004

Don't blindly copy all .cf files from /etc/spamassassin, there's no point
using AWL or bayes etc from that config.

OK. I'm setting up a test machine to duplicate a live machine. Not sure 
if that makes a difference or not.

I ran mass-check with "-c=~/root/spam_rules" and now get a ton of these
errors:

config: configuration file "/root/spam_rules/20_advance_fee.cf" requires
version 3.004004 of SpamAssassin, but this is code version 3.004006. Maybe
you need to use the -C switch, or remove the old config files? Skipping this
file at
/root/spamassassin-3.4/masses/../lib/Mail/SpamAssassin/Conf/Parser.pm line
414.

svn checkout http://svn.apache.org/repos/asf/spamassassin/trunk spamassassin-trunk

I have the trunk downloaded via svn, but I have no idea how to find the 
revision for 3.4.4 and roll back to it.

I ended up just downloading the 3.4.4 version from metacpan. After 
downloading and using this version, the errors have gone away.
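
The version numbers in the error message use the numeric form, with three digits each for the minor and patch levels, so 3.004004 is 3.4.4 and 3.004006 is 3.4.6. A small hypothetical helper makes the mismatch easier to read:

```python
def dotted_version(numeric: str) -> str:
    """Convert a numeric version such as "3.004004" to "3.4.4"."""
    major, rest = numeric.split(".")
    minor, patch = int(rest[:3]), int(rest[3:6])
    return f"{int(major)}.{minor}.{patch}"
```

In other words, the rule files were published for 3.4.4 while the mass-check code from the checkout identified itself as 3.4.6.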

Re: Getting "config: registryboundaries: no tlds defined, need to run sa-update" message when running mass-check

2021-04-25 Thread Steve Dondley






spamassassin -V reports: "SpamAssassin version 3.4.4"

I imagine I have to checkout an older 3.4.4 point version from SVN and
use the mass-check command from that. It's been ages since I've used
SVN.

How can I get to the older version via SVN?


I solved this by downloading version 3.4.4 of SA from metacpan and then 
dropping the masses/ dir with the mass-check tool from SVN into the 
3.4.4 version.

Re: Getting "config: registryboundaries: no tlds defined, need to run sa-update" message when running mass-check

2021-04-25 Thread Steve Dondley




> On Apr 25, 2021, at 1:31 PM, Axb  wrote:
> 
> What are you trying to do?
> run masscheck for your rules or for the SA project?

I’m experimenting with writing my own rules. My machines are using SA 3.4.4 so 
I want to use the 3.4.4 rules.

Re: Getting "config: registryboundaries: no tlds defined, need to run sa-update" message when running mass-check

2021-04-25 Thread Steve Dondley




mass-check -c parameter expects to find every config file in that single
directory.  Now it's missing spamassassin updates and specifically
20_aux_tlds.cf from there.  You could copy it to /etc/spamassassin
temporarily, but I'd rather make a completely separate directory that should
include only the relevant *.pre and *.cf files you need for the scan.


OK, thanks. So I created a directory: /root/spam_rules

I copied over every .cf and .pre file from /etc/spamassassin into that 
dir as well as every .cf and .pre file inside 
/var/lib/spamassassin/3.004004


I ran mass-check with "-c=~/root/spam_rules" and now get a ton of these 
errors:



config: configuration file "/root/spam_rules/20_advance_fee.cf" requires 
version 3.004004 of SpamAssassin, but this is code version 3.004006. 
Maybe you need to use the -C switch, or remove the old config files? 
Skipping this file at 
/root/spamassassin-3.4/masses/../lib/Mail/SpamAssassin/Conf/Parser.pm 
line 414.
config: configuration file "/root/spam_rules/20_body_tests.cf" requires 
version 3.004004 of SpamAssassin, but this is code version 3.004006. 
Maybe you need to use the -C switch, or remove the old config files? 
Skipping this file at 
/root/spamassassin-3.4/masses/../lib/Mail/SpamAssassin/Conf/Parser.pm 
line 414.
config: configuration file "/root/spam_rules/20_compensate.cf" requires 
version 3.004004 of SpamAssassin, but this is code version 3.004006. 
Maybe you need to use the -C switch, or remove the old config files? 
Skipping this file at 
/root/spamassassin-3.4/masses/../lib/Mail/SpamAssassin/Conf/Parser.pm 
line 414.
config: configuration file "/root/spam_rules/20_dnsbl_tests.cf" requires 
version 3.004004 of SpamAssassin, but this is code version 3.004006. 
Maybe you need to use the -C switch, or remove the old config files? 
Skipping this file at 
/root/spamassassin-3.4/masses/../lib/Mail/SpamAssassin/Conf/Parser.pm 
line 414.



spamassassin -V reports: "SpamAssassin version 3.4.4"

I imagine I have to checkout an older 3.4.4 point version from SVN and 
use the mass-check command from that. It's been ages since I've used 
SVN.


How can I get to the older version via SVN?

Getting "config: registryboundaries: no tlds defined, need to run sa-update" message when running mass-check

2021-04-25 Thread Steve Dondley


I'm running this command:

./mass-check -n --rules='^LOCAL_AWK_INTRO' -o 
ham:dir:/spam/Maildir/.INBOX*  -c=/etc/spamassassin/ | grep '.  1'



Everything appears to work as expected but I'm getting this 
warning/error when I do:


"config: registryboundaries: no tlds defined, need to run sa-update"

Running sa-update doesn't fix the problem and a search didn't uncover 
anything useful.

Re: Two different machines running same version of SA giving different scores for scores that are commented out

2021-04-25 Thread Steve Dondley


On 2021-04-25 10:19 AM, RW wrote:

On Sun, 25 Apr 2021 00:40:59 -0400
Steve Dondley wrote:




On both machines, /usr/share/spamassassin/72_active.cf has this rule
which is commented out:



This is the legacy rule directory from  before sa-update existed.

Have you not got another directory populated by sa-update?


Yeah, I got it working after Reindl gave me a clue. Thanks.

Re: Two different machines running same version of SA giving different scores for scores that are commented out

2021-04-25 Thread Steve Dondley


On 2021-04-25 05:57 AM, Reindl Harald wrote:

Am 25.04.21 um 07:09 schrieb Steve Dondley:

That rule has this line in the 72_active.cf file:


Look in 72_scores.cf and compare the modification dates on that file.

Their scores as of today (saturday):

72_scores.cf:score FSL_BULK_SIG              0.001 0.001 0.001 0.001
72_scores.cf:score PP_MIME_FAKE_ASCII_TEXT   0.999 0.837 0.999 0.837


The date is Jan 30, 2020. I'm running SA 3.4.4 (the version supplied 
by backports on my debian machine).

it's time to learn about basics like sa-update and where the stuff is 
located


OK, heh. I had totally forgotten about SA updates and what they do. 
After figuring out sa-update and getting it working properly on both 
machines, the scores are the same now. Thanks.

Re: Two different machines running same version of SA giving different scores for scores that are commented out

2021-04-24 Thread Steve Dondley


On 2021-04-25 01:00 AM, John Hardin wrote:

On Sun, 25 Apr 2021, Steve Dondley wrote:

I'm running the same version of SA on the same email on two different 
machines and getting different scores for some rules in the report:


Machine A gives: 0.0 FSL_BULK_SIG   Bulk signature with no Unsubscribe
Machine B gives: 1.0 FSL_BULK_SIG   Bulk signature with no Unsubscribe


On both machines, /usr/share/spamassassin/72_active.cf has this rule 
which is commented out:


...

Machine A: 0.3 PP_MIME_FAKE_ASCII_TEXT BODY: MIME text/plain claims to be ASCII
Machine B: 1.0 PP_MIME_FAKE_ASCII_TEXT BODY: MIME text/plain claims to be ASCII


That rule has this line in the 72_active.cf file:


Look in 72_scores.cf and compare the modification dates on that file.

Their scores as of today (saturday):

72_scores.cf:score FSL_BULK_SIG              0.001 0.001 0.001 0.001
72_scores.cf:score PP_MIME_FAKE_ASCII_TEXT   0.999 0.837 0.999 0.837


The date is Jan 30, 2020. I'm running SA 3.4.4 (the version supplied by 
backports on my debian machine).
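
The four numbers on each `score` line in 72_scores.cf correspond to SpamAssassin's four score sets: (0) Bayes and network tests both off, (1) network tests only, (2) Bayes only, (3) both on. A minimal sketch of picking the applicable value, with a hypothetical function name:

```python
def pick_score(score_line: str, bayes: bool, net: bool) -> float:
    """Select the value that applies from a 'score NAME v0 [v1 v2 v3]' line."""
    parts = score_line.split()
    values = [float(v) for v in parts[2:]]  # skip "score" and the rule name
    if len(values) == 1:
        return values[0]  # a single value applies to every score set
    return values[2 * int(bayes) + int(net)]
```

For `score PP_MIME_FAKE_ASCII_TEXT 0.999 0.837 0.999 0.837`, a setup with both Bayes and network tests enabled would use the last value, 0.837.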

Two different machines running same version of SA giving different scores for scores that are commented out

2021-04-24 Thread Steve Dondley

I'm running the same version of SA on the same email on two different 
machines and getting different scores for some rules in the report:


Machine A gives: 0.0 FSL_BULK_SIG   Bulk signature with no Unsubscribe
Machine B gives: 1.0 FSL_BULK_SIG   Bulk signature with no Unsubscribe


On both machines, /usr/share/spamassassin/72_active.cf has this rule 
which is commented out:


#score FSL_BULK_SIG  3.000   # limit

Similarly, for another rule that's commented out, I'm getting:

Machine A: 0.3 PP_MIME_FAKE_ASCII_TEXT BODY: MIME text/plain claims to be ASCII
Machine B: 1.0 PP_MIME_FAKE_ASCII_TEXT BODY: MIME text/plain claims to be ASCII


That rule has this line in the 72_active.cf file:

#score PP_MIME_FAKE_ASCII_TEXT  1.0


It appears Machine A is somehow caching the old scores for rules that 
have been commented out. Restarting spamassassin daemon doesn't help. 
The command I'm running to generate the report is:


spamc -R < /spam/Maildir/.Spam/cur/1619286920.M132164P23787.email.dondley.com\,S\=5093\,W\=5214\:2\,S

Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?

2021-04-24 Thread Steve Dondley






And if you want to test your rules against a corpus rather than
testing against a few one-off spamples, then look into setting up a
local masscheck instance. You don't need to upload the results to SA,
but it will give you a good overview of how a rule behaves against
multiple messages.


I'm not sure what you mean by "Local masscheck instance". But I plan to 
do the following:


1) set up SA in a docker container which has a volume containing my 
spam/ham folders
2) run a script that syncs ham/spam with live server
3) set up a script that will compare scores before a rule is implemented 
with scores after it is implemented
4) script will output a report that tells me the results and reports 
whether a spam/ham email is "flipped"
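
The comparison step in the plan above could look roughly like this. It is a sketch only: the scores are assumed to have been collected already into dictionaries keyed by message file name, and the function name and the 5.0 default threshold are assumptions.

```python
def flipped(before: dict, after: dict, threshold: float = 5.0) -> dict:
    """Return messages whose spam/ham verdict changed between two runs.

    `before` and `after` map a message identifier to its total score.
    """
    changes = {}
    for msg, old in before.items():
        new = after.get(msg)
        if new is None:
            continue  # message missing from the second run
        was_spam, is_spam = old >= threshold, new >= threshold
        if was_spam != is_spam:
            changes[msg] = "ham->spam" if is_spam else "spam->ham"
    return changes
```

For example, `flipped({"a": 4.5, "b": 6.0}, {"a": 5.5, "b": 6.0})` reports only message `a`, as `ham->spam`.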

Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?

2021-04-24 Thread Steve Dondley


On 2021-04-23 05:41 PM, Martin Gregorie wrote:

On Fri, 2021-04-23 at 16:28 -0400, Steve Dondley wrote:

I'm experimenting with writing a library of my own SA rules and
scores.


I do this on a separate computer, which has Spamassassin installed but
not linked into anything else. It also has a copy of all the live SA
configuration files. Alongside this I have a directory filled with
examples of spam to function as testing input.

Along with this I have a bash script or two, which are used to do things
like:

1) start SA in debug mode to check the testing config for errors. 
   No messages are processed - it's just looking for configuration
   errors.

2) run SA against a spam sample and only display the list of spam hits

3) run SA against a spam sample and display the entire output message
   using less so it can be scrolled through

4) run SA against the complete spam collection and only display
   references to messages which are not scored as spam

5) replace the live SA configuration with the current testing
  configuration, i.e. make the latest set of changes live.

In practice (1) through (3) are easy to combine into a single script
with an option to select the required action while (4) and (5) are best
kept separate.

It helps a lot to name the items in the spam collection to relate
each set of similar spam to the local rule that's intended to trap this
spam type.


I'd like to be sure that the rules I write don't turn ham into spam
and vice versa.


It won't if you test the rules against related spam and give some
thought to the score you apply to each rule.


I imagine a utility like this must exist so figured I'd ask here
before re-inventing the wheel and writing my own (probably buggy)
script.


The sort of scripts I use are fairly short and simple.


The script would need to check against all email files in .INBOX.* and
.Spam directory in a user's IMAP directory.


No. Treat this like any other code development project: use a rule
development SA installation like I describe so you never develop rules
using the live mail stream. This way your rules will be better written
and tested and you'll cause fewer false positives in your live mail
stream.

Martin


Sounds like the best plan. Thanks for the advice.
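
Step (4) of the workflow described above, listing the samples that were not scored as spam, could be sketched as follows. This assumes the samples have already been run through SA and that each one's `X-Spam-Status` header value is available as a string; the function names are hypothetical.

```python
import re

# Matches the "score=... required=..." pair inside an X-Spam-Status value.
STATUS_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")

def scored_as_spam(status_header: str) -> bool:
    """Return True if the score reaches the required threshold."""
    m = STATUS_RE.search(status_header)
    if m is None:
        raise ValueError("no score/required pair found")
    return float(m.group(1)) >= float(m.group(2))

def missed(corpus: dict) -> list:
    """Names of spam samples that SA did not score as spam."""
    return sorted(name for name, status in corpus.items()
                  if not scored_as_spam(status))
```

Given `{"seo-1": "No, score=0.9 required=5.0", "order-1": "Yes, score=7.2 required=5.0"}`, `missed` returns only `seo-1`.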

Script or command for testing new rules to ensure new rules don't generate false positives/negatives?

2021-04-23 Thread Steve Dondley

I'm experimenting with writing a library of my own SA rules and scores. 
I'd like to be sure that the rules I write don't turn ham into spam and 
vice versa. I figured the best way to do this would be to run SA against 
an existing collection of ham and spam to make sure emails are still 
scored accurately with the new rules.


I imagine a utility like this must exist so figured I'd ask here before 
re-inventing the wheel and writing my own (probably buggy) script.


The script would need to check against all email files in .INBOX.* and 
.Spam directory in a user's IMAP directory.


Thanks again, everyone.

Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread Steve Dondley


On 2021-04-23 01:37 PM, Henrik K wrote:

On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:

I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2  /INDIA based IT|indian.based.website|certified.it.company/i

I'm wondering if there is a good reason why a single period is used instead
of something like \s+ which would catch multiple spaces whereas a single
period doesn't.


It would make no difference, because body is normalized from consecutive
spaces into single spaces.

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced


Makes sense. And thanks for the link. I was looking for some kind of 
guidance on writing rules. Google didn't help much.
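
The effect of the normalization can be illustrated with a rough stand-in. SpamAssassin's real body rendering does more than this; collapsing runs of whitespace is only an approximation, and the names below are hypothetical.

```python
import re

def normalize_body(text: str) -> str:
    """Approximate body normalization: collapse whitespace runs to one space."""
    return re.sub(r"\s+", " ", text)

# The single-period form from the KAM rule and a \s+ variant for comparison.
DOT = re.compile(r"indian.based.website", re.I)
WS = re.compile(r"indian\s+based\s+website", re.I)
```

After normalization, `"Indian   based\nwebsite"` is matched by the single-period form. The period also matches punctuation, so `indian-based-website` is caught by `DOT` but not by `WS`, which may be one reason to prefer it.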

Re: how to disable spamcheck for Outgoing mail

2021-04-23 Thread Steve Dondley

On 2021-04-23 01:02 PM, mau...@gmx.ch wrote:

> Hello 
> 
> Please how its possible to disable the spam check from sending mails from 
> "privat to public" network? 
> 
> I was realy thinking if enable the trusted network this will pass over.  
> 
> trusted_networks 192.168.28. 
> 
> thanks

Are you using postfix? If so, you can do something like this: 

submission inet n - y - - smtpd
  -o content_filter=spamassassin

Why single periods in regex in spamassassin rules?

2021-04-23 Thread Steve Dondley


I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2  /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used 
instead of something like \s+ which would catch multiple spaces whereas 
a single period doesn't.

Re: SA seems powerless against marketing emails for SEO/web development

2021-04-23 Thread Steve Dondley





I could add another point between BAYES_999 and BAYES_99 scores but
that seems reactionary. Is there a better way? Should I throw in
another point for certain keywords in marketing emails like these?


add score to tags that score positive 0.0
until it gives 5.0 and above


I like this idea. Seems reasonable. Thanks.

Re: SA seems powerless against marketing emails for SEO/web development

2021-04-22 Thread Steve Dondley


On 2021-04-22 02:31 PM, Matus UHLAR - fantomas wrote:

On 22.04.21 14:21, Steve Dondley wrote:

 pts rule name                  description
---- -------------------------- --------------------------------------------
-0.0 RCVD_IN_DNSWL_NONE         RBL: Sender listed at https://www.dnswl.org/,
                                no trust
                                [209.85.210.44 listed in list.dnswl.org]
-1.0 BAYES_00                   BODY: Bayes spam probability is 0 to 1%
                                [score: 0.]
-0.0 SPF_PASS                   SPF: sender matches SPF record
 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit
                                [margaretkelly866[at]gmail.com]
 0.0 FREEMAIL_FROM              Sender email is commonly abused enduser mail
                                provider
                                [margaretkelly866[at]gmail.com]
 0.0 SPF_HELO_NONE              SPF: HELO does not publish an SPF Record
-0.0 RCVD_IN_MSPIKE_H3          RBL: Good reputation (+3)
                                [209.85.210.44 listed in wl.mailspike.net]
 0.0 HTML_MESSAGE               BODY: HTML included in message
-0.1 DKIM_VALID_EF              Message has a valid DKIM or DK signature from
                                envelope-from domain
 0.1 DKIM_SIGNED                Message has a DKIM or DK signature, not
                                necessarily valid
-0.1 DKIM_VALID_AU              Message has a valid DKIM or DK signature from
                                author\'s domain
-0.1 DKIM_VALID                 Message has at least one valid DKIM or DK
                                signature
-0.0 RCVD_IN_MSPIKE_WL          Mailspike good senders

This email is a bit of an outlier, as most of these emails get flagged 
with bayes_99 and bayes_999 but this one actually gets bayes_00.


My bayes filter has been trained with about 2000 examples of spam and 
ham.


now, train as needed - this one as spam.


OK, so I fixed my configuration issue. So now the bayes filtering is 
working when I flag an email as spam in my mail client:


Content analysis details:   (4.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------
 1.0 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]


But as you can see, the email is still not hitting the 5.0 threshold.

I could add another point between BAYES_999 and BAYES_99 scores but that 
seems reactionary. Is there a better way? Should I throw in another 
point for certain keywords in marketing emails like these?

SA seems powerless against marketing emails for SEO/web development

2021-04-22 Thread Steve Dondley

For whatever reason, solicitations from marketers for various web 
development services are easily slipping through my defenses. I figured 
bayes filtering would eventually do the job, but after reporting them 
for many days now, I'm still getting three to half a dozen a day. Here's 
one example: https://paste.debian.net/1194735/


The report for this email:

Content analysis details:   (-1.0 points, 5.0 required)

 pts rule name                  description
---- -------------------------- --------------------------------------------
-0.0 RCVD_IN_DNSWL_NONE         RBL: Sender listed at https://www.dnswl.org/,
                                no trust
                                [209.85.210.44 listed in list.dnswl.org]
-1.0 BAYES_00                   BODY: Bayes spam probability is 0 to 1%
                                [score: 0.]
-0.0 SPF_PASS                   SPF: sender matches SPF record
 0.2 FREEMAIL_ENVFROM_END_DIGIT Envelope-from freemail username ends in digit
                                [margaretkelly866[at]gmail.com]
 0.0 FREEMAIL_FROM              Sender email is commonly abused enduser mail
                                provider
                                [margaretkelly866[at]gmail.com]
 0.0 SPF_HELO_NONE              SPF: HELO does not publish an SPF Record
-0.0 RCVD_IN_MSPIKE_H3          RBL: Good reputation (+3)
                                [209.85.210.44 listed in wl.mailspike.net]
 0.0 HTML_MESSAGE               BODY: HTML included in message
-0.1 DKIM_VALID_EF              Message has a valid DKIM or DK signature from
                                envelope-from domain
 0.1 DKIM_SIGNED                Message has a DKIM or DK signature, not
                                necessarily valid
-0.1 DKIM_VALID_AU              Message has a valid DKIM or DK signature from
                                author\'s domain
-0.1 DKIM_VALID                 Message has at least one valid DKIM or DK
                                signature
-0.0 RCVD_IN_MSPIKE_WL          Mailspike good senders

This email is a bit of an outlier, as most of these emails get flagged 
with bayes_99 and bayes_999 but this one actually gets bayes_00.



My bayes filter has been trained with about 2000 examples of spam and 
ham.


Not sure what to do at this point. I'm thinking about scoring up emails 
if they mention stuff like "SEO", "web design" etc. but I'm not sure if 
this is the best approach. Feels like a thumb-in-the-dike approach.

Re: DCC license

2021-04-22 Thread Steve Dondley





The DCC FAQ at https://www.dcc-servers.net/dcc/FAQ.html#license
describes the definitive ways to get any questions answered regarding
DCC licensing. Any answers you could get here would be conjecture and
anecdote.


I found a form on their website for licensing questions. Waiting to hear 
back.

DCC license

2021-04-22 Thread Steve Dondley


Sorry if this is a bit off-topic.

I'm looking into installing DCC (Distributed Checksum Clearinghouse) 
software.


The page at https://www.dcc-servers.net/dcc/INSTALL.html says:

"The free license is intended to cover individuals and organizations 
including Internet service providers using DCC to filter their own mail. 
Organizations selling anti-spam appliances or managed mail services are 
not eligible for the free license."


However, when I look at the actual LICENSE file that ships with the 
software, it says:


 * Permission to use, copy, modify, and distribute this software without
 * changes for any purpose with or without fee is hereby granted, provided
 * that the above copyright notice and this permission notice appear in all
 * copies and any distributed versions or copies are either unchanged
 * or not called anything similar to "DCC" or "Distributed Checksum
 * Clearinghouse".
 *
 * __
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND RHYOLITE SOFTWARE, LLC DISCLAIMS ALL
 * WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL RHYOLITE SOFTWARE, LLC
 * BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES
 * OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
 * WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
 * ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 *


I don't see anything in there about disallowing usage of the software by 
"managed mail services."

Re: pyzor

2021-04-21 Thread Steve Dondley


On 2021-04-21 11:00 AM, Eric Broch wrote:

Does anyone have a solution to this:

spamd[]: pyzor: check failed: internal error, python traceback
seen in response

I have this in my local.cf

#pyzor
use_pyzor 1
pyzor_path /usr/bin/pyzor


I don't have this in my config at all. Maybe you are following outdated 
advice?


Make sure you have the pyzor plugin line uncommented:

loadplugin Mail::SpamAssassin::Plugin::Pyzor

Also, ensure you have installed the pyzor package on your OS.

Spoofed amazon order email

2021-04-16 Thread Steve Dondley

First, thanks to everyone on the list who has given me a hand over the 
past couple of weeks as I get my "sea legs" with spamassassin. It's 
working well for me now but I obviously still have more to learn.


For one, I'm still uncertain about the best way to fine-tune SA to beat 
back some tricky spam, like this one, which comes from a gmail account but 
spoofs an expensive Amazon order to try to phish the user.


Return-Path: 
Delivered-To: s...@dondley.com
Received: from email.dondley.com
by email.dondley.com with LMTP
id Ev9rGkyheWBeegAAB604Gw
(envelope-from )
for ; Fri, 16 Apr 2021 10:38:04 -0400
Received: by email.dondley.com (Postfix, from userid 115)
id 5EFD521516; Fri, 16 Apr 2021 10:38:04 -0400 (EDT)
Authentication-Results: email.dondley.com;
dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Fi/GiyLT";
dkim-atps=neutral
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on email.dondley.com
X-Spam-Level:
X-Spam-Status: No, score=0.9 required=5.0 tests=BAYES_20,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,GB_FROM_NAME_FREEMAIL,
HTML_MESSAGE,MIME_HTML_MOSTLY,NAME_EMAIL_DIFF,RCVD_IN_DNSWL_NONE,
RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_NONE,SPF_PASS
shortcircuit=no autolearn=no autolearn_force=no version=3.4.2
X-Spam-Language: en
Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.216.54; helo=mail-pj1-f54.google.com; envelope-from=gk5751...@gmail.com; receiver=
Received: from mail-pj1-f54.google.com (mail-pj1-f54.google.com [209.85.216.54])
by email.dondley.com (Postfix) with ESMTPS id 9DFB9210C1
for ; Fri, 16 Apr 2021 10:37:53 -0400 (EDT)
Received: by mail-pj1-f54.google.com with SMTP id kb13-20020a17090ae7cdb02901503d67f0beso3185770pjb.0
for ; Fri, 16 Apr 2021 07:37:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=20161025;
h=message-id:date:from:mime-version:subject:to;
bh=tbWgclEtavQLHj3b2u0ycLuH4u7X12CkOv+d/W8zWrs=;
b=Fi/GiyLThBU+Sf1M8Thsh4lWYqGeC2mX1d6uL+5grFufl8EA68jtMePxe1TsIetKPj
oCRdmdkjvxAGFA0Uny2lttK9Xhpmoa38zO0rLmFLN+tzKTHYuKKoiQx6ugByfCpk6A82
QDyDgRp7HpEkA34ztYXqR9Q0MH8eTPPaK7iNTbdq2Sb78PYR+XNX9UVDnWarVSmlQm6N
EwrQKnzaaT4WKuUrmXS8tkGJMLLfWxLQAu0oCxbKwDkjW7yLMVYGl1Zhk7tNjoi2Hk2r
xywZ0v6AyAbSTawCrUN052ps4xjKR/o0CLHrkk+FLbu9wENYbhrDNb/HMRu20aTzEgHn
AvZA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=1e100.net; s=20161025;
h=x-gm-message-state:message-id:date:from:mime-version:subject:to;
bh=tbWgclEtavQLHj3b2u0ycLuH4u7X12CkOv+d/W8zWrs=;
b=D4cfDeHF3n8JokVklJNHvyFD04InVRxq/DLHtB+xrMenRQZDQPHMqH5KdJBAgs4hAD
hc1YTl90K8wFUUAicyyzwhAzBTJqqCtmOZJczjjoXj9WXxEBqiJvgB5m2H+UvTejEX/0
AA/Exf6uvfuGP5hsrp7o4i22DBc/FlZDVArJt7wN+u+zjO1+rRFgrfbW6fdWzgYkb6Y2
jV/JTQywhNxSY6XaOSd4AA1i9ZC8LOaqkOLabUy1WI7uEWDOvzaO4MZuBzHi23vmdHlA
weh507+u6rXpN6BarAXZEZxnC+yev86JRqtQjJZL5qTpbjhb2s/1g6wSeRNF1Ri7qIXs
zbfA==
X-Gm-Message-State: AOAM5322u+9pAxfsMRqYaM8FgbXE+0nBCEZeqd286+mfRDrabuuIhCVe
CLSzPPcNsg+v2Px14I1WF9r5vuoVLtg=
X-Google-Smtp-Source: ABdhPJw1ixhEhS6bCqFtjizgrTxFo6mCL1fEQPBSzQxIDGkIqIwR7np7Mgjy6ap0Lx6VHje5LfeKwQ==
X-Received: by 2002:a17:90a:5407:: with SMTP id z7mr10416174pjh.228.1618583872037;
Fri, 16 Apr 2021 07:37:52 -0700 (PDT)
Received: from 1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa ([104.143.92.92])
by smtp.gmail.com with ESMTPSA id t15sm5203451pgh.33.2021.04.16.07.37.49
for 
(version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
Fri, 16 Apr 2021 07:37:51 -0700 (PDT)
Message-ID: <6079a13f.1c69fb81.a9651.e...@mx.google.com>
Date: Fri, 16 Apr 2021 07:37:51 -0700 (PDT)
From: "or...@amazon.com" 
X-Google-Original-From: "or...@amazon.com" 
Content-Type: multipart/alternative; boundary="===2707982310301423984=="
MIME-Version: 1.0
Subject: IVK-1250703-9254770 | Apple Watch Series 6 Order Now Confirmed
To: s...@dondley.com

--===2707982310301423984==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

Hello there, S!

This is a test template...

--===2707982310301423984==
Content-Type: text/html; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit




href="https://go.pardot.com/unsubscribe/u/272832/9445773a5f7e92b64a4b106d30d12be4ec08e6d19850125ed1a094fe7f00100f/734801457; 
target="_blank">List-Unsubscribe

Your Order | Your Account | Amazon.com
ORDER NUMBER # IVK-1250703-9254770

Dear S
Thank you for shopping with us. You have ordered the Apple Watch Series 6 Space Gray 44 mm GPS + Cellular
In-case you require any change in order or like to

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-12 Thread Steve Dondley

On 2021-04-12 03:11 AM, Matthias Leisi wrote:

> -2.0 RCVD_IN_DNSWL_HI   RBL: Sender listed at https://www.dnswl.org/,
> high trust
> [203.160.71.180 listed in list.dnswl.org [1]]
>
> I looked up this, and the other one, and didn't find them in dnswl. As
> others said, if you are using public DNS, stop doing that immediately.
> And, run the dnswl queries with dig or host yourself on your own machine.

I'm replying to this mail, though I could have picked any of the others.

At dnswl.org [2], we have a fair use policy of 100'000 queries per 24
hours. Those that are consistently way above this threshold (either on
an individual IP, within a block of IPs or spread over multiple blocks)
may get blocked.

"Blocked" is not straightforward in DNS - if you simply return the REFUSED
status code, resolvers may retry on other nameservers, thus effectively
multiplying the (useless) traffic. To avoid this, we have a number of
strategies:

* "pass" - for those we don't want to block
* "parentblock" - we do not return the actual NS records for the
list.dnswl.org [1] zone from the parent zone; all A records in
*.list.dnswl.org [1] return 127.0.0.255 and a corresponding TXT record -
that's the default strategy for most of those that query above the
fair use threshold.
* "refuse" - see above, rarely used
* "empty" - we return NXDOMAIN. Not currently used.
* "ignore" - we don't return anything. Not currently used.
* "returnhi" - for those that try to evade "parentblock" (eg by
directly querying list.dnswl.org [1] nameservers), or who do not take
action after long times of "parentblock" (and which also did not change
behaviour on "refuse"), we return "hi" in order to make them go away
eventually.
* We may choose to escalate from single IPs to eg v4-/24 or
v6-/48-or-larger for active evaders (eg frequently changing nameserver
IPs), and we may also use "returnhi".

Interestingly, we have a surprisingly high number of "returnhi" cases
which have been querying us for _years_ without a change in behaviour.
From time to time we change them to one of the other strategies. It
would be interesting to dig into what they are actually thinking...

It's likely that the OP is using a nameserver where we have "returnhi".

Obviously the advice given in this thread (use a local caching resolver
that does not forward queries) is correct and will make that problem
magically go away :)

-- Matthias 

Ah, thank you for the explanation. 

Following the advice on this list, I set up a locally running
DNS server. Since that time, I have not seen the problem of _HI scores
in my spam emails. 

Links:
--
[1] http://list.dnswl.org
[2] http://dnswl.org
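
For anyone who wants to check what their resolver is actually getting back, here is a rough sketch of how the answers described above could be decoded. It is only an illustration: it assumes the publicly documented DNSWL convention that answers look like 127.0.<category>.<trust> with trust running from 0 to 3, plus the 127.0.0.255 "blocked" answer Matthias mentions. The function name is made up for this example.

```python
# Sketch: interpret an A-record answer from list.dnswl.org.
# Assumption: answers follow 127.0.<category>.<trust> with trust 0..3,
# and 127.0.0.255 signals a blocked (over-quota) querier.

TRUST_LEVELS = {0: "none", 1: "low", 2: "medium", 3: "high"}


def decode_dnswl_answer(answer: str) -> str:
    """Return 'blocked', a trust level name, or 'unexpected'."""
    if answer == "127.0.0.255":
        return "blocked"
    parts = answer.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return "unexpected"
    first, second, _category, trust = (int(p) for p in parts)
    if (first, second) != (127, 0) or trust not in TRUST_LEVELS:
        return "unexpected"
    return TRUST_LEVELS[trust]
```

If a resolver is on the "returnhi" strategy, every IP queried through it would decode as "high" here, which matches the symptom of RCVD_IN_DNSWL_HI firing on obvious spam.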

Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-12 Thread Steve Dondley





However, in 50_scores.cf, this line is commented out:

#score RCVD_IN_SORBS_SPAM 0 0.5 0 0.5

Maybe that's the problem?


no, there are other SORBS lists used:

score RCVD_IN_SORBS_DUL 0 0.001 0 0.001 # n=0 n=2
score RCVD_IN_SORBS_HTTP 0 2.499 0 0.001 # n=0 n=2
score RCVD_IN_SORBS_MISC 0 # n=0 n=1 n=2 n=3
score RCVD_IN_SORBS_SMTP 0 # n=0 n=1 n=2 n=3
score RCVD_IN_SORBS_SOCKS 0 2.443 0 1.927 # n=0 n=2
#score RCVD_IN_SORBS_SPAM 0 0.5 0 0.5
score RCVD_IN_SORBS_WEB  0 1.5 0 1.5
score RCVD_IN_SORBS_ZOMBIE 0 # n=0 n=1 n=2 n=3


have you set up your own caching, non-forwarding DNS server?


Yes. And my SA scores have improved about 100% since I did this.
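
As a side note on reading the score lines quoted above: as I understand the SpamAssassin configuration docs, a score line with four values gives one score per "score set" (Bayes off/net off, Bayes off/net on, Bayes on/net off, Bayes on/net on), and a single value applies to all four. Here is a small sketch of that lookup. The function name is hypothetical and this is not how SA itself is implemented.

```python
# Sketch: pick the effective score from a SpamAssassin "score" line.
# Assumption: four values map to score sets 0..3 in the order
# (no bayes, no net), (no bayes, net), (bayes, no net), (bayes, net).
from typing import Optional


def score_for_set(line: str, bayes: bool, net: bool) -> Optional[float]:
    """Return the score that applies, or None for comments/non-score lines."""
    fields = line.split("#", 1)[0].split()  # drop trailing comments
    if len(fields) < 3 or fields[0] != "score":
        return None
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        return values[0]  # one value applies to every score set
    if len(values) != 4:
        raise ValueError("expected one or four score values")
    index = (2 if bayes else 0) + (1 if net else 0)
    return values[index]
```

With Bayes and network tests both enabled, "score RCVD_IN_SORBS_HTTP 0 2.499 0 0.001" would come out as 0.001, and the commented-out RCVD_IN_SORBS_SPAM line yields None, meaning this file sets no score for that rule.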

Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Steve Dondley





sorbs dnsbl missing, have you denied sorbs.net results ?, or is
spamassassin not testing sorbs.net anymore ?


Best I can tell, my SA config should be testing for sorbs. I've got this 
line in /etc/spamassassin/v3220.pre:


loadplugin Mail::SpamAssassin::Plugin::DNSEval

And in /usr/share/spamassassin/20_dnsbl_test.cf, I've got:

ifplugin Mail::SpamAssassin::Plugin::DNSEval

I see a bunch of SORBS rules in there.

However, in 50_scores.cf, this line is commented out:

#score RCVD_IN_SORBS_SPAM 0 0.5 0 0.5

Maybe that's the problem?
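
In case it helps anyone doing the same digging, here is a rough sketch of the check I was doing by hand with grep: scan the SA config directories for lines mentioning a pattern and note whether each hit is active or commented out. The helper name is made up, and the directories you pass in are whatever your install uses.

```python
# Sketch: case-insensitive search of .cf/.pre files for a pattern,
# flagging whether each matching line is active or commented out.
import re
from pathlib import Path
from typing import List, Tuple


def find_rule_lines(directories: List[str], pattern: str) -> List[Tuple[str, int, bool, str]]:
    """Return (path, line number, is_active, text) for every matching line."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for directory in directories:
        for path in sorted(Path(directory).glob("*")):
            if not path.is_file() or path.suffix not in (".cf", ".pre"):
                continue
            text = path.read_text(errors="replace")
            for number, line in enumerate(text.splitlines(), 1):
                if regex.search(line):
                    active = not line.lstrip().startswith("#")
                    hits.append((str(path), number, active, line.strip()))
    return hits
```

Something like find_rule_lines(["/etc/spamassassin", "/usr/share/spamassassin"], "sorbs") would list every SORBS mention along with whether it is commented out.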

Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Steve Dondley





Also, I've heard of sorbs over the years but I'm not sure exactly what
it is. Is this the same block list run by Cisco?


OK, I was getting SORBS confused with SenderBase Reputation Score 
(SBRS). That's the one run by Cisco, I believe.


I actually have an account on the SORBS website that I set up long ago.

Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Steve Dondley






sorbs dnsbl missing, have you denied sorbs.net results ?, or is
spamassassin not testing sorbs.net anymore ?


How would I check if it's turned on? I tried grepping in 
/etc/spamassassin on "sorb" (case insensitive) and found nothing. So I 
guess it's not in my default config.


I see many mentions of "SORBS" in /usr/share/spamassassin, however. I'm 
guessing I may not have a needed SA plugin enabled. I'll try to figure 
out how to do it.


Also, I've heard of sorbs over the years but I'm not sure exactly what 
it is. Is this the same block list run by Cisco?

Re: Is pyzor recommended by folks on this list?

2021-04-11 Thread Steve Dondley





Second, I'm not sure if my tests will work on my spam samples which
have the spam encapsulated with the "report_safe" setting set to a
value of "1".


I wouldn't expect it to work at all. "report_safe" encapsulation
creates a new email which isn't a spam.


From what I read on pyzor's home page and how it works, pyzor strips off 
all headers. So I would assume it doesn't matter if it's encapsulated. I 
could be, and quite likely am, totally wrong about this, of course.

Re: Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Steve Dondley


On 2021-04-11 04:19 PM, Benny Pedersen wrote:

On 2021-04-11 22:09, Steve Dondley wrote:


Content analysis details:   (4.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.5 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [52.100.189.222 listed in wl.mailspike.net]
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
                            no trust
                            [52.100.189.222 listed in list.dnswl.org]
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.5 SUBJ_ALL_CAPS          Subject is all capitals
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
 0.0 UPPERCASE_50_75        message body is 50-75% uppercase


I see it as a local problem

http://multirbl.valli.org/lookup/52.100.189.222.html

do you use KAM.cf channel ?


OK, I added KAM.cf to my config. It has now pushed it over 5.0, barely:

Content analysis details:   (5.1 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
                            no trust
                            [52.100.189.222 listed in list.dnswl.org]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.5 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [52.100.189.222 listed in wl.mailspike.net]
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
 0.5 SUBJ_ALL_CAPS          Subject is all capitals
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
 0.0 UPPERCASE_50_75        message body is 50-75% uppercase
 0.2 KAM_MANYTO             Email has more than one To Header or more than 25
                            recipients
 0.5 KAM_NUMSUBJECT         Subject ends in numbers excluding current years
 0.0 KAM_SHORT              Use of a URL Shortener for very short URL
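
To double-check where the extra 0.7 came from, the per-rule scores in the two reports can simply be added up. This is just a sanity-check sketch using the displayed (rounded) values from the reports above. SA's internal scores may carry more decimal places, so in general the displayed values need not sum exactly to the displayed total.

```python
# Sketch: sum the per-rule scores shown in the two reports.
# Decimal is used so the one-decimal values add up exactly.
from decimal import Decimal

BASE_RULES = {
    "BAYES_99": "3.5", "BAYES_999": "0.5", "RCVD_IN_MSPIKE_H2": "-0.0",
    "RCVD_IN_DNSWL_NONE": "-0.0", "SPF_HELO_PASS": "-0.0", "SPF_PASS": "-0.0",
    "SUBJ_ALL_CAPS": "0.5", "HTML_MESSAGE": "0.0", "MIME_HTML_ONLY": "0.1",
    "DKIM_VALID": "-0.1", "DKIM_SIGNED": "0.1", "DKIM_VALID_AU": "-0.1",
    "DKIM_VALID_EF": "-0.1", "UPPERCASE_50_75": "0.0",
}

# Rules that only appear after adding the KAM.cf channel.
KAM_RULES = {"KAM_MANYTO": "0.2", "KAM_NUMSUBJECT": "0.5", "KAM_SHORT": "0.0"}


def total(rules: dict) -> Decimal:
    """Add up the displayed scores for a set of rule hits."""
    return sum((Decimal(value) for value in rules.values()), Decimal("0"))


before = total(BASE_RULES)
after = before + total(KAM_RULES)
print(before, after)
```

The base rules add up to 4.4 and the three KAM rules contribute 0.7, which gives the 5.1 shown in the second report.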

Using spamassassin to thwart sharepoint phishing attacks

2021-04-11 Thread Steve Dondley

I've received about a dozen phishing attack emails from Microsoft's 
sharepoint service within the last couple of weeks. Only one of them was 
identified by SA as spam. After running the emails through sa-learn, 
they still only score a 4 to 4.5. But I could see that it would be easy 
for these emails to get classified as false positives and/or false 
negatives.


Has anyone developed a good way to identify these sharepoint phishing 
attacks without any false positives?


I'm leaning towards figuring out how I might inject some kind of 
prominent warning into the message to remind people not to click links 
they don't trust. That's not an ideal solution, but perhaps it is the 
best way to help protect users. I'm interested to hear what other 
options might be available.


Here is how SA scored one of the emails:

4.4/5.0
Spam detection software, running on the system "email.dondley.com",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Doris Feaster shared a file with you STRIP BANG THE ONLINE
   REAL & MOST POPULAR 100% TRUSTED NETWORK STRIPBANG GIVING FREE ELITE MEMBERSHIP
   AND 5000CR=$750 WINNER 2021 YOUR WINNING CODE - ( STBNG5000CR )

Content analysis details:   (4.4 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.5 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [52.100.189.222 listed in wl.mailspike.net]
-0.0 RCVD_IN_DNSWL_NONE     RBL: Sender listed at https://www.dnswl.org/,
                            no trust
                            [52.100.189.222 listed in list.dnswl.org]
-0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.5 SUBJ_ALL_CAPS          Subject is all capitals
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
-0.1 DKIM_VALID_EF          Message has a valid DKIM or DK signature from
                            envelope-from domain
 0.0 UPPERCASE_50_75        message body is 50-75% uppercase

Re: Is pyzor recommended by folks on this list?

2021-04-11 Thread Steve Dondley


On 2021-04-11 03:09 PM, Bill Cole wrote:

On 11 Apr 2021, at 13:21, Steve Dondley wrote:


value of "1". By the way, anyone know of a CLI utility for extracting
the original spam email from these files?


spamassassin -d < wrappedspam.eml


Ah, ok. I was familiar with the -d option but did not know its output 
could be redirected to a file like this:


spamassassin -d < filtered_email > orig_email

I tried it and it did what I needed. Thanks.

Re: Is pyzor recommended by folks on this list?

2021-04-11 Thread Steve Dondley





value of "1". By the way, anyone know of a CLI utility for extracting
the original spam email from these files?


Here's a very crude perl script that does the trick:

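#!/usr/bin/perl

use strict;
use warnings;

# Read the whole wrapped message from STDIN (or files named on the command line).
my $email;
while (<>) {
  $email .= $_;
}

# Take the MIME boundary from the wrapper's Content-Type header.
my ($boundary) = $email =~ /boundary="([^"]+)"/
  or die "no MIME boundary found\n";

# Skip the first part (the SA report) and capture the last part, which
# holds the encapsulated message. The part's own MIME headers are included.
# \Q...\E keeps characters like "." in the boundary from acting as regex syntax.
my ($orig_content) = $email =~
  /^--\Q$boundary\E.*^--\Q$boundary\E(.*)^--\Q$boundary\E--/ms
  or die "no encapsulated message found\n";

print $orig_content;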

You would use it like this:

./spam_extractor.pl < email_file_with_encapsulated_spam
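
For comparison, here is a sketch of the same idea using a real MIME parser instead of regexes, written in Python with only the standard library. It assumes the wrapper carries the original as a message/rfc822 attachment, which is how report_safe 1 is described elsewhere in this thread. The function name is made up, and "spamassassin -d" remains the simpler route.

```python
# Sketch: pull the encapsulated original out of a report_safe wrapper.
# Assumption: the original is attached as a message/rfc822 MIME part.
import email
from email import policy
from typing import Optional


def extract_original(wrapped_text: str) -> Optional[str]:
    """Return the first message/rfc822 attachment as text, or None."""
    msg = email.message_from_string(wrapped_text, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-item list
            # holding the parsed inner message.
            inner = part.get_payload(0)
            return inner.as_string()
    return None
```

Unlike the regex version, this returns only the inner message, without the MIME headers of the attachment part itself.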

Re: Is pyzor recommended by folks on this list?

2021-04-11 Thread Steve Dondley


On 2021-04-11 09:34 AM, Benny Pedersen wrote:

On 2021-04-11 15:13, Steve Dondley wrote:


What do you think?


pyzor is useful if you run pyzord locally. The design of pyzor was, imho,
meant to have a local pyzord that the pyzor client queries locally, but
pyzord could get results from other pyzord server farms,


Interesting. I wonder if it might be worth it to set up my own pyzor 
server for my own network of mail servers. That's probably going to be 
easier than sharing spam/ham samples around between users.

Is pyzor recommended by folks on this list?

2021-04-11 Thread Steve Dondley

I just installed pyzor and did a random spot check of about 10 spam 
emails to try to evaluate it using this command:


pyzor check < some_spam

Only one message gave me a hit on pyzor.

But I take my results with a grain of salt because I may not have pyzor 
configured optimally.


For one, I'm using the public pyzor server. Maybe there are other more 
useful servers?


Second, I'm not sure if my tests will work on my spam samples which have 
the spam encapsulated with the "report_safe" setting set to a value of 
"1". By the way, anyone know of a CLI utility for extracting the 
original spam email from these files?


So before I explore pyzor any further, I'm wondering if the default 
rules built into SA are good enough or if pyzor improves the accuracy of 
SA enough to be worth the extra cycles to install it and keep it 
functional.


What do you think?

Re: Spamassassin reporting IP address is whitelisted by DNSWL.org but DNSWL.org reports it is not

2021-04-10 Thread Steve Dondley


On 2021-04-10 03:20 PM, Bill Cole wrote:

On 10 Apr 2021, at 14:53, Steve Dondley wrote:

I'm very, very sorry to beat a dead horse, but I'm deeply confused by 
the "RCVD_IN_DNSWL_HI" rule which appears to be reporting incorrectly 
on my system.


STOP USING ANY PUBLIC DNS RESOLVERS WITH ANY MAIL SERVERS!


For the record, the nameserver setting in my /etc/resolv.conf was a local 
IP address, which presumably pointed at an Amazon Web Services (AWS) DNS server.


After changing the IP address to 127.0.0.1 in that file, it changed 
itself back to the original IP address after some short period of time. 
To fix this, follow the appropriate instructions here: 
https://aws.amazon.com/premiumsupport/knowledge-center/ec2-static-dns-ubuntu-debian/
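
Since the file can silently revert, a quick way to spot it is to parse resolv.conf and flag any nameserver that is not a loopback address. This is a rough sketch with made-up function names. Note that a loopback address only shows the resolver is local. It does not prove the resolver is non-forwarding.

```python
# Sketch: list nameservers in resolv.conf text and flag non-loopback ones.
import ipaddress
from typing import List


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses from all 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":  # comment lines
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def non_loopback(servers: List[str]) -> List[str]:
    """Return the nameservers that are not loopback addresses."""
    return [s for s in servers if not ipaddress.ip_address(s).is_loopback]
```

Running non_loopback(nameservers(open("/etc/resolv.conf").read())) from cron and alerting on a non-empty result would catch the file changing back.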

Spamassassin reporting IP address is whitelisted by DNSWL.org but DNSWL.org reports it is not

2021-04-10 Thread Steve Dondley

I'm very, very sorry to beat a dead horse, but I'm deeply confused by 
the "RCVD_IN_DNSWL_HI" rule which appears to be reporting incorrectly on 
my system.


I ran this command:

sudo -u s -- spamassassin -t -d < some_email

It gives me this report:

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.2 URIBL_ABUSE_SURBL      Contains an URL listed in the ABUSE SURBL
                            blocklist
                            [URIs: bizgrouplinknews.com]
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: bizgrouplinknews.com]
 2.5 URIBL_DBL_SPAM         Contains a spam URL listed in the Spamhaus DBL
                            blocklist
                            [URIs: bizgrouplinknews.com]
 0.0 RCVD_IN_MSPIKE_L5      RBL: Very bad reputation (-5)
                            [50.30.46.135 listed in bl.mailspike.net]
-2.0 RCVD_IN_DNSWL_HI       RBL: Sender listed at https://www.dnswl.org/,
                            high trust
                            [50.30.46.135 listed in list.dnswl.org]
 0.5 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
-0.0 SPF_PASS               SPF: sender matches SPF record
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 2.6 DEAR_FRIEND            BODY: Dear Friend? That's not very dear!
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 HTTPS_HTTP_MISMATCH    BODY: No description available.
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.0 RCVD_IN_MSPIKE_BL      Mailspike blacklisted
 3.5 URI_PHP_REDIR          PHP redirect to different URL (link obfuscation)



So it's showing the IP address 50.30.46.135 is whitelisted as shown by 
the RCVD_IN_DNSWL_HI rule.


However, the dnswl.org website shows that 50.30.46.135 is *not* 
whitelisted: https://www.dnswl.org/s/?s=50.30.46.135


So what would account for my system reporting it as whitelisted when the 
dnswl.org domain does not report it as whitelisted?
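
One way to narrow this down is to make the same DNS query SA makes and compare the answer from my resolver with the website. As I understand it, DNS lists are queried by reversing the IPv4 octets and appending the list zone. Here is a sketch of that. The function names are made up, and lookup() simply uses whatever resolver the system is configured with, which is exactly the thing under suspicion.

```python
# Sketch: build the DNSWL query name for an IPv4 address and resolve it
# through the system resolver.
import ipaddress
import socket
from typing import Optional


def dnswl_query_name(ip: str, zone: str = "list.dnswl.org") -> str:
    """Reverse the IPv4 octets and append the list zone."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def lookup(ip: str) -> Optional[str]:
    """Return the A-record answer, or None if the name does not resolve."""
    try:
        return socket.gethostbyname(dnswl_query_name(ip))
    except socket.gaierror:
        return None
```

For 50.30.46.135 the query name is 135.46.30.50.list.dnswl.org, which can also be fed to dig or host directly. If the answer differs depending on which resolver is asked, the resolver is the problem rather than the list.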

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-10 Thread Steve Dondley





You should fix URIBL_BLOCKED first.
You need a local, caching, non-forwarding DNS server for SpamAssassin.


Yeah, setting up a DNS server for SA is on my todo list. Thanks.

When you say local, it doesn't have to be on the same machine as 
spamassassin, does it? I assume I can have the DNS server on a local 
network and shared between many machines.

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-10 Thread Steve Dondley





It would be helpful to post an entire actual set of headers --
unmodified -- along with the spamassassin -t report.  I can't figure
out (from what you posted) the IP address of the server that was in
DNSWL_HI that delivered mail to your internal/trusted network.


OK, here is the entire output of this command:

sudo -u s -- spamassassin -t -d < the_spam_email

Note: I've changed the score of RCVD_IN_DNSWL_HI hits to -2.0 from -5.0 
until I get my misconfiguration figured out. Thanks for your patience.





Received: from localhost by email.dondley.com
with SpamAssassin (version 3.4.2);
Sat, 10 Apr 2021 12:41:17 -0400
From: =?shift_jis?B?kmqCzI/bkqWKZ5HljHaJ5iBBaXAxMA==?=
To: 
Subject: *SPAM* =?shift_jis?B?g0mDk4NpgqqLgYLfgumXQojqlrOT8YLMgZqDZoNKg2CDk4GagvCBSTA5?=
Date: Sat, 10 Apr 2021 18:50:01 +0900
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on email.dondley.com

X-Spam-Flag: YES
X-Spam-Level: ***
X-Spam-Status: Yes, score=23.2 required=5.0 tests=BASE64_LENGTH_79_INF,
BAYES_99,BAYES_999,DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,FREEMAIL_REPLYTO,
FREEMAIL_REPLYTO_END_DIGIT,FROM_MISSP_FREEMAIL,FROM_MISSP_REPLYTO,
LOCAL_SPAM_TLD,LOCAL_UNCOMMON_TLD,MISSING_MID,NML_ADSP_CUSTOM_MED,
RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PSBL,
RCVD_IN_RP_RNBL,RCVD_IN_VALIDITY_RPBL,RDNS_NONE,SPF_HELO_SOFTFAIL,
SPF_SOFTFAIL,SPOOFED_FREEMAIL,SPOOFED_FREEMAIL_NO_RDNS,
SPOOFED_FREEM_REPTO,TVD_SPACE_ENCODED shortcircuit=no autolearn=no
autolearn_force=no version=3.4.2
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--=_6071D52D.C7B255FE"

This is a multi-part message in MIME format.

=_6071D52D.C7B255FE
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system "email.dondley.com",
has identified this incoming email as possible spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  
@ª{ª{ª{ª{ª{ª{ª{ª{ª{ª{ª{ª{ª{ª{ª
   @@@@@@@ÆEÅÌ{¬·øÊ 
@@@@@@@@@@yjXåTv



Content analysis details:   (23.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-2.0 RCVD_IN_DNSWL_HI       RBL: Sender listed at https://www.dnswl.org/,
                            high trust
                            [203.160.71.180 listed in list.dnswl.org]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [203.160.71.180 listed in wl.mailspike.net]
 2.7 RCVD_IN_PSBL           RBL: Received via a relay in PSBL
                            [203.160.71.180 listed in psbl.surriel.com]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.5 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 2.0 LOCAL_SPAM_TLD         Domain originates a lot of spam
 1.0 LOCAL_UNCOMMON_TLD     From address is not a common TLD
 1.3 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see ]
 1.3 RCVD_IN_VALIDITY_RPBL  RBL: Relay in Validity RPBL,
                            https://senderscore.org/blocklistlookup/
                            [203.160.71.180 listed in bl.score.senderscore.com]
 0.0 FREEMAIL_FROM          Sender email is commonly abused enduser mail
                            provider (qy5cbma-yua06[at]yahoo.co.jp)
 0.2 FREEMAIL_REPLYTO_END_DIGIT Reply-To freemail username ends in
                            digit (qy5cbma-yua06[at]yahoo.co.jp)
 0.7 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
 0.0 DKIM_ADSP_CUSTOM_MED   No valid author signature, adsp_override is
                            CUSTOM_MED
 0.7 SPF_HELO_SOFTFAIL      SPF: HELO does not match SPF record (softfail)
 1.5 BASE64_LENGTH_79_INF   BODY: base64 encoded email part uses line
                            length greater than 79 characters
 0.5 MISSING_MID            Missing Message-Id: header
 0.0 RCVD_IN_RP_RNBL        RCVD_IN_RP_RNBL renamed to
                            RCVD_IN_VALIDITY_RPBL, please update local
                            rules
 0.8 RDNS_NONE              Delivered to internal network by a host with no rDNS
 1.0 FREEMAIL_REPLYTO       Reply-To/From or Reply-To/body contain
                            different freemails
 0.9 NML_ADSP_CUSTOM_MED    ADSP custom_med hit, and not from a mailing
                            list
 0.0 FROM_MISSP_REPLYTO     From misspaced, has Reply-To
 2.5 TVD_SPACE_ENCODED

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-10 Thread Steve Dondley


On 2021-04-10 12:10 PM, Greg Troxel wrote:

Steve Dondley  writes:


Here are the headers from some egregious spam. It scored a whopping
20.8 points despite being flagged with "RCVD_IN_DNSWL_HI."

Return-Path: 
Delivered-To: s...@example.com
Received: from email.example.com
by email.example.com with LMTP
id AnV2NSCZbmCTcQAAB604Gw
(envelope-from )
for ; Thu, 08 Apr 2021 01:48:16 -0400


really?  Those are the headers?


Yes. Why do you ask? Is it unusual that this egregious example of spam 
is on DNSWL_HI?




So my advice again is:

  Run spamassassin -t on the message so you see the metadata about the
  rules like which IP hit and the per-rule score.


I've already done that on selected email messages.


  If you got spam from a sender in DNSWL_HI, report it to dnswl.org.
  Give them a week and see if they take the IP out, or what happens, and
  tell us how it went.


I plan on it but first:

1) I want to verify with this list I don't have something misconfigured 
before I report 300+ emails. From what I've read in the emails last 
week, this would be highly unusual.


2) If I do have that many false positives, I need to figure out how to 
bulk report that many of them.

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-10 Thread Steve Dondley





I have been looking at this issue a little more. I just grepped my
spam folder. Out of 1000 emails I have flagged as spam, 321 have been
flagged with RCVD_IN_DNSWL_HI, a rule which adds -5 points to the email.
That's almost 1 out of 3 emails which seems pretty insane.


Here are the headers from some egregious spam. It scored a whopping 20.8 
points despite being flagged with "RCVD_IN_DNSWL_HI."


Return-Path: 
Delivered-To: s...@example.com
Received: from email.example.com
by email.example.com with LMTP
id AnV2NSCZbmCTcQAAB604Gw
(envelope-from )
for ; Thu, 08 Apr 2021 01:48:16 -0400
Received: by email.example.com (Postfix, from userid 115)
id CDD3D210E1; Thu,  8 Apr 2021 01:48:16 -0400 (EDT)
Received: from localhost by email.example.com
with SpamAssassin (version 3.4.2);
Thu, 08 Apr 2021 01:48:16 -0400
From: =?shift_jis?B?i9aSZoLMl6BEVkSP7pXxIEFpcDA4jYY=?=
To: 
Subject: *SPAM* =?shift_jis?B?lrOPQ5Czi8mU6Zesj2+DVoOKgVuDWYFFg4KDVYNDg06UaonzgUWXTJa8QVaPl5dEgXmXoIOCg22JroF6MDc=?=
Date: Thu, 08 Apr 2021 14:48:09 +0900
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on email.example.com

X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=20.8 required=5.0 tests=BASE64_LENGTH_79_INF,
DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,FREEMAIL_REPLYTO,
FREEMAIL_REPLYTO_END_DIGIT,FROM_MISSP_FREEMAIL,FROM_MISSP_REPLYTO,
MISSING_MID,NML_ADSP_CUSTOM_MED,RCVD_IN_BL_SPAMCOP_NET,
RCVD_IN_DNSWL_HI,RCVD_IN_PSBL,RCVD_IN_RP_RNBL,RCVD_IN_SBL_CSS,
RCVD_IN_VALIDITY_RPBL,RCVD_IN_XBL,RDNS_NONE,SPF_HELO_SOFTFAIL,
SPF_SOFTFAIL,SPOOFED_FREEMAIL,SPOOFED_FREEMAIL_NO_RDNS,
SPOOFED_FREEM_REPTO,TVD_SPACE_ENCODED,URIBL_ABUSE_SURBL,URIBL_BLOCKED
shortcircuit=no autolearn=unavailable autolearn_force=no version=3.4.2
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--=_606E9920.15B94EAE"
Message-Id: <20210408054816.cdd3d21...@email.example.com>

Re: DNSWL overriding bayes_99 and bayes_999 rules

2021-04-10 Thread Steve Dondley


On 2021-04-06 11:48 AM, Steve Dondley wrote:

I have emails that have been flagged as spam in the past but that are
still getting through, presumably because the servers are on some
DNSWL.

Example:

X-Spam-Status: No, score=0.9 required=5.0 tests=BAYES_99,BAYES_999,
DATE_IN_PAST_03_06,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,
HTML_IMAGE_RATIO_02,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,
SPF_HELO_NONE,SPF_SOFTFAIL shortcircuit=no autolearn=no
autolearn_force=no version=3.4.2

What's the recommended way to handle these? Do I turn on shortcircuit?
Do I bump up the score for BAYES_99, BAYES_999? Or might there be a
way to ignore DNSWL scores if they have a high bayes score?


I have been looking at this issue a little more. I just grepped my spam 
folder. Out of 1000 emails I have flagged as spam, 321 have been flagged 
with RCVD_IN_DNSWL_HI, a rule which adds -5 points to the email. That's 
almost 1 out of 3 emails which seems pretty insane.


Is anyone else seeing spam getting flagged with RCVD_IN_DNSWL_HI resulting 
in so many false positives?
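
For anyone who wants to repeat the count on their own spam folder, here is a rough sketch of what I did with grep, done a bit more carefully: unfold each X-Spam-Status value, pull out the tests= list, and count the messages that hit a given rule. The function names are made up, and how you collect the header values (maildir, mbox, etc.) is left out.

```python
# Sketch: count messages whose X-Spam-Status lists a given rule.
import re
from typing import Iterable, Set


def rules_hit(status_value: str) -> Set[str]:
    """Return the rule names in the tests= field of an X-Spam-Status value."""
    # Header folding puts line breaks after commas, so rejoin those first.
    joined = re.sub(r",\s+", ",", status_value)
    match = re.search(r"\btests=(\S+)", joined)
    return set(match.group(1).split(",")) if match else set()


def count_hits(status_values: Iterable[str], rule: str) -> int:
    """Count how many status values include the rule (exact name match)."""
    return sum(1 for value in status_values if rule in rules_hit(value))
```

Matching on exact rule names avoids a plain grep accidentally counting similarly named rules or text in the message body.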

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley


It can only do so if report_safe is set to 0. With non-zero
report_safe settings, the original mail is encapsulated as an
attachment inside a wrapper message also including the report. That
wrapper message containing the SA report is "safe" because it is fully
local, the text/plain part won't look like spam to any spam filter,
and the original, encapsulated as a message/rfc822 attachment, should
be skipped by any filter. If you want to test the *original* message,
you have to extract the message/rfc822 part into its own file and test
that.


OK, did some more googling on this. Let me spell this out to help clear 
things up for those who may be as confused as I was:


1) sa-learn *will* "unwrap" the original encapsulated spam emails when 
they are encapsulated by SA: 
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/LearningMarkedUpMessages
2) However, the spamassassin command (or spamc/spamd) does not do this 
for you. You must use the -d option to remove any spam markup.


What this means is that if report_safe is set to "1" (the default) in 
your SA config file, you must pull the original spam email out with the 
-d option if you wish to run it through spamassassin/spamc again. You do 
*not* have to worry about doing this with the sa-learn command.


If I got this wrong, let me know. Thanks.

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley


On 2021-04-06 04:19 PM, Steve Dondley wrote:

It seems to have done so. Thank you.

Some MUAs have a "Reply to List" function that uses the List-Post
header (and sometimes heuristics when that header is missing) to send
replies only to a list itself.


I've recently switched to Roundcube from gmail. I didn't see that
option but I think I've figured out I just need to hit "reply". Thanks
for pointing out you were getting dupes.



It can only do so if report_safe is set to 0. With non-zero
report_safe settings, the original mail is encapsulated as an
attachment inside a wrapper message also including the report. That
wrapper message containing the SA report is "safe" because it is fully
local, the text/plain part won't look like spam to any spam filter,
and the original, encapsulated as a message/rfc822 attachment, should
be skipped by any filter. If you want to test the *original* message,
you have to extract the message/rfc822 part into its own file and test
that.


OK, so that's the problem, I guess. That config option is commented
out in my local.cf file:

# report_safe 1


I should read the documentation before asking questions. So '1' is the 
default which encapsulates the original spam as an attachment.

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley





Some MUAs have a "Reply to List" function that uses the List-Post
header (and sometimes heuristics when that header is missing) to send
replies only to a list itself.


Ah! I see that option now under the little down arrow next to "Reply 
all". My day is made. Thanks!

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley





It seems to have done so. Thank you.

Some MUAs have a "Reply to List" function that uses the List-Post
header (and sometimes heuristics when that header is missing) to send
replies only to a list itself.


I've recently switched to Roundcube from gmail. I didn't see that option 
but I think I've figured out I just need to hit "reply". Thanks for 
pointing out you were getting dupes.




It can only do so if report_safe is set to 0. With non-zero
report_safe settings, the original mail is encapsulated as an
attachment inside a wrapper message also including the report. That
wrapper message containing the SA report is "safe" because it is fully
local, the text/plain part won't look like spam to any spam filter,
and the original, encapsulated as a message/rfc822 attachment, should
be skipped by any filter. If you want to test the *original* message,
you have to extract the message/rfc822 part into its own file and test
that.


OK, so that's the problem, I guess. That config option is commented out 
in my local.cf file:


# report_safe 1

So what do you recommend setting this to '1'? Any downsides to that? I'm 
just a little leery of changing a default setting. But I'll do whatever 
the pros suggest.


It says a value of '2' sets it "use text/plain instead" but I don't know 
what that is referring to.

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley


On 2021-04-06 02:55 PM, Steve Dondley wrote:

On 2021-04-06 02:32 PM, Bill Cole wrote:

PLEASE NOTE:

I read the mailing list obsessively and DO NOT NEED (or want) the
extra copies sent when you send both to me and to the list.


Sorry, I still haven't figured out how to properly respond. When I hit
"reply all" it cc's the list and sends to you. When I hit just "reply"
it only sends to you. I've manually deleted you from the "To" box and
sending it directly to the list here. Hopefully that fixes things up.


Since the scores being added during delivery are much richer,
detecting enough info to do SPF and DKIM analysis, I am 99.9% certain
that the format of 'some_email' is mangled, probably missing critical
headers or using CR linebreaks instead of proper LFs.




I just noticed the date in the email header was from about a week ago.

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley


On 2021-04-06 02:32 PM, Bill Cole wrote:

PLEASE NOTE:

I read the mailing list obsessively and DO NOT NEED (or want) the
extra copies sent when you send both to me and to the list.


Sorry, I still haven't figured out how to properly respond. When I hit 
"reply all" it cc's the list and sends to you. When I hit just "reply" 
it only sends to you. I've manually deleted you from the "To" box and 
sending it directly to the list here. Hopefully that fixes things up.



Since the scores being added during delivery are much richer,
detecting enough info to do SPF and DKIM analysis, I am 99.9% certain
that the format of 'some_email' is mangled, probably missing critical
headers or using CR linebreaks instead of proper LFs.


Hmm, this is on a linux box, so I'm not sure how it could be screwing up 
the line breaks. Is it possible that when spamd injects the scores 
before the body of the email, it is screwing things up?
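
One way to check the line-break theory without guessing is to count the line endings in the saved file. This is only a sketch; the function name count_line_endings is hypothetical and it looks at raw bytes, nothing SpamAssassin-specific.

```python
from typing import Dict


def count_line_endings(data: bytes) -> Dict[str, int]:
    """Count CRLF, bare CR, and bare LF line endings in raw message bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF contains one CR and one LF, so subtract them to get
    # the endings that stand alone.
    bare_cr = data.count(b"\r") - crlf
    bare_lf = data.count(b"\n") - crlf
    return {"crlf": crlf, "bare_cr": bare_cr, "bare_lf": bare_lf}
```

Reading the file in binary mode, e.g. count_line_endings(open("some_email", "rb").read()), avoids any newline translation. A non-zero bare_cr count would point at the kind of mangling described above.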


Here is email as it sits in my inbox now, which is after it gets 
processed by spamd. I was under the impression that an email that had 
already been processed by SA could be processed again and it would 
ignore any modifications made by earlier passes through SA.


Return-Path: 


Delivered-To: s...@exmaple.com
Received: from email.exmaple.com
by email.exmaple.com with LMTP
id kAhSKc1dY2BCKgAAB604Gw
(envelope-from 
)

for ; Tue, 30 Mar 2021 13:20:13 -0400
Received: by email.exmaple.com (Postfix, from userid 115)
id A64BE200C8; Tue, 30 Mar 2021 13:20:13 -0400 (EDT)
Received: from localhost by email.exmaple.com
with SpamAssassin (version 3.4.2);
Tue, 30 Mar 2021 13:20:13 -0400
From: "Home Warranty - AHS" 
To: 
Subject: *SPAM* It's getting warmer, are you covered?
Date: Tue, 30 Mar 2021 05:18:34 -0700
Message-Id: 
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on 
email.exmaple.com

X-Spam-Flag: YES
X-Spam-Level: *
X-Spam-Status: Yes, score=5.2 required=5.0 tests=BAYES_99,BAYES_999,
DATE_IN_PAST_03_06,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,

HTML_IMAGE_RATIO_02,HTML_MESSAGE,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H2,

SPF_HELO_NONE,SPF_SOFTFAIL shortcircuit=no autolearn=no
autolearn_force=no version=3.4.2
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--=_60635DCD.A0F5D194"

This is a multi-part message in MIME format.

=_60635DCD.A0F5D194
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Spam detection software, running on the system "email.exmaple.com",
has identified this incoming email as possible spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Your AHS Home Warranty covers the repair or replacement of
   many system and appliance breakdowns, but not necessarily the entire system
   or appliance. Please refer to your contract for details. American Home Shield
   150 Peabody Pl., Memphis, TN 38103. Unsubscribe | Privacy Policy © 2021
   American Home Shield Corporation. All rights reserved.

Content analysis details:   (5.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.7 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at https://www.dnswl.org/,
                            low trust
                            [69.252.207.38 listed in list.dnswl.org]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [69.252.207.38 listed in wl.mailspike.net]
 1.6 DATE_IN_PAST_03_06     Date: is 3 to 6 hours before Received: date
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.0 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image
                            area
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily
                            valid

The original message was not completely plain text, and may be unsafe to
open with some email clients; in particular, it may contain a virus,
or confirm that your address can receive spam.  If you wish to view
it, it may be safer to save it to a file and open it with an editor.


=_60635DCD.A0F5D194
Content-Type: message/rfc822; x-spam-type=original

Re: Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley





Can you provide a working example message AND the operative user prefs?


OK, I was being very stupid. It finally dawned on me that the SA scores 
that appeared above the message body and below the headers when spamc 
was run without the -R option were SA scores embedded in the message by 
the postfix software and were not getting generated by spamc.


But that doesn't change the fact that the spamassassin score that is 
generated by the postfix command is different from what I'm getting 
directly on the command line. Here is what is in my postfix master.cf 
file:


spamassassin unix -     n       n       -       -       pipe
  user=debian-spamd argv=/usr/bin/spamc -u ${user} -e
  /usr/sbin/sendmail -oi -f ${sender} ${recipient}





spamassassin --prefs-file user_prefs_file -D all < some_email

Does the score and hits match one of your spamc tests?


No. The headers have a different score and the tests are different. It's 
scored only as 2.6 with BAYES_50, while what was embedded in the email by 
postfix had BAYES_99 and BAYES_999 and scored 5.2. The postfix score also 
shows RCVD_IN_DNSWL_LOW, while running from the command line does not 
show any such test hit.


And I cannot reproduce the SA scores embedded in the email by postfix 
even if I log in as user "s" and run this command:


spamassassin --prefs-file=/home/s/.spamassassin/user_prefs -t < some_email


So I'm not sure what's going on.

Getting different SA scores when using -R argument with spamc

2021-04-06 Thread Steve Dondley


When I run spamc without -R option like this:

spamc -u some_user  < some_email

I get the following output:





This is a multi-part message in MIME format.




Content analysis details:   (5.2 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.2 BAYES_999              BODY: Bayes spam probability is 99.9 to 100%
                            [score: 1.]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.]
 0.7 SPF_SOFTFAIL           SPF: sender does not match SPF record (softfail)
-0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at https://www.dnswl.org/,
                            low trust
                            [69.252.207.38 listed in list.dnswl.org]
-0.0 RCVD_IN_MSPIKE_H2      RBL: Average reputation (+2)
                            [69.252.207.38 listed in wl.mailspike.net]
 1.6 DATE_IN_PAST_03_06     Date: is 3 to 6 hours before Received: date
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.0 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image
                            area
 0.0 HTML_MESSAGE           BODY: HTML included in message
-0.1 DKIM_VALID_AU          Message has a valid DKIM or DK signature from
                            author's domain
-0.1 DKIM_VALID             Message has at least one valid DKIM or DK signature
 0.1 DKIM_SIGNED            Message has a DKIM or DK signature, not necessarily




===



However, when I run spamc on the same email with the -R option 
to get the SA scores only, like this:


spamc -R -u some_user  < some_email


I get this output:


===

2.6/5.0
Spam detection software, running on the system "email.dondley.com",
has NOT identified this incoming email as spam.  The original
message has been attached to this so you can view it or label
similar future email.  If you have any questions, see
the administrator of that system for details.

Content preview:  Spam detection software, running on the system 
"email.dondley.com",
   has identified this incoming email as possible spam. The original 
message

   has been attached to this so you can view it or label simi [...]

Content analysis details:   (2.6 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 0.2 HEADER_FROM_DIFFERENT_DOMAINS From and EnvelopeFrom 2nd level
                            mail domains are different
 1.6 DATE_IN_PAST_03_06     Date: is 3 to 6 hours before Received: date
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image
                            area





Notice the scores are totally different. According to the man page, -R says:

Just output the SpamAssassin report text to stdout, for all messages.  
See -r for details of the output format used.


So why are the scores different with and without the -R option?

DNSWL overriding bayes_99 and bayes_999 rules

2021-04-06 Thread Steve Dondley

I have emails that have been flagged as spam in the past but that are 
still getting through, presumably because the servers are on some DNSWL.


Example:

X-Spam-Status: No, score=0.9 required=5.0 tests=BAYES_99,BAYES_999,
DATE_IN_PAST_03_06,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,
HTML_IMAGE_RATIO_02,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,
SPF_HELO_NONE,SPF_SOFTFAIL shortcircuit=no autolearn=no
autolearn_force=no version=3.4.2

What's the recommended way to handle these? Do I turn on shortcircuit? 
Do I bump up the score for BAYES_99, BAYES_999? Or might there be a way 
to ignore DNSWL scores if they have a high bayes score?
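
When comparing lots of headers like the one above, it can help to break X-Spam-Status into its parts first. The following is a rough Python sketch, not part of SpamAssassin; the function name parse_spam_status is hypothetical, and it assumes the header has the usual "score=", "required=" and "tests=" fields shown in the example.

```python
import re
from typing import Dict, List, Union


def parse_spam_status(header: str) -> Dict[str, Union[bool, float, List[str]]]:
    """Split an X-Spam-Status header (folded or not) into its fields."""
    # Drop the header name if the caller passed the whole header line.
    if header.lower().startswith("x-spam-status:"):
        header = header.split(":", 1)[1]
    # Undo folding: folded lines leave whitespace after the commas.
    flat = re.sub(r",\s+", ",", header)
    flat = " ".join(flat.split())
    score = re.search(r"\bscore=(-?\d+(?:\.\d+)?)", flat)
    required = re.search(r"\brequired=(-?\d+(?:\.\d+)?)", flat)
    tests = re.search(r"\btests=(\S+)", flat)
    if score is None or required is None:
        raise ValueError("not an X-Spam-Status value")
    return {
        "is_spam": flat.lower().startswith("yes"),
        "score": float(score.group(1)),
        "required": float(required.group(1)),
        "tests": tests.group(1).split(",") if tests else [],
    }
```

With the fields split out, it is easy to count how many messages carry both BAYES_99 and RCVD_IN_DNSWL_HI before deciding whether to change any scores.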

What makes this email spam and how do I train myself to find markers for spam so I can train spamassassin properly?

2021-03-28 Thread Steve Dondley


The email below slipped through my spam filter.

It has malicious content attached which purports to be a voicemail from 
comcast (I've snipped the attachment from the example) but it is 
actually a phishing attack. The attachment contains a link that goes to 
a web page at an obscure domain that prompts you to log into your 
comcast account.


As you can see by the headers, this email was well-trusted by SA with a 
score of -2.7.


I don't think I can rely much on bayes filtering for these kinds of 
emails since the body has so little text (or do I make a bad assumption 
here?). And to my untrained eye, the only thing that looks suspicious is 
line 40 which says: "smtprelay.hostedemail.com".


So what's the giveaway that this is spam and what rule can I add to get 
SA to recognize it as such? And what is the best way for me to learn how 
to analyze the headers so I can recognize spam myself? Any good 
tutorials for this?




  1 Return-Path: 
  2 Delivered-To: catch...@example.org
  3 Received: from email.example.org
  4 by email.example.org with LMTP
  5 id EkqVDIVdYGCceQAAW5pcLQ
  6 (envelope-from 
)

  7 for ; Sun, 28 Mar 2021 06:42:13 -0400
  8 Received: by email.example.org (Postfix, from userid 115)
  9 id 2489422533; Sun, 28 Mar 2021 06:42:13 -0400 (EDT)
 10 Authentication-Results: email.example.org;
 11 dkim=pass (2048-bit key; secure) header.d=comcast.net 
header.i=@comcast.net header.b="PSvQlJTc";

 12 dkim-atps=neutral
 13 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on 
email.example.org

 14 X-Spam-Level:
 15 X-Spam-Status: No, score=-2.7 required=4.0 
tests=BAYES_50,DKIM_SIGNED,
 16 
DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,INVALID_MSGID,

 17 MSGID_FROM_MTA_HEADER,OBFU_TEXT_ATTACH,RCVD_IN_DNSWL_HI,
 18 RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS 
autolearn=unavailable

 19 autolearn_force=no version=3.4.2
 20 Received-SPF: Pass (mailfrom) identity=mailfrom; 
client-ip=96.114.154.164; helo=resqmta-po-05v.sys.comcast.net; 
envelope-from=x-flnltycomcastvoicemail_ref.no01...@comcast.net; 
receiver=
 21 Received: from resqmta-po-05v.sys.comcast.net 
(resqmta-po-05v.sys.comcast.net [96.114.154.164])
 22 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 
(256/256 bits))

 23 (No client certificate requested)
 24 by email.example.org (Postfix) with ESMTPS id F22E6215BD
 25 for ; Sun, 28 Mar 2021 06:42:11 -0400 
(EDT)

 26 Received: from resimta-po-42v.sys.comcast.net ([96.114.154.212])
 27 by resqmta-po-05v.sys.comcast.net with ESMTP
 28 id QSrxlUJdvoWleQSrxlMdfB; Sun, 28 Mar 2021 10:42:09 +
 29 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=comcast.net;
 30 s=20190202a; t=1616928129;
 31 bh=vkwV5ud3feChWZLQsYrnwAqC5q/gOtq5c2+sZwvKGUI=;
 32 
h=Received:Received:Message-ID:Received:Received:From:Subject:To:

 33  Content-Type:MIME-Version:Date;
 34 
b=PSvQlJTcBWsdJnqw5X2ghcFhFC/KDs9orh5uzVOpepDAf2rxUTc3bG03diY25hkLB
 35  
fKraMiHrMsG0UjujPtZPBZ10Wvs+b/pCliySBbDhG4hPak0kJwkoe8INCCabIiNkCc
 36  
8LcCU2x8x5mK0WrbPxGQatIXplKMnAjK7Tr/v27aGvxFxfBjkeDL7DrG6AHNvjtv+P
 37  
N8/WmgYIX2MldH9NM5DFb1OIsENAGdRT2SQnBW+t67wJ9JvIl6D8ZpAXLK0Ra8rrZw
 38  
GbL3gsz49PAoDxAJTuMpWnvmef6J7o/xwV98mMj9s0Dyk3Y+IF2xtoz6CVzDjK/nHy

 39  7YHOQjMWIrXJQ==
 40 Received: from smtprelay.hostedemail.com ([216.40.44.63])
 41 by resimta-po-42v.sys.comcast.net with ESMTP
 42 id QSrwlZX7FX3qEQSrwlyoxt; Sun, 28 Mar 2021 10:42:08 +
 43 X-Xfinity-VAAS: 
gggruggvucftvghtrhhoucdtuddrgeduledrudehiedgfeduucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuvehomhgtrghsthdqtfgvshhinecuuegrihhlohhuthemuceftddunecuogfntfdquehouhhnugdqtfefvdculdehmdenucfjughrpefhuffvtgggffesmhdttdertddttdenucfhrhhomhepfdgiqdfhlhfplhfvjggtohhmtggrshhtvhhoihgtvghmrghilhgprhgvfhdrnhhotddujfffufestghomhgtrghsthdrnhgvthdfuceoigdqhfhlpfhlvfgjtghomhgtrghsthhvohhitggvmhgrihhlpghrvghfrdhnohdtudfjfffusegtohhmtggrshhtrdhnvgh 
   
tqeenucggtffrrghtthgvrhhnpeduvddtkeduleehvdejkeeludfhhffghefhgeegjeefgeejveeiuedtgfeitdelieenucfkphepvdduiedrgedtrdeggedrieefpdeivddrudekvddrleelrdelgeenucevlhhushhtvghrufhiiigvpeefnecurfgrrhgrmhephhgvlhhopehsmhhtphhrvghlrgihrdhhohhsthgvuggvmhgrihhlrdgtohhmpdhinhgvthepvdduiedrgedtrdeggedrieefpdhmrghilhhfrhhomhepgidqfhhlnhhlthihtghomhgtrghsthhvohhitggvmhgrihhlpghrvghfrdhnohdtudhhughssegtohhmtggrshhtrdhnvghtpdhrtghpthhtohepihgsvgifgeehheestghomhgtrg 
   hsthdrnhgvthdprhgtphhtthhopehofhhfihgtvgesihgsvgifgeehhedrohhrgh

 44 X-Xfinity-VMeta: sc=5.00;st=legit
 45 X-Xfinity-Message-Heuristics: IPv6:N;TLS=1;SPF=4;DMARC=F
 46 Message-ID: 
qsrwlzx7fx3qeqsrwlyoxt.1616928128.bcb9cc98f861a2c7a8b119d18ed7fa74.missin...@comcast.net
 47 Received: from omf14.hostedemail.com (clb03-v110.bra.tucows.net 
[216.40.38.60])
 48 by smtprelay03.hostedemail.com (Postfix) with ESMTP id 
03D8F837F24D
 49

Why no points for SPF_NONE?

2021-03-21 Thread Steve Dondley

I'm learning a bit about spamassassin rules and taking a peek at how my 
inbound mail is scored. I noticed that SPF_NONE scores zero points by 
default. I'm wondering if there is a good reason for not giving it a 
score and whether I should set that to something much higher like 1.0.


I'm curious to know what more experienced people have this set to. 
Thanks.

Re: Workflow for adding new ham/spam to existing site-wide database?

2021-03-16 Thread Steve Dondley

You covered a lot of ground here. Thanks. If you have some spare 
cycles, I have follow-up questions to get an understanding of how you 
process your email:



21 seconds at that includes fetch the samples via imap from two
folders, fire them against a bayes-only spamassasin instance,


What is a "bayes-only" instance? I don't follow. What other kinds of 
instances are there?



ignore BAYES_00/BAYES_99 messages, move the rest to both training
folders, anonymize them, strip useless headers, fire sa-learn against


OK, so it looks like you are suggesting that emails get kind of 
pre-screened to determine if they are obvious spam or not.


And by anonymize, what do you mean? Remove the headers that contain 
email addresses? What other headers are useless? What exactly is the 
goal of anonymizing and removing the headers? I think I have a vague 
idea why but can't quite crystallize it in my head.



both folders, fire bogofilter training against both folders and verify
that the new sample files score with BAYES_99/BAYES_00 now


bogofilter training?

So the goal is to get all the new emails to score either 99 (spam) or 00 
(ham).


So once I verify they score 00 or 99, do I then throw them on the larger 
collection of ham/spam with all headers restored? And what do I do if 
they still don't score 00 or 99?

Workflow for adding new ham/spam to existing site-wide database?

2021-03-16 Thread Steve Dondley

I have been accumulating spam/ham samples and sorting them out into 
different directories on my server. As new spam/ham comes in, I throw it 
into the existing pile and then run "sa-learn --spam|--ham" on the whole 
pile.


It dawned on me that this will get very slow as I eventually collect 
tens of thousands of emails. So I'm wondering if it's better to:


1) Place all new, incoming spam/ham into empty directories
2) Run sa-learn only on these directories with small samples
3) Once done, move these new emails to an archive of spam/ham samples
4) Repeat

Is this typically how it's done?
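
The four steps above could be scripted roughly like this. It is only a sketch under a few assumptions: the directory layout and the function name learn_and_archive are made up, and it relies on sa-learn accepting a directory path together with --spam or --ham. The command runner is a parameter so the logic can be tried without touching the real Bayes database.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def _run_checked(cmd: List[str]) -> None:
    # Raises CalledProcessError if sa-learn exits non-zero.
    subprocess.run(cmd, check=True)


def learn_and_archive(
    incoming: Path,
    archive: Path,
    kind: str,
    run: Callable[[List[str]], object] = _run_checked,
) -> int:
    """Train on the new samples only, then move them into the archive."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    files = sorted(p for p in incoming.iterdir() if p.is_file())
    if not files:
        return 0  # nothing new, so do not call sa-learn at all
    run(["sa-learn", f"--{kind}", str(incoming)])
    archive.mkdir(parents=True, exist_ok=True)
    for path in files:
        # Note: a file with the same name in the archive gets replaced.
        shutil.move(str(path), str(archive / path.name))
    return len(files)
```

Because the files are only moved after the runner returns without raising, a failed sa-learn run leaves the new samples in place for the next attempt.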

Scoring for "look alike" characters in subject?

2021-03-15 Thread Steve Dondley

I'm noticing a fair amount of spam getting through using letters in the 
subject line that are outside the standard set of ASCII characters in an 
effort to bypass spam filters. For example, instead of a capital "R", 
there will be a letter that closely approximates a capital "R" but when 
you look closely at it, you'll see the bottom of the rounded part of the 
"R" never connects to the line running along the left side of the 
letter.


I don't want to use a rule that is too restrictive (like maybe banning 
all non-standard ASCII characters) but I also want to increase the 
likelihood of email using these tactics getting flagged as spam.


I'm new to spamassassin so I'm not sure if a rule like this already 
exists or how I might go about finding this rule or how I should weight 
it. I'm wondering if others on the list have rules to address this same 
issue and can share their rule. Thanks.
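
To make the idea concrete outside of SpamAssassin, here is a small Python sketch of one heuristic: flag a subject only when a single word mixes letters from different scripts, which leaves ordinary accented or fully non-English subjects alone. This is not an existing SA rule; the function names are hypothetical, the subject is assumed to be already decoded to Unicode, and the first word of each character's Unicode name is used as a rough stand-in for its script.

```python
import unicodedata


def _script(ch: str) -> str:
    # "LATIN SMALL LETTER E WITH ACUTE" -> "LATIN",
    # "CYRILLIC SMALL LETTER IE" -> "CYRILLIC".
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else ""


def has_mixed_script_word(subject: str) -> bool:
    """True if any single word mixes letters from more than one script."""
    for word in subject.split():
        scripts = {_script(ch) for ch in word if ch.isalpha()}
        scripts.discard("")
        if len(scripts) > 1:
            return True
    return False
```

A subject like "Caf\u00e9 menu" passes because every letter is Latin, while "R\u0435finance" (with a Cyrillic "\u0435" in place of "e") is flagged.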

Re: Can a .spamassassin directory in a user's home directory override the site-wide configuration?

2021-03-15 Thread Steve Dondley

OK, thanks for the additional info. It looks like I was having a 
permissions issue and the bayes_* files were not both r/w for users 
despite having bayes_file_mode set to 0666. I'm thinking probably 
because the bayes_path was originally created manually with root.


spamassassin reads site-wide config, then users' ~/.spamassassin/user_prefs

spamd can do the same, if it runs under root without the '-x' flag (which
disables this behavior).

spamc connects to spamd passing the username to it, so you can override
current user by passing the "-u username" flag to it.

Can a .spamassassin directory in a user's home directory override the site-wide configuration?

2021-03-14 Thread Steve Dondley

I'm learning to understand how to properly set up a site-wide bayes 
database on my server. Thanks for everyone's help and patience so far.


I've discovered that the SA score assigned to a user's incoming email is 
different than the SA score run through the "spamc" or "spamassassin" 
command. For example, the SA headers for email "A" will show a score of 
only 1.4 (non-spam) in the user's inbox. It shows as non-spam despite 
the fact that I have run it through sa-learn as spam. When I run the 
same email through "spamc -R < " on the command line as 
the same user that received the original message, I see a score of 6.8 
and it is properly getting classified as spam.


I'm trying to determine what accounts for the different scores and fix 
this problem so the correct score is assigned to mail coming into the 
user's inbox.



After doing some investigating, I discovered the user still had a 
.spamassassin directory in their home directory. The directory has only 
a single "user_prefs" file. But I'm wondering if the existence of this 
directory might cause spamassassin filter to ignore site-wide bayes 
database. If that's not the problem, what might account for the 
different scores and how might I fix the issue?

Re: How do I determine if user's email is being checked against the site-wide database?

2021-03-13 Thread Steve Dondley





Are there any BAYES hits on their messages, ham or spam? BAYES_{not
50} would be a positive confirmation. I'm not sure offhand if BAYES_50
hits when bayes is enabled but insufficiently trained...


In one email, I'm seeing this:

3.0 BAYES_95   BODY: Bayes spam probability is 95 to 99%

So I guess it's working. It looks like it got scored +3 points for 
having a greater than 95% probability of being spam according to the 
Bayes algorithm.

How do I determine if user's email is being checked against the site-wide database?

2021-03-13 Thread Steve Dondley

I *think* I now have site-wide bayes filtering working for all 
users on a server. I've edited /etc/spamassassin/local.cf to include 
"bayes_path" and "bayes_file_mode" and I don't see any errors about 
permissions being wrong from debian-spamd in mail.log.


But rather than guessing, I'm wondering if there is a way I can 
objectively confirm that email for a particular user is getting checked 
against the site-wide bayes database. Thanks.

How do I efficiently share a database with all users?

2021-03-11 Thread Steve Dondley

I have a few different mail servers. I harvest mail from the servers and 
periodically sort them into ham/spam folders and then share the sorted 
mail back out to the servers and run sa-learn on each of the servers to 
coach spamassassin. After doing this a few days, I notice that stuff 
that I know I have classified as spam is still getting into inboxes. So 
clearly I'm doing something wrong. I did a little reading and discovered 
that sa-learn only applies to the user sa-learn is run under. It seems 
wasteful to run sa-learn over the same emails for every user on the 
system.


How can I run sa-learn once on the system and then share the generated 
database with each user?

Re: Training spamassassin past 5,000 emails

2021-03-09 Thread Steve Dondley


On 2021-03-09 08:28 AM, Greg Troxel wrote:

Steve Dondley  writes:


I've read through
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which
states that "anything over about 5000 messages does not improve
accuracy significantly in our tests."


I would take that with a grain of salt.   Based on my experience running
SA for many years, I'd say that if you have new spam that isn't like
the spam you already have, learning on it will help.

Also, I take it as a comment about "there's no need to try hard to get
more than 5K messages".  It doesn't say, "if you train on more than 5000
bad things will happen".


So once I hit 5,000, what do I do? Do I run --forget on say the 500 oldest
emails, delete those from my ham/spam folders and then add in a batch
of 500 newer ham/spam emails and then run sa-learn on all the emails
in my spam/ham folders?


I've been running sa-learn daily over my ham folders and my spam folders
for years.  I refile spam and ham so that it will be learned.  I find
the bayes scoring is quite good except for novel spam.  My bayes_* files
are about 83M in total.

So I don't think you necessarily have a problem to solve.


OK, thanks for the advice. Appreciated.

Training spamassassin past 5,000 emails

2021-03-09 Thread Steve Dondley

I've read through 
https://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html which 
states that "anything over about 5000 messages does not improve accuracy 
significantly in our tests."


So once I hit 5,000, what do I do? Do I run --forget on say the 500 oldest 
emails, delete those from my ham/spam folders and then add in a batch of 
500 newer ham/spam emails and then run sa-learn on all the emails in my 
spam/ham folders?

Re: Upgrading from 3.4.2 to 3.4.5, how to

2021-01-21 Thread Steve Charmer

on this documentation page:
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/UpgradingVersion

"If you install using a Linux package installer:
 Debian unstable: apt-get install spamassassin
"
what is the meaning of "unstable" ?
it sounds scary, like the package should not be run in live mail systems,
but then why would it be in the package mgr and not in a development
package version?

Re: Upgrading from 3.4.2 to 3.4.5, how to

2021-01-21 Thread Steve Charmer

I'm sorry, but I do not understand your message.

I thought an upgrade fixes bugs. Maybe you are thinking about an update,
which seems like it would update rules in *.samples?

I would "like" to backup everything, for safety, that is why I included a
list of the directories (folders) which I thought had Spamassassin content,
to get feedback from other users if they are the correct folders to backup,
in case a restore is needed. Unfortunately, I don't feel any more
knowledgeable than before I read your message. By "package databases" did
you mean the bayes database?

So if anyone can assist with more advice on "exactly" what to backup and
how to run an "upgrade", please chime in.
Thank you.

Re: Upgrading from 3.4.2 to 3.4.5, how to

2021-01-20 Thread Steve Charmer

are these the important folders which need to be backed up?

PREFIX=/usr,
DEF_RULES_DIR=/usr/share/spamassassin,
LOCAL_RULES_DIR=/etc/spamassassin,
LOCAL_STATE_DIR=/var/lib/spamassassin

and...
  /var/lib/spamassassin/3.004002
does that match to SA version 3.4.2 ?
I see 3.00... and think, NO that is not 3.4 ... etc
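
The directory name looks odd, but it lines up with the same log reporting "Perl 5.022001" for Perl 5.22.1: the minor and patch numbers each appear to be padded to three digits. A tiny Python sketch of that mapping, assuming that padding scheme holds (the function name is made up):

```python
def perl_numeric_version(dotted: str) -> str:
    """Turn 'major.minor.patch' into the padded 'major.mmmppp' form."""
    major, minor, patch = (int(part) for part in dotted.split("."))
    return f"{major}.{minor:03d}{patch:03d}"
```

Under that assumption, "3.4.2" maps to "3.004002", so the directory would belong to SA 3.4.2.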


data gotten from running
sa-update -D
Jan 20 11:44:02.212 [30368] dbg: logger: adding facilities: all
Jan 20 11:44:02.213 [30368] dbg: logger: logging level is DBG
Jan 20 11:44:02.213 [30368] dbg: generic: SpamAssassin version 3.4.2
Jan 20 11:44:02.213 [30368] dbg: generic: Perl 5.022001, PREFIX=/usr,
DEF_RULES_DIR=/usr/share/spamassassin, LOCAL_RULES_DIR=/etc/spamassassin,
LOCAL_STATE_DIR=/var/lib/spamassassin
Jan 20 11:44:02.213 [30368] dbg: config: timing enabled
Jan 20 11:44:02.215 [30368] dbg: config: score set 0 chosen.
Jan 20 11:44:02.222 [30368] dbg: generic: sa-update version 3.4.2 /
svn1840377
Jan 20 11:44:02.222 [30368] dbg: generic: using update directory:
/var/lib/spamassassin/3.004002

>

Upgrading from 3.4.2 to 3.4.5, how to

2021-01-19 Thread Steve Charmer

Hi, I am running version 3.4.2

/usr/bin/spamassassin -V

SpamAssassin version 3.4.2

  running on Perl version 5.22.1

spamd --version

SpamAssassin Server version 3.4.2

  running on Perl 5.22.1

  with SSL support (IO::Socket::SSL 2.024)

  with zlib support (Compress::Zlib 2.068)

which spamd

/usr/sbin/spamd

==
SA was originally installed using apt-get (Ubuntu-16)


 ==
BACKUP
I would feel safer if I had a backup of the current spamassassin system.
is there a recommended method to backup all files to a tar?

I think I have identified these folders as worthy of backup:

# define an array of folders to backup
backup_dirs=( \
'/etc/spamassassin' \
'/usr/bin' \
'/usr/sbin' \
'/usr/share/perl5' \
'/usr/share/spamassassin' \
'/var/lib/spamassassin' \
)


and I can also backup the bayes database
sa-learn --backup > "${bayes_backup_filename}"
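
As a rough illustration of the tar step, here is a Python sketch that archives a list of directories into one compressed tarball and skips any that do not exist. The function name backup_dirs and the destination filename are assumptions for the example; the directory list would be whatever ends up in backup_dirs above.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def backup_dirs(dirs: Iterable[str], dest: str) -> List[str]:
    """Write the existing directories into a gzip-compressed tar at dest."""
    saved = []
    with tarfile.open(dest, "w:gz") as tar:
        for d in dirs:
            path = Path(d)
            if not path.is_dir():
                continue  # skip directories that do not exist on this host
            # Store paths relative to / so extraction cannot overwrite
            # system files by accident.
            tar.add(str(path), arcname=str(path).lstrip("/"))
            saved.append(str(path))
    return saved
```

For example, backup_dirs(["/etc/spamassassin", "/var/lib/spamassassin"], "sa-backup.tar.gz") would return the list of directories it actually found and archived.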

is there anything else you can recommend for backup
 prior to installing the newest version?

Is there a way to check timestamps on files inside spamassassin folders to
guess if any were modified to be different than the files installed at
installation time?
(it has been a few years & I don't really remember if I did anything bad to
those files when I first started learning about SA.)

==
UPGRADE
To upgrade to the latest version, 3.4.5
do I just run:

apt-get install spamassassin

does it simply overwrite existing files?
i.e. I don't need to uninstall the old version first?

Will my local.cf file stay the same?


Thank you in advance

Re: BITCOIN_PAY_ME and new type of blackmail, non porn.

2018-12-18 Thread Zinski, Steve

I’m seriously thinking about doing the same (block all emails that contain a 
bitcoin address). I’ve had good luck with my custom rule that also tests for 
Unicode obfuscation:

body    __BTC1  /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
body    __BTC2  /\b\W*b\W*i\W*t\W*c\W*o\W*i\W*n\W*\b/i
body    __BTC3  /\b\W*b\W*t\W*c\W*\b/i
body    __BTC4  /bt[c\x{0441}]/i
body    __BTC5  /b[i\x{0456}]t[c\x{0441}][o\x{043E}][i\x{0456}]n/i
meta    LOCAL_BITCOIN   ( __BTC1 && ( __BTC2 || __BTC3 || __BTC4 || __BTC5 ) )
score   LOCAL_BITCOIN   10.0
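
For anyone who wants to try these patterns against sample text before loading them into SA, here is a rough Python port of the meta rule. It is a sketch only: Python's re module is not the engine SA uses, the \x{....} escapes are rewritten as \u.... escapes, and the whole text is tested as a single string. The function name looks_like_bitcoin_demand is made up.

```python
import re

# Address-like token: starts with 1 or 3, then 25-34 base58 characters.
BTC1 = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")
# "bitcoin" / "btc" with optional punctuation between the letters.
BTC2 = re.compile(r"\b\W*b\W*i\W*t\W*c\W*o\W*i\W*n\W*\b", re.IGNORECASE)
BTC3 = re.compile(r"\b\W*b\W*t\W*c\W*\b", re.IGNORECASE)
# Variants with Cyrillic look-alikes standing in for i, c and o.
BTC4 = re.compile("bt[c\u0441]", re.IGNORECASE)
BTC5 = re.compile("b[i\u0456]t[c\u0441][o\u043e][i\u0456]n", re.IGNORECASE)


def looks_like_bitcoin_demand(text: str) -> bool:
    """Mirror the meta rule: an address AND some spelling of bitcoin/btc."""
    if not BTC1.search(text):
        return False
    return any(p.search(text) for p in (BTC2, BTC3, BTC4, BTC5))
```

Text that mentions bitcoin without an address, or contains an address-like token without the word, does not trigger; both have to be present, which is what keeps the meta rule from firing on ordinary news articles.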



From: Mark London 
Date: Tuesday, December 18, 2018 at 1:51 PM
To: "users@spamassassin.apache.org" 
Subject: Re: BITCOIN_PAY_ME and new type of blackmail, non porn.

However, I think the BITCOIN_PAY_ME rule needs a bit of fine tuning, to catch 
other emails.  Like the one below, which escaped triggering the rule.   A 
constant battle between spam rules, and bad English grammar.

Maybe I should say the hell with it, and simply block any email sent to me, 
with a bitcoin address in it. :)  - Mark

Re: Bitcoin update

2018-10-07 Thread Zinski, Steve

> The trouble with this is that you would be adding 10 points to anything
> with a bitcoin address whether anything's obfuscated or not. If you want
> to avoid this take a look at the FUZZY_* rules.


Well, actually, no. I sent you a snippet of my rule and inflated the score to 
10 for those of you who wanted to detect emails with obfuscated (Unicode) 
bitcoin addresses within.

I use the following rules to block the sextortion emails that are so rampant 
right now. As you can see, it assigns a 0.1 score to the bitcoin portion, then 
the following rule uses that to test for sextortion emails (also obfuscated 
with Unicode characters). These two rules work great for me in stopping the 
vast majority of sextortion emails coming to our campus.

body    __BTC1          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
body    __BTC2          /\b\W*b\W*i\W*t\W*c\W*o\W*i\W*n\W*\b/i
body    __BTC3          /\b\W*b\W*t\W*c\W*\b/i
body    __BTC4          /\bb[i\x{0456}]t[c\x{0441}][o\x{043E}][i\x{0456}]n\b/i
meta    LOCAL_BITCOIN   ( __BTC1 && ( __BTC2 || __BTC3 || __BTC4 ) )
score   LOCAL_BITCOIN   0.1

body    __UCporn        /\b\W*p\W*o\W*r\W*n\W*\b/
body    __UCpixel       /\b\W*p\W*i\W*x\W*e\W*l\W*\b/
body    __UCvideos      /\b\W*v\W*i\W*d\W*(e\W*o\W*)?(s)?\W*\b/
body    __UCwebcam      /\b\W*(w\W*e\W*b\W*)?c\W*a\W*m\W*(e\W*r\W*a)?\W*\b/
body    __UCkeylogger   /\b\W*k\W*e\W*y\W*l\W*o\W*g\W*g\W*e\W*r\W*\b/
body    __UCviruses     /\b\W*v\W*i\W*r\W*u\W*s\W*(e\W*s)?\W*\b/
body    __UCmalware     /\b\W*m\W*a\W*l\W*w\W*a\W*r\W*e\W*\b/
body    __UCtrojan      /\b\W*t\W*r\W*o\W*j\W*a\W*n\W*\b/
body    __UCrecording   /\b\W*r\W*e\W*c\W*o\W*r\W*d\W*i\W*n\W*g\W*\b/
body    __UChacked      /\b\W*h\W*a\W*c\W*k\W*e\W*d\W*\b/
meta    LOCAL_SEXTORTION ( LOCAL_BITCOIN && ( __UCporn || __UCpixel || __UCvideos || __UCwebcam ) && ( __UCkeylogger || __UCviruses || __UCmalware || __UCtrojan || __UCrecording || __UChacked ) )
score   LOCAL_SEXTORTION 20.0

The gist of the SEXTORTION rule is the email must contain a bitcoin address AND 
(porn or pixel or video/videos or webcam/camera/cam) AND (keylogger or 
virus/viruses or malware or trojan or recording or hacked). Every sextortion 
email that I've seen contains those words.

It's not pretty, but it works (until the scammers change tactics).
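
For anyone who wants to experiment with that AND/OR structure outside SpamAssassin, here is a rough Python sketch of the same logic. It is only an approximation: Python's Unicode-aware \b and \W do not behave exactly like Perl matching on a rendered message body, the keyword lists are taken from the plain-English description above rather than from the exact regexes, and the helper names (spaced, local_bitcoin, local_sextortion) are made up for illustration.

```python
import re


def spaced(word, flags=0):
    """Build a pattern in the \\b\\W*p\\W*o\\W*r\\W*n\\W*\\b style for a word."""
    body = r"\W*".join(re.escape(ch) for ch in word)
    return re.compile(r"\b\W*" + body + r"\W*\b", flags)


# Bitcoin address shape used by __BTC1 in the rules above.
BTC_ADDRESS = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

# "bitcoin" / "btc" mentions, including the Cyrillic lookalike variant.
BTC_WORDS = [
    spaced("bitcoin", re.I),
    spaced("btc", re.I),
    re.compile(r"\bb[i\u0456]t[c\u0441][o\u043e][i\u0456]n\b", re.I),
]

# Keyword groups as described in prose: (porn or pixel or video...) AND (keylogger or ...).
GROUP_A = [spaced(w) for w in
           ("porn", "pixel", "video", "videos", "webcam", "camera", "cam")]
GROUP_B = [spaced(w) for w in
           ("keylogger", "virus", "viruses", "malware", "trojan", "recording", "hacked")]


def any_hit(patterns, text):
    return any(p.search(text) for p in patterns)


def local_bitcoin(text):
    # meta LOCAL_BITCOIN: an address AND some mention of bitcoin/btc
    return bool(BTC_ADDRESS.search(text)) and any_hit(BTC_WORDS, text)


def local_sextortion(text):
    # meta LOCAL_SEXTORTION: LOCAL_BITCOIN AND group A AND group B
    return local_bitcoin(text) and any_hit(GROUP_A, text) and any_hit(GROUP_B, text)
```

A made-up address-shaped string such as "1" followed by thirty "B" characters is enough to exercise it; real tuning should still be done with spamassassin --lint and a test corpus.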

Re: Bitcoin update

2018-10-05 Thread Zinski, Steve

Yes, absolutely.


On 10/5/18, 1:42 PM, "John Hardin"  wrote:

On Fri, 5 Oct 2018, Zinski, Steve wrote:

> Here's how I'm blocking bitcoin emails with Unicode characters embedded:
>
> body    __BTC1          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
> body    __BTC2          /\b\W*b\W*i\W*t\W*c\W*o\W*i\W*n\W*\b/i
> body    __BTC3          /\b\W*b\W*t\W*c\W*\b/i
> body    __BTC4          /\bb[i\x{0456}]t[c\x{0441}][o\x{043E}][i\x{0456}]n\b/i
> meta    LOCAL_BITCOIN   ( __BTC1 && ( __BTC2 || __BTC3 || __BTC4 ) )
> score   LOCAL_BITCOIN   10.0
>
> Works like a charm in my environment.

To clarify: I added a rule for general obfuscation using the zero-width 
Unicode glyph. It's not bitcoin-specific.

With your permission I can add that to my sandbox and see how it does in 
masscheck.

> On 10/5/18, 10:54 AM, "John Hardin"  wrote:
>
>On Fri, 5 Oct 2018, Pedro David Marco wrote:
>
>>   >On Thursday, October 4, 2018, 9:08:10 PM GMT+2, Kevin A. McGrail 
 wrote:
>> >Interesting.  Any chance for an unmodified pastebin spample?
>>
>> Yes please Joseph... any  change for it, please?  We are hungry...
>
>Test rule checked into my sandbox last night...
>
>Initial results aren't too promising.

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
   It is not the place of government to make right every tragedy and
   woe that befalls every resident of the nation.
---
  554 days since the first commercial re-flight of an orbital booster 
(SpaceX)

Re: Bitcoin update

2018-10-05 Thread Zinski, Steve

Here's how I'm blocking bitcoin emails with Unicode characters embedded:

body    __BTC1          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
body    __BTC2          /\b\W*b\W*i\W*t\W*c\W*o\W*i\W*n\W*\b/i
body    __BTC3          /\b\W*b\W*t\W*c\W*\b/i
body    __BTC4          /\bb[i\x{0456}]t[c\x{0441}][o\x{043E}][i\x{0456}]n\b/i
meta    LOCAL_BITCOIN   ( __BTC1 && ( __BTC2 || __BTC3 || __BTC4 ) )
score   LOCAL_BITCOIN   10.0

Works like a charm in my environment.



On 10/5/18, 10:54 AM, "John Hardin"  wrote:

On Fri, 5 Oct 2018, Pedro David Marco wrote:

>   >On Thursday, October 4, 2018, 9:08:10 PM GMT+2, Kevin A. McGrail 
 wrote:
> >Interesting.  Any chance for an unmodified pastebin spample?
>
> Yes please Joseph... any  change for it, please?  We are hungry... 

Test rule checked into my sandbox last night...

Initial results aren't too promising.

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  554 days since the first commercial re-flight of an orbital booster 
(SpaceX)

Re: Using UTF-8 characters to avoid spam filter rules.

2018-06-28 Thread Zinski, Steve

I see that a lot in sextortion emails. So far, I’ve seen the word “bitcoin” 
encoded (obfuscated) the following ways:

bitc%D0%BEin
bit%D1%81oin
bit%D1%81%D0%BEin

And the word “wallet” as:

w%D0%B0ll%D0%B5t

These sextortion scammers are clever. So, instead of filtering on the word 
“bitcoin”, I now filter on a bitcoin regex (see below) and some other words 
such as “pixel”, “virus”, etc. which are always a part of the sextortion 
message.

body  __BITCOIN  /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/

Steve
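
To see which characters those percent-encoded forms actually contain, a short Python sketch using only the standard library can decode them and print the Unicode names. The function name non_ascii_letters is just an illustrative helper, not part of any SpamAssassin tooling.

```python
import unicodedata
from urllib.parse import unquote

# Obfuscated spellings quoted in the message above.
samples = ["bitc%D0%BEin", "bit%D1%81oin", "bit%D1%81%D0%BEin", "w%D0%B0ll%D0%B5t"]


def non_ascii_letters(encoded):
    """Decode a percent-encoded UTF-8 string and list its non-ASCII characters."""
    decoded = unquote(encoded)  # unquote() decodes as UTF-8 by default
    return [(ch, unicodedata.name(ch)) for ch in decoded if ord(ch) > 127]


if __name__ == "__main__":
    for s in samples:
        print(s, "->", unquote(s), non_ascii_letters(s))
```

The decoded words look like "bitcoin" and "wallet" on screen but contain Cyrillic lookalike letters, so a plain /bitcoin/ match misses them, while the address regex is unaffected because the address itself has to stay plain ASCII to be usable.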




From: Mark London 
Date: Thursday, June 28, 2018 at 2:26 PM
To: "users@spamassassin.apache.org" 
Subject: Re: Using UTF-8 characters to avoid spam filter rules.

On 6/28/2018 1:46 PM, users-digest-h...@spamassassin.apache.org wrote:

Subject: Re: Using UTF-8 characters to avoid spam filter rules.
From: RW <rwmailli...@googlemail.com>
Date: 6/26/2018 12:12 PM
To: users@spamassassin.apache.org

On Tue, 26 Jun 2018 00:33:11 -0400
Mark London wrote:

Hi - Some of the words in the spam email below are using UTF-8
characters to avoid spam detection.  I.e. the characters in the phrase
"bitcoin wallet address" are not the simple ASCII characters that they
appear to be.

View the source of my email to understand what I'm talking about. Is
there any rule I can use to detect messages that are mostly plain
ASCII characters, but are using enough UTF-8 characters that they
obviously have been put in to avoid spam rules?

You can test for specific obfuscated words like this:

body            FUZZY_BITCOIN   /(?!itcoin)/i
replace_rules   FUZZY_BITCOIN

For anything more general you'd have to match on lookalike characters
from non-roman codepages embedded in ASCII (or roman) words. Finding
accented characters or general multibyte UTF-8 is not particularly
suspicious.

Thanks for the info.   I had never come across this issue before, and was 
afraid that more spammers would start doing it.

In which case, I would think that if a plain text message contained a lot of 
"suspicious" multibyte UTF-8 characters embedded in roman-character words, 
that would make it suspicious enough to flag.   However, for now, this 
spam message was the only one I've seen like that. So I won't worry about it 
for now.

- Mark

Fwd: Increase scores based on lewd body text

2018-05-03 Thread Steve Mallett

Didn't cc users@

How do I add a non sa-compile ruleset to spamassassin? The googles are not
helping.

on Ubuntu16

Steve

On Tue, May 1, 2018 at 7:52 PM, Kevin A. McGrail <kmcgr...@apache.org>
wrote:

> I have several rules for sexually explicit content in KAM.cf.  See
> https://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf
>
> --
> Kevin A. McGrail
> Asst. Treasurer & VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> On Tue, May 1, 2018 at 6:42 PM, Steve Mallett <s...@iioo.co> wrote:
>
>>
>> Hi,
>>
>> I have mboxs I'm running spamassassin against & many emails with very
>> lewd body text have the same scores as other emails without.
>>
>> I'm invoking via: formail -s procmail ~/procmail.rc < mbox
>>
>> SA V: 3.4.1
>>
>> Running on Ubuntu 16.04LTS
>>
>> How can I increase the scores on bad words in body text and/or is there a
>> recipe specifically for that type of thing?
>>
>>
>> Steve
>>
>
>

Increase scores based on lewd body text

2018-05-01 Thread Steve Mallett

Hi,

I have mboxs I'm running spamassassin against & many emails with very lewd
body text have the same scores as other emails without.

I'm invoking via: formail -s procmail ~/procmail.rc < mbox

SA V: 3.4.1

Running on Ubuntu 16.04LTS

How can I increase the scores on bad words in body text and/or is there a
recipe specifically for that type of thing?


Steve

Re: new campaign: bitly & appengine.google

2017-09-12 Thread Zinski, Steve

Report to – supp...@bitly.com



On 9/12/17, 1:29 PM, "Benny Pedersen"  wrote:

Chip M. skrev den 2017-09-12 15:28:
> 
> Does anyone have a contact at BitLy?  These would be trivially
> easy for them to block.


https://support.bitly.com/hc/en-us/articles/231247908-I-ve-found-a-bitlink-that-directs-to-spam-what-should-I-do-

googled bit.ly report spam

Re: Custom rule problem

2017-01-31 Thread Zinski, Steve

Sorry for the trouble, everyone… I had been forwarding the spam through my 
personal IMAP account (to test my rule) which was apparently blocking it. I 
forwarded it using my gmail account and my new rule fired. I feel like an idiot.

Steve



On 1/31/17, 2:53 PM, "John Hardin" <jhar...@impsec.org> wrote:

On Tue, 31 Jan 2017, Zinski, Steve wrote:

> Here’s the “view source” of the message in question.
>
> http://pastebin.com/AnwkAf9t
>
> Again, it’s line 88 that I’m trying to match.

...let's try this again...

A uri rule hits that here:

Jan 31 09:21:07.423 [21842] dbg: rules: ran uri rule __ALL_URI ==> got 
hit: 
"http://trc.spam_domain_redacted.com/redirect.php?email=redac...@uronline.net;

It also hits an existing rule:

Jan 31 09:21:07.525 [21842] dbg: rules: ran rawbody rule __BUGGED_IMG 
==> got hit: "http://trc.spam_domain_redacted.com/redirect.php?email=re;


> On 1/31/17, 11:36 AM, "John Hardin" <jhar...@impsec.org> wrote:
>
>On Tue, 31 Jan 2017, Zinski, Steve wrote:
>
>> I’m trying to write a custom rule to block a certain type of spam.
>> When I view the message source, the very last lines of the spam look like this:
>>
>> 
>> http://trc.spammersdomain.com/redirect.php?email=redac...@richmond.edu;>
>> 
>> 
>>
>> Every single rule that I’ve written fails to detect that
>> redirect.php URI. I’ve even tried a rule that simply reads:
>>
>> Full  my_rule /redirect/is
>> Score  my_rule 10.0
>>
>> No match. I’ve tried full, rawbody, uri, and body, all to no avail.
>> I’ve even shortened the search string to “redi” (it’s a unique word) and still
>> no match. I’ve been writing rules for many years and this is the first time
>> I’ve seen this behavior. Any ideas?
>
>If you have a rule dev environment (vs. testing rules in your live
>install) I've found something like this to be really useful:
>
>   uri __ALL_URI   /.*/
>   tflags  __ALL_URI   multiple
>
>Then all the detected URIs appear in the rule hits debug output.
>
>Post the full email on Pastebin or similar, we can't meaningfully
>comment on what you provided beyond "uri *should* work for that".

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Tomorrow: the 14th anniversary of the loss of STS-107 Columbia

Re: Custom rule problem

2017-01-31 Thread Zinski, Steve

Here’s the “view source” of the message in question.

http://pastebin.com/AnwkAf9t

Again, it’s line 88 that I’m trying to match.

Thanks.

On 1/31/17, 11:36 AM, "John Hardin" <jhar...@impsec.org> wrote:

On Tue, 31 Jan 2017, Zinski, Steve wrote:

> I’m trying to write a custom rule to block a certain type of spam. When I
> view the message source, the very last lines of the spam look like this:
>
> 
> http://trc.spammersdomain.com/redirect.php?email=redac...@richmond.edu;>
> 
> 
>
> Every single rule that I’ve written fails to detect that redirect.php
> URI. I’ve even tried a rule that simply reads:
>
> Full  my_rule /redirect/is
> Score  my_rule 10.0
>
> No match. I’ve tried full, rawbody, uri, and body, all to no avail. I’ve
> even shortened the search string to “redi” (it’s a unique word) and still no
> match. I’ve been writing rules for many years and this is the first time I’ve
> seen this behavior. Any ideas?

If you have a rule dev environment (vs. testing rules in your live 
install) I've found something like this to be really useful:

uri __ALL_URI   /.*/
tflags  __ALL_URI   multiple

Then all the detected URIs appear in the rule hits debug output.

Post the full email on Pastebin or similar, we can't meaningfully comment 
on what you provided beyond "uri *should* work for that".

-- 
  John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
  jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
   The promise of nuclear power: electricity too cheap to meter
   The reality of nuclear power: FUD too cheap to meter
---
  Tomorrow: the 14th anniversary of the loss of STS-107 Columbia

Custom rule problem

2017-01-31 Thread Zinski, Steve

Hello, I have a problem that I hope someone can help me with.

I’m trying to write a custom rule to block a certain type of spam. When I view 
the message source, the very last lines of the spam look like this:


http://trc.spammersdomain.com/redirect.php?email=redac...@richmond.edu;>



Every single rule that I’ve written fails to detect that redirect.php URI. I’ve 
even tried a rule that simply reads:

Full  my_rule /redirect/is
Score  my_rule 10.0

No match. I’ve tried full, rawbody, uri, and body, all to no avail. I’ve even 
shortened the search string to “redi” (it’s a unique word) and still no match. 
I’ve been writing rules for many years and this is the first time I’ve seen 
this behavior. Any ideas?

Re: RCVD_IN_SORBS_SPAM and google IPs

2016-09-08 Thread Zinski, Steve

I’m seeing the same thing here; I’ve had to adjust that score lower. Also 
seeing lots of RCVD_IN_SORBS_WEB false-positives.


On 9/8/16, 4:53 PM, "Shane Williams"  wrote:

Hey all,

I'm seeing google IP ranges hit the RCVD_IN_SORBS_SPAM rule, and in
digging deeper, I realize that there are zero hits on this rule for
the two weeks prior to Aug. 31, and now I'm seeing it thousands of
times per week (not just against google IPs).

Was this rule added/changed/re-scored in a recent sa-update?  I looked
at ruleqa.spamassassin.org, and just at a glance notice that the rule
doesn't seem to be in commits previous to Aug. 30, but I may totally
be reading the site's information wrong.

I've turned the score down to a tiny, but non-zero value for now,
because it seems to be pushing legit emails close (if not over) the
local threshold.

-- 
Public key #7BBC68D9 at| Shane Williams
http://pgp.mit.edu/|  System Admin - UT CompSci
=--+---
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  | www.ischool.utexas.edu/~shanew

Re: Spamassassin Bayes... "why give that spam that score???"

2016-02-24 Thread Steve


On 24/02/2016 22:59, John Hardin wrote:

On Wed, 24 Feb 2016, Steve wrote:

I've used spamassassin for many years - on Ubuntu, using amavisd - 
with great success.  In recent months, I've been receiving several 
spam messages each day that evade the filters.


Can you provide samples? (e.g. three or four on Pastebin)


One of each of the most common forms:

http://pastebin.com/Wk2KD1Q1
http://pastebin.com/QCQ9Ymw7
http://pastebin.com/wgkmiJLt

I note that they tend to come from different mail servers each time - 
the URLs in the body tend to be unique, too.




* The false negatives all match BAYES_00 - attracting a default score 
of -1.9. BAYES_00 seems to be at the crux of the misclassification.


Is there a way to delve into why these messages have been allocated 
such a low bayes score - while (to a human) appearing blatant, 
simple, spam on "vanilla" spam topics?  Has my bayes data been 
"poisoned" somehow?


Poisoning is less likely than mistraining.
How large is your userbase and mail volume?


One user - me - several email addresses.  10,000 mails per month - 
several mailing lists where I read only a tiny fraction of the posts.  
~1,500 spams (that survive mail server RBLs).  Autolearn is on - I don't 
think about it, it is automatic. :)


How do you train your Bayes? Autolearn? General user submissions? 
Trusted user submissions? Only you, from only your personal mail?

Only my personal mailbox *really* matters to me.  I train from it using 
the dovecot antispam plugin... which feeds mail I shift to/from a spam 
folder through a pipe involving "spamc -C".


Do you keep base training corpora so you can wipe and retrain if it 
goes off the rails for some reason?

(In principle) I've got multi-gigabyte-scale spam/ham corpora.  I'm yet 
to [ever] do anything with it. :)


It is worth noting that I get a lot of correctly identified spam - 
and much of that matches BAYES_99 and BAYES_999... and my ham gets 
BAYES_00... so, for many messages, bayes is working. Is it likely 
that I am suffering poor performance (for these specific messages) as 
a result of some tunable parameter?


Probably not. There's not a lot to tune in Bayes. It's pretty much 
solely dependent on what you've trained it with.



What is the most effective way to tackle this?


If all the FNs are getting BAYES_00, make sure you're (re)training 
them as spam.

I believe I'm doing that - but it isn't easy to prove that the training 
'worked'.

Review how you're training. If your users aren't really trustworthy 
you should be manually reviewing submissions.


When spam  arrives in my primary inbox, I hand classify - I'm less 
obsessive about mailing lists. Dovecot initiates training automatically 
when I shift messages to a special spam folder.


I feel autolearn can be problematic, particularly if things are 
already going off the rails.


I expect Autolearn (assisted by Razor, Pyzor and DCC) has done the vast 
majority of my training.  This year, I've hand-trained 216 
false-negatives and 0 false positives.


If you have base training corpora, review it for misclassifications 
(FNs), wipe and retrain.


I guess I could do that... My expectation is that - if I train with the 
corpora I can pick easily (without changing configuration) I'll get the 
same bayes database I currently have... which will give the same 
scores.  Really, I'd like to understand why my current bayes database 
makes the classifications it does.

Spamassassin Bayes... "why give that spam that score???"

2016-02-24 Thread Steve

I've used spamassassin for many years - on Ubuntu, using amavisd - with 
great success.  In recent months, I've been receiving several spam 
messages each day that evade the filters.


* These false-negatives conform to a handful of simple, formulaic, 
textual forms - on common subjects.
* The emails consist of fairly plain HTML and appear not to employ any 
significant obfuscation.
* I have tried to train spamassassin with many of these spam samples - 
without any effect.
* The bayes database is updated. The bayes_journal (37k), bayes_seen 
(5.2mb) and bayes_toks (5.4mb) files all have recent timestamps.
* The false negatives all match BAYES_00 - attracting a default score of 
-1.9. BAYES_00 seems to be at the crux of the misclassification.


Is there a way to delve into why these messages have been allocated such 
a low bayes score - while (to a human) appearing blatant, simple, spam 
on "vanilla" spam topics?  Has my bayes data been "poisoned" somehow?  
It is worth noting that I get a lot of correctly identified spam - and 
much of that matches BAYES_99 and BAYES_999... and my ham gets 
BAYES_00... so, for many messages, bayes is working. Is it likely that I 
am suffering poor performance (for these specific messages) as a result 
of some tunable parameter?


What is the most effective way to tackle this?

A rule to check X-ASN header

2015-11-23 Thread steve

bg: async: calling callback on key 
asnlookup-0-asn.routeviews.org
Nov 23 11:54:12.380 [74846] dbg: asn: asn.routeviews.org: lookup result packet: 
50.82.125.74.asn.routeviews.org. 83431 IN TXT 15169 74.125.0.0 16
Nov 23 11:54:12.380 [74846] dbg: asn: ASNCIDRROUTEVIEWS added route 
74.125.0.0/16
Nov 23 11:54:12.380 [74846] dbg: asn: ASNROUTEVIEWS added asn 15169
Nov 23 11:54:12.380 [74846] dbg: check: tagrun - tag ASNROUTEVIEWS is now 
ready, value: AS15169
Nov 23 11:54:12.380 [74846] dbg: check: tagrun - tag ASNCIDRROUTEVIEWS is now 
ready, value: 74.125.0.0/16
Nov 23 11:54:12.381 [74846] dbg: async: calling callback on key 
asnlookup-2-ip2asn.sasm4.net
Nov 23 11:54:12.381 [74846] dbg: asn: ip2asn.sasm4.net: lookup result packet: 
50.82.125.74.ip2asn.sasm4.net. 631 IN TXT AS15169
Nov 23 11:54:12.381 [74846] dbg: asn: ASNSASM4 added asn 15169
Nov 23 11:54:12.381 [74846] dbg: check: tagrun - tag ASNSASM4 is now ready, 
value: AS15169
Nov 23 11:54:12.383 [74846] dbg: async: calling callback on key 
asnlookup-3-origin.asn.spameatingmonkey.net
Nov 23 11:54:12.383 [74846] dbg: asn: origin.asn.spameatingmonkey.net: lookup 
result packet: 50.82.125.74.origin.asn.spameatingmonkey.net. 221 IN TXT (
Nov 23 11:54:12.383 [74846] dbg: asn: [...] "74.125.0.0/16 | AS15169 | Google 
Inc. | 2000-03-30 | US" )
Nov 23 11:54:12.383 [74846] dbg: asn: ASNCIDRSEM added route 74.125.0.0/16
Nov 23 11:54:12.383 [74846] dbg: asn: ASNSEM added asn 15169
Nov 23 11:54:12.383 [74846] dbg: check: tagrun - tag ASNSEM is now ready, 
value: AS15169
Nov 23 11:54:12.383 [74846] dbg: check: tagrun - tag ASNCIDRSEM is now ready, 
value: 74.125.0.0/16
Nov 23 11:54:12.397 [74846] dbg: async: completed in 0.034 s: TXT, 
asnlookup-3-origin.asn.spameatingmonkey.net
Nov 23 11:54:12.398 [74846] dbg: async: completed in 0.027 s: TXT, 
asnlookup-0-asn.routeviews.org
Nov 23 11:54:12.398 [74846] dbg: async: completed in 0.034 s: TXT, 
asnlookup-2-ip2asn.sasm4.net
Nov 23 11:54:13.574 [74846] dbg: rules: ran header rule T_SCS_ASN_EXISTS 
==> got hit: ""
Nov 23 11:54:13.574 [74846] dbg: rules: ran header rule T_SCS_ASN_ANYTHING 
==> got hit: "15169"
Nov 23 11:54:14.396 [74846] dbg: async: timing: 0.027 . 
asnlookup-0-asn.routeviews.org
Nov 23 11:54:14.397 [74846] dbg: async: timing: 0.034 . 
asnlookup-2-ip2asn.sasm4.net
Nov 23 11:54:14.397 [74846] dbg: async: timing: 0.034 . 
asnlookup-3-origin.asn.spameatingmonkey.net
Nov 23 11:54:14.571 [74846] dbg: check: 
tests=BAYES_00,DKIM_ADSP_CUSTOM_MED,DKIM_SIGNED,FREEMAIL_FROM,HTML_MESSAGE,NML_ADSP_CUSTOM_MED,RCVD_IN_DNSWL_LOW,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_PASS,TXREP,T_DKIM_INVALID,T_SCS_ASN_ANYTHING,T_SCS_ASN_EXISTS
X-Spam-ASN: AS15169 74.125.0.0/16
T_SCS_ASN_ANYTHING,T_SCS_ASN_EXISTS shortcircuit=no autolearn=no
X-Spam-ASN_RV: AS15169 74.125.0.0/16
X-Spam-ASN_SASM4: AS15169
X-Spam-ASN_SEM: AS15169 74.125.0.0/16

SPF_PASS=-0.001,TXREP=-1.021,T_DKIM_INVALID=0.01,T_SCS_ASN_ANYTHING=0.01, 
T_SCS_ASN_EXISTS=0.01

Any advice gratefully received!

Steve

Re-2: A rule to check X-ASN header

2015-11-23 Thread steve



Hi Benny,

>> asn plugin currently does not work with ipv6
I'll cross that bridge when I come to it.

> and if you see mails pretending sent from google/gmail it wont be dkim 
> pass and spf pass
 
The example I saw last week was from "Google Audit" 
<sec...@googletechteam.co.uk>, was DKIM signed and valid [but obviously not by 
Google's key :)] and was asking a user to verify their account... URIs weren't 
blacklisted at the time.

Test results of that scan were...

DKIM_SIGNED=0.1
DKIM_VALID=-0.1
DKIM_VALID_AU=-0.1
HTML_MESSAGE=0.001
KAM_COUK=0.1
MIME_HTML_ONLY=0.723
RP_MATCHES_RCVD=-0.582
SPF_PASS=-0.001
TXREP=1.105

My thought process was that emails with Google in the sender's name or email 
address should only really originate from IP addresses / ASNs Google owns 
(initial investigation suggests gmail.com comes from AS15169, though I've not 
thrown a wide net yet).

> asn is nice but too unstable to make rules on
I feel it's worth exploring for my purposes.

Any further advice will be gratefully received.

Regards

Steve

 Original Message 
Subject: Re: A rule to check X-ASN header (23-Nov-2015 12:13)
From:Benny Pedersen <m...@junc.eu>
To:  st...@mailinglists.spectrumcs.net

> steve skrev den 2015-11-23 13:05:
> 
> > Any advice gratefully received!
> 
> asn plugin currently does not work with ipv6
> 
> and if you see mails pretending sent from google/gmail it wont be dkim 
> pass and spf pass
> 
> asn is nice but too unstable to make rules on
> 
> To: users@spamassassin.apache.org

Re-4: A rule to check X-ASN header

2015-11-23 Thread steve



> steve skrev den 2015-11-23 13:31:
> 
> >>> asn plugin currently does not work with ipv6
> > I'll cross that bridge when I come to it.
> 
> i just still need self to debug why it fails, currently i have seen 
> 2.0.0.0/8 when ipv6 recieved in 26xx: :=)
> 
> >> and if you see mails pretending sent from google/gmail it wont be dkim
> >> pass and spf pass
> > 
> > The example I saw last week was from "Google Audit"
> > <sec...@googletechteam.co.uk>, was DKIM signed and valid [but
> > obviously not by Google's key :)] and was asking a user to verify
> > their account... URIs weren't blacklisted at the time.
> 
> co.uk is a domain and a tld, very cool :)
> 
> dont blame me on that
> 
> i can make google.junc.eu is it now google that spams you ?

That was just one example I received. Yes, you can very well use google.junc.eu, 
and no, that doesn't mean Google spams me.

My eventual goal is to test for "Has google in the sender name OR domain" and 
"is NOT from an ASN owned by Google".

https://www.ultratools.com/tools/asnInfoResult?domainName=Google

Am I not explaining myself correctly?

> 
> yes i know co.uk is a valid tld, but spammers seems not knowing why not 
> to use it
> 
> > Test results of that scan were...
> > 
> > DKIM_SIGNED=0.1
> > DKIM_VALID=-0.1
> > DKIM_VALID_AU=-0.1
> > HTML_MESSAGE=0.001
> > KAM_COUK=0.1
> > MIME_HTML_ONLY=0.723
> > RP_MATCHES_RCVD=-0.582
> > SPF_PASS=-0.001
> > TXREP=1.105
> 
> what dkim domain, whois dkim-domain

It was DKIM signed by the sender's domain, googletechteam.co.uk.

> 
> > My thought process was that emails with Google in the sender's name or
> > email address should only really originate from IP addresses / ASNs
> > Google owns (initial investigation suggests gmail.com comes from AS15169,
> > though I've not thrown a wide net yet).
> > 
> >> asn is nice but too unstable to make rules on
> > I feel its worth exploring for my purposes.
> 
> okay with me if you do with stable data
Thanks
> 
> > Any further advice will be gratefully received.
> 
> possible start using dmarc ?
Not sure how that would help in the situation I've outlined?

Overall, while I appreciate your efforts and discussions about the validity 
of my objectives, what I'm really after is how I can query the X-ASN header.

If this turns out to be a waste of time I'll be the first to let you know.

Many thanks

Steve
> 
> To: users@spamassassin.apache.org

Re-4: A rule to check X-ASN header

2015-11-23 Thread steve



> > My thought process was that emails with Google in the sender's name or
> > email address should only really originate from IP addresses / ASNs
> > Google owns (initial investigation suggests gmail.com comes from AS15169,
> > though I've not thrown a wide net yet).
> 
> a meta rule with rcvd header and From: header rules will do the trick, 
> faster and simpler.
> 
Good thinking. I'll investigate this further.

Thanks

Steve

Re-2: A rule to check X-ASN header

2015-11-23 Thread steve



> > The example I saw last week was from "Google Audit"
> > <secure@googletechteam.co.uk>, was DKIM signed and valid [but obviously not
> > by Google's key :)] and was asking a user to verify their account... URIs
> > weren't blacklisted at the time.
> > My thought process was that emails with Google in the sender's name or email
> > address should only really originate from IP addresses / ASNs Google owns
> > (initial investigation suggests gmail.com comes from AS15169, though I've not
> > thrown a wide net yet).
> 
> how do you come to that strange conclusion?
> 
> that is a domain as any other and "with Google in the Senders Name or 
> email address should only really originate" is by all respect pure 
> nonsense - DKIM, SPF and DMARC are about *domains* and not *partly 
> matches* of some special handeled large companies
> _
> 
>  Domain name:
>  googletechteam.co.uk
> 
>  Registrant:
>  Alexander Duffus
> 
>  Registrant type:
>  UK Individual
> 
>  Registrant's address:
>  Bury House
>  Royston
>  Hertfordshire
>  SG8 8QB
>  United Kingdom
> 

In my mailflow I believe it to be very unusual for a domain / sender to have 
Google in it and not originate from Google's network.


The example I gave originated from 217.199.161.224 (ASN 20738 - Webfusion 
Internet Solutions) and had *google* in the domain; to me that's something I 
want to have visibility of.

Overall, while I appreciate your efforts and discussions about the validity 
of my objectives, what I'm really after is how I can query the X-ASN header.

If this turns out to be a waste of time I'll be the first to let you know. 

Many thanks 

Steve
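
The check being described (sender mentions Google, but the message did not arrive from a Google-owned ASN) can be sketched in a few lines of Python to make the intended logic explicit. This is only an illustration of the idea, not a SpamAssassin rule: the function name is invented, and the set of "Google" ASNs is an assumption containing only AS15169, the one number mentioned in this thread.

```python
# Assumption: AS15169 is the only Google ASN considered here; a real list may be longer.
GOOGLE_ASNS = {"15169"}


def looks_like_google_impersonation(from_header, asn):
    """True when the From: header mentions google but the ASN is not in GOOGLE_ASNS.

    from_header: the raw From: header value (display name and address)
    asn: the AS number as a string without the "AS" prefix, e.g. "20738"
    """
    mentions_google = "google" in from_header.lower()
    return mentions_google and asn not in GOOGLE_ASNS


if __name__ == "__main__":
    example = '"Google Audit" <secure@googletechteam.co.uk>'
    print(looks_like_google_impersonation(example, "20738"))
```

In SpamAssassin terms this would presumably become a meta rule combining a From: header match with a negated ASN match, which is roughly what is being asked for here.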

Re-2: A rule to check X-ASN header

2015-11-23 Thread steve



> > Hi all,
> > 
> > I'm trying to create a rule which will check the results of the ASN
> > plugin. 
> ...
> > As a test I have the following...
> > 
> > ifplugin Mail::SpamAssassin::Plugin::ASN
> >   header  T_SCS_ASN_EXISTS    exists:X-ASN
> >   header  T_SCS_ASN_ANYTHING  X-ASN =~ /.*/i
> >   header  T_SCS_ASN_ANY_AS    X-ASN =~ /AS[0-9]*/i
> >   header  T_SCS_ASN_AS15169   X-ASN =~ /AS15169/
> >   header  T_SCS_ASN_AS15169B  X-ASN =~ /^AS15169 /
> > endif
> > 
> > On a test message which I sent myself on Friday from my google
> > account and which I am now currently piping into SpamAssassin at the
> > command line the rules T_SCS_ASN_EXISTS and T_SCS_ASN_ANYTHING
> > trigger but T_SCS_ASN_ANY_AS, T_SCS_ASN_AS15169 and
> > T_SCS_ASN_AS15169B do not.
> ...
> > rules: ran header rule T_SCS_ASN_ANYTHING ==> got hit: "15169"
> 
> This is why the other tests fail, there's no "AS" before the number.
> 

RW,

Thank you! With that in mind I've made the following adjustment and the rule is 
now being triggered.

header  T_SCS_ASN_AS15169C  X-ASN =~ /^15169$/

As to whether this will be helpful in detecting spam I'll let you know.

Kind regards

Steve
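
For the record, the behaviour RW pointed out is easy to reproduce outside SpamAssassin. The Python sketch below runs the same regexes against the value shown in the debug hit ("15169"); the rule_hits helper is only for illustration, and Python's re module stands in for Perl here, which is close enough for these simple patterns.

```python
import re

# Regexes copied from the test rules in this thread; /i becomes re.I.
RULES = {
    "T_SCS_ASN_ANYTHING": re.compile(r".*", re.I),
    "T_SCS_ASN_ANY_AS": re.compile(r"AS[0-9]*", re.I),
    "T_SCS_ASN_AS15169": re.compile(r"AS15169"),
    "T_SCS_ASN_AS15169B": re.compile(r"^AS15169 "),
    "T_SCS_ASN_AS15169C": re.compile(r"^15169$"),
}


def rule_hits(value):
    """Return {rule name: True/False} for a given X-ASN value."""
    return {name: bool(rx.search(value)) for name, rx in RULES.items()}


if __name__ == "__main__":
    # The debug output showed the rule matching against "15169", with no "AS" prefix.
    for name, hit in rule_hits("15169").items():
        print(f"{name}: {hit}")
```

Only the rules that do not require a literal "AS" match the bare number. Anchoring with ^ and $ also keeps the rule from firing on a longer number that merely contains 15169.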

Re-6: A rule to check X-ASN header

2015-11-23 Thread steve



> steve skrev den 2015-11-23 15:43:
> 
> > That was just one example I received. Yes, you can very well use
> > google.junc.eu, and no, that doesn't mean Google spams me.
> > 
> > My eventual goal is to test for "Has google in the sender name OR
> > domain" and "is NOT from an ASN owned by Google".
> > 
> > https://www.ultratools.com/tools/asnInfoResult?domainName=Google
> > 
> > Am I not explaining myself correctly?
> 
> you assume that ALL ips in there asn is used for outbound emailing ?

At the moment I do not know, so I'd like my rule to run for a bit so I 
can get a clear picture of what is going on.
 
> https://dmarcian.com/spf-survey/gmail.com see the flattened IP ranges and 
> compare

This is useful information. I'll look into this as well as pursuing my ASN 
angle.

While I appreciate your efforts and the discussion about the validity of my 
objectives, at the risk of repeating myself, what I'm really after is: how 
can I query the X-ASN header? 

If this turns out to be a waste of time I'll be the first to let you know. 
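
To make the goal concrete, here is a rough Perl sketch of the logic described above ("google" in the sender, but not from a Google-owned ASN). It is only an illustration: the function name and the ASN list are hypothetical, 15169 is the only ASN mentioned in this thread, and the sample address and AS64496 (a documentation-range ASN) are made up.

```perl
use strict;
use warnings;

# ASNs treated as "owned by Google" for this sketch. Only 15169 appears in
# this thread; a real list would have to be researched separately.
my %GOOGLE_ASN = ( 15169 => 1 );

# Return 1 when the From header mentions "google" and a known ASN for the
# sending IP is not in the list above; return 0 otherwise.
sub looks_like_fake_google {
    my ($from, $asn) = @_;
    # Without usable ASN data there is nothing to compare, so do not flag.
    return 0 unless defined $asn && $asn =~ /^\d+$/;
    return 0 unless $from =~ /google/i;
    return $GOOGLE_ASN{$asn} ? 0 : 1;
}

# Example call with a made-up sender and a documentation-range ASN.
my $flag = looks_like_fake_google('Google Support <x@example.com>', '64496');
print "flagged: $flag\n";
```

In SpamAssassin itself the same idea would normally be written as a meta rule combining two header sub-rules, one on From and one on X-ASN, rather than as standalone Perl.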

Regards

Steve

Large spam

2015-07-15 Thread Zinski, Steve

We're starting to see a lot of spam in the 800KB to 1.2MB size range. I’m 
running MIMEdefang and it’s configured to skip messages larger than 100KB (and 
I hesitate to increase the limit due to performance issues). I read somewhere 
that there’s a way to have MIMEdefang (or spamassassin) strip out the non-text 
portions of the e-mail and scan. Can anyone help me set this up or point me in 
the right direction? Thanks!

Re: Detecting macros in word files

2015-07-01 Thread Steve Freegard


On 01/07/15 15:18, Marc Perkel wrote:

Is there any way to detect macros inside of word doc files as
attachments? Or linux command line utils to do so?



If you use ClamAV, you can enable the 'OLE2BlockMacros yes' option and 
then catch the 'Heuristics.OLE2.ContainsMacros' result reported by ClamAV 
(which is what we do here).


Kind regards,
Steve.

Re: spamassassin detailed logging

2015-06-19 Thread Steve Freegard

On 19/06/15 15:50, Kevin A. McGrail wrote:

On 6/19/2015 10:43 AM, Reindl Harald wrote:

if you only have one user=sa-milter then you're screwed

and how does a user=rcpt give you any useful information to grep for
the sender of the mail in the case above?

We need to agree to disagree because you don't need to convince people
that your interest in logging changes is right.

Instead, recommend you spend the energy working up a patch on the code
that does what you want. That's the brilliance of OSS and as a
committer on the project, I will give you my word that I will look
seriously at the patch and consider it for inclusion ESPECIALLY if you
have a CLA on file, it is documented and uses a command switch so that
it's yet another option that people can choose to use or not.

We need people contributing code much much more than ideas. If you
can't code, perhaps you know someone at your firm, in your dorm, down
the road, etc. who can! Get them helping the project!

See attached. Modify it as per your requirements and it will improve
the spamd log lines to include whatever you want it to.

I use it to add the calculated last-external IP address and to add the
UUID (Transaction ID) from Haraka which is the glue that I use here; the
data is passed to SpamAssassin via a header, which is then added to the
spamd log line via the API. That allows me to link the SMTP session to
the spamd log and vice-versa regardless as to whether a Message-ID
header is present or not.

spamd will already log the envfrom= line provided it has this
information passed through from whatever calls it. I send it over via an
X-Envelope-From: header (see 'envelope_sender_header' in man
Mail::SpamAssassin::Conf).

Jun 14 12:30:20 mail1-ec2 spamd[6418]: spamd: result: . -5 -
DI_MX_IMPLICIT,HARAKA_DNSWL,HARAKA_DSPAM_00,HARAKA_FCRDNS,HARAKA_RELAYING,HARAKA_SENDER_AUTH,HTML_MESSAGE,RP_MATCHES_RCVD
scantime=0.8,size=8464,user=***@***.com,uid=500,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=37784,mid=WM!44f21c66bf985efa445526f7e5426264e223ecff488eaa8bff906b038bd86b39d466c13d9b768944b8a292c509c16656!@***.**.org,autolearn=disabled,lastexternal=198.246.200.77,envfrom=@***.**.org,haraka-uuid=13C5EA99-2D43-4D08-B813-7A6889C6D8D0.1
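
As a rough sketch of how such a line can be used to link the spamd log back to the SMTP session, the Perl below pulls the key=value fields out of a result line. The function name parse_spamd_fields is made up for this example, and it assumes the field list starts at "scantime=" and that no value contains a comma, which holds for the line above but is not guaranteed in general.

```perl
use strict;
use warnings;

# Split the trailing "key=value,key=value" list of a spamd result line into
# a hash reference. Returns an empty hash reference if no list is found.
sub parse_spamd_fields {
    my ($line) = @_;
    my ($fields) = $line =~ /\b(scantime=\S*)/
        or return +{};
    my %kv;
    for my $pair (split /,/, $fields) {
        # Split on the first "=" only, so values may contain "=" themselves.
        my ($key, $value) = split /=/, $pair, 2;
        $kv{$key} = defined $value ? $value : '';
    }
    return \%kv;
}

# Abbreviated copy of the log line above (most test names and the masked
# fields are dropped to keep the example short).
my $line = 'Jun 14 12:30:20 mail1-ec2 spamd[6418]: spamd: result: . -5 - '
         . 'HTML_MESSAGE scantime=0.8,size=8464,uid=500,autolearn=disabled,'
         . 'lastexternal=198.246.200.77,'
         . 'haraka-uuid=13C5EA99-2D43-4D08-B813-7A6889C6D8D0.1';

my $f = parse_spamd_fields($line);
printf "uuid=%s ip=%s\n", $f->{'haraka-uuid'}, $f->{lastexternal};
```

Looking up the haraka-uuid value in the Haraka logs then gives the matching SMTP transaction, whether or not the message had a Message-ID header.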

Kind regards,
Steve.

Haraka.pm
Description: Perl program
loadplugin Mail::SpamAssassin::Plugin::Haraka Haraka.pm

ifplugin Mail::SpamAssassin::Plugin::Haraka
header __HARAKA eval:get_haraka_uuid()
endif

Re: spamassassin detailed logging

2015-06-19 Thread Steve Freegard


On 19/06/15 16:57, Steve Freegard wrote:


spamd will already log the envfrom= line provided it has this
information passed through from whatever calls it.  I send it over via an
X-Envelope-From: header (see 'envelope_sender_header' in man
Mail::SpamAssassin::Conf).



Actually - I'm talking rubbish; I just realised I added it elsewhere 
with this:


# Add envelope sender
my $envfrom = $pms->get('EnvelopeFrom');
if (defined($envfrom) && $envfrom) {
    $pms->set_spamd_result_item( sub { return "envfrom=$envfrom"; } );
}

Kind regards,
Steve.
