Re: Blocking Malformed "From" Headers

2024-07-17 Thread Dave Funk


The SMTP protocol RFCs are pretty clear: anything in angle brackets '<' & '>' 
takes priority in defining an address field. So technically that's a legit local 
address and sendmail is doing default MSA processing on it (IE treating it as a 
bare username that needs the local hostname added).


Is this sendmail instance just an incoming MTA or is it also used as an outgoing 
MSA for your users?


If it's just an incoming MTA (IE your users have another instance they're using 
for outgoing MSA service) then just turn off the MSA feature for that specific 
sendmail instance to stop that processing: "FEATURE(`no_default_msa')"



On Wed, 17 Jul 2024, Kirk Ismay wrote:


I have a spammer using a malformed From header, as follows:

From: sha...@marketcrank.com

The envelope from is: direcc...@delher.com.mx, and I've set up blocks for 
that address.


Sendmail is munging the From: header to change  to , 
so it ends up looking like a local address to my users.


How do I detect similar mangled From headers in Spamassassin?

Also does anyone know how to prevent Sendmail from rewriting the From header 
like this?  The documentation for confFROM_HEADER is somewhat cryptic:


https://www.sendmail.org/~ca/email/doc8.12/cf/m4/tweaking_config.html#confFROM_HEADER

I'd rather it say  instead, or reject it entirely.

Thanks,
Kirk




--
Dave Funk   University of Iowa
 College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

Re: whitelist_auth return_path / from

2024-07-03 Thread Dave Funk

On Wed, 3 Jul 2024, Simon Wilson via users wrote:


Does whitelist_auth work on From header, or Return-Path? Reason I ask:



I have two emails from “support .at. wasabi.com”. Due to their emails usually 
triggering KAM rules I have (in
/etc/mail/spamassassin/local.cf):



## Whitelist Wasabi, subject to passing of auth
whitelist_auth supp...@wasabi.com

[snip..]


The other is not triggering whitelist_auth and is marked as spam due to the KAM 
rule fails. It has:

Return-Path: 
... 
From: Wasabi 
... 
Reply-To: supp...@wasabi.com

Despite passing SPF and DKIM, not whitelisted:

X-Spam-Score: 20.212
X-Spam-Level: 
X-Spam-Status: Yes, score=20.212 tagged_above=-999 required=6.2
 tests=[BAYES_00=-1.9, DCC_CHECK=1.1, DCC_REPUT_99_100=1.4, DKIM_INVALID=0.1,
 DKIM_SIGNED=0.1, HTML_MESSAGE=0.001, KAM_BODY_MARKETINGBL_PCCC=0.001,
 KAM_BODY_URIBL_PCCC=9, KAM_FROM_URIBL_PCCC=9, KAM_MARKETINGBL_PCCC=1,
 KAM_REALLYHUGEIMGSRC=0.5, LR_DMARC_PASS=-0.1, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001, T_KAM_HTML_FONT_INVALID=0.01]
 autolearn=no autolearn_force=no

[snip]


Thanks.
Simon.


You say "passing SPF and DKIM" however in the SA rules report it clearly says:
 DKIM_SIGNED=0.1, DKIM_INVALID=0.1

So even though you think it 'passed DKIM', SA clearly does NOT think so. That 
DKIM_INVALID will prevent the whitelist_auth from firing, thus you need to 
investigate what's going wrong there.
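
A quick way to double-check which rules actually fired is to pull the tests 
list out of the X-Spam-Status header and look at the DKIM entries. This is only 
a minimal Python sketch: parse_tests is a made-up helper name (not part of SA) 
and the sample string is a shortened copy of the header quoted above.

```python
import re


def parse_tests(status):
    """Return {rule_name: score} from an X-Spam-Status 'tests=[...]' list.

    Hypothetical helper for illustration; it assumes the bracketed
    'NAME=score, NAME=score' layout seen in the header quoted above.
    """
    match = re.search(r"tests=\[([^\]]*)\]", status, re.S)
    if not match:
        return {}
    tests = {}
    for item in match.group(1).split(","):
        item = item.strip()
        if not item:
            continue
        name, _, score = item.partition("=")
        tests[name] = float(score) if score else None
    return tests


# Shortened sample taken from the report in this thread.
sample = (
    "Yes, score=20.212 tagged_above=-999 required=6.2 "
    "tests=[BAYES_00=-1.9, DKIM_INVALID=0.1, DKIM_SIGNED=0.1, "
    "SPF_PASS=-0.001] autolearn=no autolearn_force=no"
)
fired = parse_tests(sample)
# Print only the DKIM-related rules that fired.
print(sorted(name for name in fired if name.startswith("DKIM_")))
```

If DKIM_INVALID shows up and DKIM_VALID does not, the signature did not verify 
as far as SA is concerned, whatever other tools reported.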




Re: Catch a rejected message ?

2023-12-01 Thread Dave Funk


That depends on the milter you're using to "glue" SA to postfix.
IE if you're using a milter (the thing that's triggering that "milter-reject" 
response) this means that Postfix is passing the messages to the milter, the 
milter is passing them to SA-spamd, getting the response, and then feeding its 
interpretation of SA's evaluation of the message back to Postfix.


That milter-reject status is the milter's response to Postfix.

So you need to look at the capabilities of your milter to customize its 
response for the particular message(s) in question.


Dave

On Fri, 1 Dec 2023, White, Daniel E. (GSFC-770.0)[AEGIS] via users wrote:



We are using SpamAssassin 3.4.6-1 with Postfix 3.5.8-4 on RHEL 8



We are seeing occasional blocked messages that say “milter-reject” with a spam 
score of 8



Is there a way to capture the offending messages to figure out the problem ?



Thanks






Re: Really hard-to-filter spam

2023-08-02 Thread Dave Funk

On Wed, 2 Aug 2023, Thomas Cameron via users wrote:


Wow! What a charming response! You must be a LOT of fun at parties, and have lots of 
friends! 


Please don't feed the troll. There's a reason that Reindl is blocked from this 
list.



No, I did not get that response. I don't have any of those specific spam to 
sample, as I have not gotten one today. But the last spam I got that
slipped through SA had this score:

X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no
So nothing about any tests not working, or queries being rejected. Nothing that 
looks like misconfiguration on my end. I am not saying there are
no misconfigurations on my end, but if there are, it's not super obvious to me.


The fact that you're getting BAYES_00 on that message indicates that Bayes 
-really- thinks it's ham.
Given that you've trained multiple instances of this kind of message to Bayes as 
spam but it still gets BAYES_00 score means one of two things:
1) Either you've got thousands of instances of similar messages that were 
learned as 'ham'
2) or the database that Bayes in your running SA instance is using is not the 
same one that you were doing your training to.


This could be configuration issues or pilot error (using the wrong identity when 
doing the training, training on the wrong machine, etc).


On your SA machine what does the output of "sa-learn --dump magic" show you?
(IE how many nspam & nham tokens, what is the newest "atime", etc).

If careful config & log inspection doesn't give clues, try this brute-force 
test.
Shut down your SA, move the directory containing your Bayes database out of the 
way and create a new empty one.

("sa-learn --dump magic" should now show 0 tokens).

Then train a few ham & spam messages (only a dozen or so), recheck the --dump 
magic to see that there are now some tokens in the database but not too many.


Restart your SA and watch the log results. SA won't use Bayes until the 
database holds at least 200 ham and 200 spam messages, so make sure that's the 
case: your new database should be too empty for SA to be willing to use it.
So if you -are- getting Bayes scores then that indicates that SA is using some 
database other than the one you think it is.


Now start manually training more messages (spam & ham). When you hit the 200 
count threshold Bayes scores should start showing up in your logs.
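
To keep an eye on the counts while doing this test, the "sa-learn --dump magic" 
output can be checked with a few lines of Python. This is a rough sketch, not 
an SA tool: the helper names are made up, the sample numbers are invented, and 
the column layout (count in the third column, label after "non-token data:") is 
an assumption based on typical output, so compare it with what your own 
install prints.

```python
def parse_dump_magic(text):
    """Collect the 'non-token data' counters from 'sa-learn --dump magic'.

    Assumed line layout: <prob> <spam> <value> <atime> non-token data: <name>
    """
    counts = {}
    prefix = "non-token data: "
    for line in text.splitlines():
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[4].startswith(prefix):
            if parts[2].isdigit():
                counts[parts[4][len(prefix):].strip()] = int(parts[2])
    return counts


def bayes_has_enough(counts, min_ham=200, min_spam=200):
    """True once both ham and spam counts reach the 200-message minimum."""
    return counts.get("nham", 0) >= min_ham and counts.get("nspam", 0) >= min_spam


# Invented sample resembling a freshly created, lightly trained database.
sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         12          0  non-token data: nspam
0.000          0         11          0  non-token data: nham
0.000          0       1873          0  non-token data: ntokens
"""
counts = parse_dump_magic(sample)
print(counts["nspam"], counts["nham"], bayes_has_enough(counts))
```

With a dozen messages of each kind the check should report False, which is the 
state where any BAYES_* hit in the logs means SA is reading a different 
database.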


Good luck.



Re: authres missing when ran from spamass-milter

2023-05-31 Thread Dave Funk

On Wed, 31 May 2023, Matus UHLAR - fantomas wrote:

[snip..]
milter adds its own synthesized Received: header at the very beginning, which is 
most possibly the correct reason. 
spamass-milter should add this header behind locally added 
Authentication-Results: headers, but that needs a change in spamass-milter.




tl;dr if those 'Authentication-Results: headers' are generated by the MTA itself 
the milter may not ever see them.


Which agent in the whole MTA system is adding those 'Authentication-Results: 
headers'?
Is it the master MTA itself (EG: postfix or sendmail) or is it some other milter 
component?


A milter can only work with what it's handed by the master MTA, if the 
Authentication-Results: headers aren't in its input stream then it cannot work 
with them.
In the original sendmail incarnation of the milter API it was designed so that a 
milter received the message input stream -before- local headers were added, thus 
the need for spamassassin 'glue' milters to do that Received: header synthesis.


If those Authentication-Results: headers are being generated by another milter 
then the solution is easy, just set the MTA configuration to run that milter 
before the spamassassin 'glue' milter. Milter results are chained so any headers 
explicitly added by one milter are passed on to succeeding milters.


If those headers are being generated by the MTA then it may not be possible for 
milters to see them without hacking the MTA itself.





Re: comparing sender domain against recipient domain

2023-05-12 Thread Dave Funk

On Fri, 12 May 2023, Matija Nalis wrote:


On Thu, May 11, 2023 at 09:41:34PM +, Marc wrote:

I was wondering if spamassassin is applying some sort of algorithm to
comparing sender domain against recipient domain to detect a phishing
attempt?



[snip..]

That is because those domains are not EQUAL? Or did you want a
rule that checks only on SIMILAR domain names (e.g. with lowercase
letter "L" replaced with number "1" as in your example)?



Now I get it, the OP is looking for some kind of comparison function that does 
an "apparent linguistic distance" evaluation of two strings and returns a score 
that indicates a "visual similarity" value.

(EG replacing 'l' with '1' or 'O' with '0', etc).

Several years ago there was a flood of phish messages that had a 'From' address 
that used 'PayPaI' to try to fool people.
I've also seen attempts using European character sets with letters that look 
like O or e to fake common domain names.


I've hand coded rules to check for this stuff when frequently abused but I don't 
know of a programmatic algorithm to do it automagically.
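
For what it's worth, the hand-coded approach can be sketched in a few lines of 
Python. This is not a general algorithm, just a small substitution table: the 
CONFUSABLES map and both function names are invented for illustration, and it 
only covers a handful of ASCII look-alikes (nothing for the non-ASCII 
character-set tricks).

```python
# Tiny hand-made table of ASCII look-alikes; real abuse uses many more.
CONFUSABLES = str.maketrans({"1": "l", "I": "l", "|": "l", "0": "o"})


def skeleton(domain):
    """Fold visually similar characters together, then lowercase."""
    # Translate before lowercasing so a capital 'I' is folded to 'l'.
    return domain.translate(CONFUSABLES).lower()


def looks_like(candidate, protected):
    """True if candidate is a different name that folds to the same skeleton."""
    if candidate.lower() == protected.lower():
        return False  # the genuine domain is not a look-alike of itself
    return skeleton(candidate) == skeleton(protected)


for name in ("PayPaI.com", "paypa1.com", "paypal.com", "example.org"):
    print(name, looks_like(name, "paypal.com"))
```

The limitation is the same one as with hand-written rules: it only knows the 
substitutions somebody has typed into the table.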


Dave



methodless URLs bypass DecodeShortURLs link shortener checking

2022-08-29 Thread Dave Funk
Today I found some spammy messages which contained tinyurl links that were not 
checked by my DecodeShortURLs checker.


Checking the tinyurl by hand using wget, I found that the destination was a URL 
that hit some of my URIBL lists.


The issue is that if the method is omitted from the url it is not considered for 
DecodeShortURLs checking.


EG: <a href="tinyurl.com/REDACTED">Click here</a> does not get checked 
but <a href="http://tinyurl.com/REDACTED">Click here</a> does get 
checked.

This happens with SA 3.4.6

Note that this is specific to DecodeShortURLs, a methodless URL is still checked 
via direct URIBL rules.
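
To find such links in a message corpus, something like the following Python 
sketch can list anchors that point at a shortener host and say whether the 
href carries a method. It is not how the DecodeShortURLs plugin works; the 
function name and the default shortener list are assumptions made up for this 
example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def shortener_links(html, shorteners=("tinyurl.com", "bit.ly")):
    """Return (href, has_method) for links whose host is a known shortener."""
    collector = _HrefCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        has_method = "://" in href
        # Assume http for a methodless href so the host can be parsed.
        url = href if has_method else "http://" + href
        host = (urlsplit(url).hostname or "").lower()
        if host in shorteners:
            found.append((href, has_method))
    return found


body = (
    '<a href="tinyurl.com/REDACTED">Click here</a> '
    '<a href="http://tinyurl.com/REDACTED">Click here</a>'
)
for href, has_method in shortener_links(body):
    print(href, "method present" if has_method else "methodless")
```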


Is this an issue with the DecodeShortURLs plugin or with SA?

Where would I find the most recent version of DecodeShortURLs plugin?

Thanks,
Dave



Re: Matching on missing To field?

2022-07-20 Thread Dave Funk

On Wed, 20 Jul 2022, Alex wrote:


Hi,

I have a number of rules that match on the To field, but what to do if the To 
field is missing?

Received: from test.com (wsip-72-214-24-18.sd.sd.cox.net [72.214.24.18])
        by mail01.example.com (Postfix) with SMTP id 12425B9B
        for ; Fri, 15 Jul 2022 18:50:34 -0400 (EDT)

I realize I can match on the Received header here, but that would require 
creating an additional rule for each corresponding To rule. Perhaps
there's a way to combine them, or a tag that can be used for both?


Depending on your MTA and the message, that 'for ' element may 
be completely missing (for example if there's multiple recipients of a message).


Can you configure your "glue" to synthesize an additional header from the 
envelope-to address of the message? Envelope recipient addrs must always exist, 
it's just a question of what you need to do to get them visible to SA.
Look at the "envelope_sender_header" entry in the SA docs, apply the same 
concept to the envelope recipient data.


In the milter I use, I create both envelope-From  & envelope-To headers.


I'm also aware of using ALL, but I think that may be too broad and may catch 
instances that shouldn't be. Can someone explain how this rule
works and if something similar would apply to my situation?

header         __HDRS_MISSP          ALL:raw =~ 
/^(?:Subject|From|To|Reply-To):\S/ism


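That rule just says: look at all the raw header data and match if any Subject, 
From, To, or Reply-To header has a non-space character immediately after the 
colon (the /m flag anchors '^' at the start of each header line).

IE a malformed header with no space after the field name.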

Dave

Re: Add header, not beginning with X?

2022-02-14 Thread Dave Funk

On Mon, 14 Feb 2022, joea- lists wrote:


The reason has to do with "reply" and "reply to all" with the email
client/system I am using and prefer to continue using for now.

Being subscribed to several lists, I find some variation between them
regarding the headers they provide and how my "reply" feature works.

Those that provide "Reply-to: somelist" act as expected and place the
list address in the To: field.  Those that do not
(users@spamassassin.apache.org included) find the address of the
poster rather than the list in the To: field.

While this is not a new issue, I do occasionally fail to correct the
address issue and an email goes astray. '

I'm aware that "modern" clients can deal with this and there are more
"practical" solutions, but I view this as an opportunity for "exercise"
and perverse amusement.

Does not appear to be something that can or should be done in SA, just
exploring possible avenues, or, abandoning the idea completely.


If you want this done for everybody on the system then modifying your MTA is the 
way to go (EG: at the postfix/sendmail level).
If you want to do it just for your own messages then some kind of custom 
delivery filter (EG procmail) would be the way to go.





Re: page.link spam

2021-10-31 Thread Dave Funk

On Sun, 31 Oct 2021, Axb wrote:


On 10/31/21 5:26 PM, Matus UHLAR - fantomas wrote:

Hello,

it looks like google has registered page.link domain and users are already
using it for spamming:

https://secretadultnightclub.page.link/...

I have added it to my local domain-based blocklist.

any idea/tip what to do with it next?


blacklist_uri_host page.link


Been there, done that, got the FP wounds to show the risks of doing it.

My retirement account financial adviser sends me reports that include 
name.page.link URLs.


So selectively blacklist full entries like secretadultnightclub.page.link but not 
just page.link

Think of it like you would link shortener URLs (EG bit.ly).



SA 3.4.6 add From:addr host to URIHOSTS list?

2021-10-18 Thread Dave Funk
In SA 3.4.1 the host value of From:addr was automagically added to the URIHOSTS 
list and thus exposed to URIBL lookups.


SA 3.4.6 does not do that. Is there a configuration option to reactivate that 
feature?


Thanks,
Dave



Re: handle_user and connect to spamd failed

2021-10-18 Thread Dave Funk

On Mon, 18 Oct 2021, Linkcheck wrote:


On 18/10/2021 11:20 am, Matus UHLAR - fantomas wrote:

spamd by default tries to find recipients' home directories and user
preferences in them. try passing following option to spamd:

 instruct spamd to connect to 127.0.0.1


Sorry, I'm not sure where to do that. I've tried as noted in the OP; I can't 
find anywhere else (remembering I've dropped spamfilter.sh).


Actually that timeout error is coming from "spamc". spamass-milter uses spamc 
under the hood to connect to spamd. It's spamc that is trying to connect to 
"localhost" which contains that IPv6 reference.
Add an option to spamass-milter telling it to pass on to spamc the connect-to 
host is 127.0.0.1 not localhost.


IE:

This made no difference. I also have /etc/default/spamass-milter with the 
options:
OPTIONS="-u spamass-milter -i 127.0.0.1 -4"


Add the option "-D 127.0.0.1" in that spamass-milter OPTIONS.





Re: handle_user and connect to spamd failed

2021-10-18 Thread Dave Funk

On Mon, 18 Oct 2021, Linkcheck wrote:


On 18/10/2021 11:20 am, Matus UHLAR - fantomas wrote:

spamd by default tries to find recipients' home directories and user
preferences in them. try passing following option to spamd:

   -x, --nouser-config, --user-config


Thanks. Where would I actually add that? Which file / command?


Those options need to get used in your spamd startup arguments.
They go in the same place you've got things like --max-children.
But if you're going that nouserconfig route, omit the --create-prefs option.




 -H directory, --helper-home-dir=directory


Is that the literal 'directory'? I took that to mean an actual directory.


Matus is saying that your '--helper-home-dir' option syntax in your spamd 
settings is wrong. You say that you have those set to:



OPTIONS="--create-prefs -4 --max-children 5 --helper-home-dir /var/lib/spamassassin 
-u debian-spamd"


Matus is saying that it should be:


OPTIONS="--create-prefs -4 --max-children 5 --helper-home-dir=/var/lib/spamassassin 
-u debian-spamd"


Or:


OPTIONS="--create-prefs -4 --max-children 5 -H /var/lib/spamassassin -u 
debian-spamd"


IE the '--helper-home-dir' option needs an '=' with no spaces, or use the -H form.


Re: elf signature for clamav

2021-09-26 Thread Dave Funk

On Sun, 26 Sep 2021, Benny Pedersen wrote:



# cat local_elf.ndb from /var/lib/clamav (databasedir in clamd)
Sanesecurity.ELF.1:6:0:7F454C46

took me 5 mins to make :)

thanks to KAM on this its very simple, i like feed back from mimedefang and 
amavisd users


If you use the "ClamAV" SA plugin ( 
http://wiki.apache.org/spamassassin/ClamAVPlugin ) then you can use the full 
power of ClamAV scanning/detection in SA without the need for external 
connectors like mimedefang or amavisd.


This has the advantage of being open to all SA users and makes it possible to make 
special meta rules combining the results of ClamAV scans with other SA filtering 
such as welcome_auth validated trusted sources.


I run two copies of the ClamAV engine:
1) standard ClamAV with standard rules called from milters in my front line MX 
servers to outright block known malware.
2) a customized ClamAV with full bells-&-whistles such as Heuristics and lots of 
custom add-in signatures (EG 
https://github.com/extremeshok/clamav-unofficial-sigs).
These can have a moderate FP risk but run from within SA I can use other rules 
such as welcome_auth to control their risk or use them at low score but meta 
with other things such as Bayes to jack up the score.






Re: Message-ID with IPv6 domain-literal

2021-09-21 Thread Dave Funk

On Tue, 21 Sep 2021, Bill Cole wrote:


On 2021-09-21 at 12:25:30 UTC-0400 (Tue, 21 Sep 2021 10:25:30 -0600)
Grant Taylor 
is rumored to have said:


But why the penalty for using non-public addresses* in a Message-ID: string?


Empirical evidence. The use of a non-public address in a Message-ID correlates 
to a message being spam. In my experience, so does using an IP literal of any 
sort in a Message-ID, but that may be an idiosyncrasy in my mail.


 I was not aware that Message-ID had any requirements that the content had to 
mean anything beyond being syntactically correct.  As such I would expect 
private / non-globally routed content to be allowed.  After all, isn't the 
purpose of the Message-ID to be a universally unique identifier?  If so, why 
does it matter what the contents is as long as it's syntactically correct?  
What am I missing?


Private IP addresses in general cannot specify globally unique devices 
(consider 127.0.0.1 or the very-popular 192.168.1.1) and therefore a Message-ID 
using an IP literal as the RHS part with a non-public IP cannot assure 
uniqueness.


That is valid for Private IP addresses.

However "[IPv6:::193.168.1.30]" is the representation of IPv4: 193.168.1.30 
which is a Public IP address, thus that 'hit' is in error.

This should be considered a parsing bug.
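
The address arithmetic can be checked with Python's ipaddress module. A small 
sketch, assuming the literal is in the form where the upper 96 bits are all 
zero and the IPv4 address sits in the low 32 bits (embedded_ipv4 is a made-up 
helper name, not anything in SA):

```python
import ipaddress


def embedded_ipv4(literal):
    """Extract the IPv4 address embedded in a '[IPv6:::a.b.c.d]' literal."""
    text = literal.strip("[]")
    if text.lower().startswith("ipv6:"):
        text = text[5:]
    addr = ipaddress.IPv6Address(text)
    if int(addr) >> 32 != 0:
        # Only handle the form where everything above the low 32 bits is zero.
        raise ValueError("no embedded IPv4 address in %r" % literal)
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


for literal in ("[IPv6:::193.168.1.30]", "[IPv6:::192.168.1.30]"):
    v4 = embedded_ipv4(literal)
    print(literal, "->", v4, "private" if v4.is_private else "public")
```

The first literal is the one from this thread; the second is included only to 
show what a genuinely private address looks like under the same check.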




Re: An interesting bit of HTML from a spam

2021-09-12 Thread Dave Funk

On Sun, 12 Sep 2021, Loren Wilton wrote:

I found this little wonder in a bunch of spams I've been getting for the last 
few days:


http://"; http://"; http://"; http://"; http://"; http://"; 
href="http:/mi.wey.vandalized655bccemetries -dot- cleaning/id>">unsubscribe here


I have no idea if that actually works, since I'm not about to try it.


The base hostname in that URL (I bowdlerized it in this message) is listed in a 
couple different URIBLs.


SA 3.4.1 is able to spot/extract that name from the garbage and trigger URIBL 
rules.
In debug mode for this message its 'URIDOMAINS' contains: 
ARY:[oxsus-vadesecure.net,uiowa.edu,uiowa.edu,avg.com,vandalized655bccemetries.cleaning,oxsus-vadesecure.net]


SA 3.4.6 not so much. It doesn't seem to "see" that href/URL at all.
Its 'URIDOMAINS' contains: value: avg.com

So why is SA 3.4.6 much less sensitive about picking up hosts in URLs?





Re: spamass-milter (sa daemon loads config different to shell ?)

2021-07-27 Thread Dave Funk

On Tue, 27 Jul 2021, David Bürgin wrote:


Dipl-Inform. Frank Gadegast:

On 27.07.21 14:18, David Bürgin wrote:

Dipl-Inform. Frank Gadegast:
Seems to be, that spamass-milter simply strips out any X-Spam* header 
lines, not caring, if the own call to spamd sets them, hm.


I'm really not getting, why spamass-milter should strip X-Spam-lines of 
the header AFTER SA was running. If I'm right, SA is stripping them off 
anyway, before running or modifying anything ...



Anybody an idea how to get around this ?


There is an alternative milter (which I maintain) that adds
all X-Spam-* headers received from spamd.

https://crates.io/crates/spamassassin-milter


Looks like your milter needs to fork a spamc, which then talks to the spamd. 
This will start lots of spamc processes and is not recommended.


Would then not be any different to call spamc directly f.e. via procmail.

You should rewrite your milter to talk directly to the spamd via socket or 
port.


Yes, it communicates using spamc, just like spamass-milter.

I have been told that it has been working fine in a somewhat larger
deployment. I didn’t mean to derail the thread so will leave it at that.


Having a spam filtering milter fork off a shell and then run "spamc" to 
communicate with spamd does simplify the milter code (and insulates it from 
changes in the spamd protocol) but adds risk of shell escape attacks (as well as 
additional overhead).
There have already been security-related patches needed by spamass-milter 
specifically because of this issue.


Writing a milter that directly talks the spamd protocol via a socket (local or 
network) is more work but safer and more efficient.

(been there, done that, got the code to prove it).
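
To give an idea of what "talking the spamd protocol directly" involves, here is 
a minimal Python sketch of a CHECK request. It is not the code referred to 
above and not a milter; the function names are made up, the protocol version 
string and the 127.0.0.1:783 endpoint are assumptions for a default local 
spamd, and error handling is kept to the bare minimum.

```python
import re
import socket

_SPAM_LINE = re.compile(
    r"^Spam:\s*(True|False|Yes|No)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.I | re.M,
)


def build_check_request(message, user=None):
    """Build a spamd CHECK request for a raw message (bytes)."""
    lines = ["CHECK SPAMC/1.5", "Content-length: %d" % len(message)]
    if user:
        lines.append("User: %s" % user)
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + message


def parse_check_response(raw):
    """Return (is_spam, score, threshold) from a spamd CHECK response."""
    text = raw.decode("ascii", "replace")
    status = text.split("\r\n", 1)[0]
    if not status.startswith("SPAMD/") or "EX_OK" not in status:
        raise ValueError("unexpected spamd status line: %r" % status)
    match = _SPAM_LINE.search(text)
    if not match:
        raise ValueError("no Spam: header in spamd response")
    verdict = match.group(1).lower() in ("true", "yes")
    return verdict, float(match.group(2)), float(match.group(3))


def check_message(message, host="127.0.0.1", port=783, timeout=30.0):
    """Send one message to spamd over TCP; no shell or spamc involved."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)  # signal end of request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

Since the message only ever travels as bytes over a socket, there is no command 
line to escape, which is the safety point made above.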



Re: SA 3.4.5 meta with RBL rules not working.

2021-07-19 Thread Dave Funk

Ugg, I was afraid of that.

For decades I've rolled my own install of things like sendmail, SA & ClamAV but 
this time I wanted to try the release supplied by our server OS vendor (SuSE).

Unfortunately that's SA 3.4.5.

OK, back to the salt-mines.

Thanks

On Mon, 19 Jul 2021, Henrik K wrote:



How about upgrading to latest 3.4.6?

This release includes fixes for the following:
 - Fixed URIDNSBL not triggering meta rules

On Mon, Jul 19, 2021 at 01:42:51AM -0500, Dave Funk wrote:

I recently updated from SA 3.4.1 to 3.4.5 and noticed that a number of my
"meta" rules quit working.

I have a number of meta rules that combine RBL/URIBL rules with other rules
and they no longer fire, even though the various components are firing.

EG, a rule like:

meta L_TEST_NS2c   ( URIBL_ABUSE_SURBL && HTML_MESSAGE )
describe L_TEST_NS2c   abusive HTML message
score L_TEST_NS2c  1.1

does not fire even though the message under test triggers both
URIBL_ABUSE_SURBL & HTML_MESSAGE.
This used to work as expected under 3.4.1.

Running a message thru "spamassassin -D" does not give any clues what's
going wrong.

Any suggestions about how to debug this?

Thanks,
Dave



SA 3.4.5 meta with RBL rules not working.

2021-07-18 Thread Dave Funk
I recently updated from SA 3.4.1 to 3.4.5 and noticed that a number of my "meta" 
rules quit working.


I have a number of meta rules that combine RBL/URIBL rules with other rules and 
they no longer fire, even though the various components are firing.


EG, a rule like:

meta L_TEST_NS2c   ( URIBL_ABUSE_SURBL && HTML_MESSAGE )
describe L_TEST_NS2c   abusive HTML message
score L_TEST_NS2c  1.1

does not fire even though the message under test triggers both URIBL_ABUSE_SURBL & 
HTML_MESSAGE.

This used to work as expected under 3.4.1.

Running a message thru "spamassassin -D" does not give any clues what's going 
wrong.


Any suggestions about how to debug this?

Thanks,
Dave



Re: Email Phishing and Zloader: Such a Disappointment

2021-07-11 Thread Dave Funk

On Sun, 11 Jul 2021, Kevin A. McGrail wrote:


On 7/11/2021 5:11 PM, John Hardin wrote:
"The other parts contain an application/vnd.ms-officetheme and an 
application/x-mso file. Which (in addition to the text/xml files) are used 
by Microsoft Word to load the embedded Word document."


Would the presence of all three of those MIME types be a scorable 
indicator?


If you can get me a spample, I'm sure I can tell you but in general we block 
macros so that's all that's needed.  Likely the OLEVBMacro plugin and KAM 
ruleset is blocking all of these already if you have the plugin enabled.


Regards,

KAM


Aren't there already rules and heuristics in ClamAV for detecting VBmacros in 
office docs?


I've got two copies of ClamAV running, one used as a blocking direct milter with 
default rules and another one feeding into the SA "clamav.pm" plugin with extra 
rules and heuristics/algorithms enabled.





Re: spamass.sock - No such file or directory

2021-06-27 Thread Dave Funk

Make sure to start spamass-milter before postfix.

The milter creates the socket which must exist for postfix to be able to open 
it.


Start the milter then use "lsof" to make sure that it has created the socket and 
that it's in the place which postfix expects to find it.


Also make sure that the permissions on the path thru the directories containing 
the socket are traversable by postfix and that the permissions on the socket 
itself provide postfix 'rw' rights.
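
If "lsof" isn't handy, the same checks can be done with a short Python script. 
This is just a sketch with made-up helper names: it reports whether the path 
really is a socket, who owns it and with what mode, and the mode of every 
directory on the way down, so you can see where postfix would be blocked.

```python
import os
import stat


def describe_socket(path):
    """Report type, mode and ownership of the milter socket at path."""
    st = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "mode": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
    }


def parent_dir_modes(path):
    """List (directory, mode) for each directory from / down to the socket."""
    modes = []
    current = os.path.dirname(os.path.abspath(path))
    while True:
        modes.append((current, stat.filemode(os.stat(current).st_mode)))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return list(reversed(modes))


# Example use (run as root, with the path your milter is configured to create):
#   print(describe_socket("/run/spamass/spamass.sock"))
#   for directory, mode in parent_dir_modes("/run/spamass/spamass.sock"):
#       print(mode, directory)
```

Every directory in the list needs an 'x' bit that applies to the postfix user, 
and the socket itself needs 'rw' for postfix (directly or via a group).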



On Sun, 27 Jun 2021, Dominic Raferd wrote:


Try unix:/run/spamass/spamass.sock

On Sun, 27 Jun 2021, 18:28 ,  wrote:
  Still the same

  Jun 27 19:21:03 nmail postfix/smtps/smtpd[4946]: warning: connect to 
Milter
  service unix:spamass/spamass.sock: No such file or directory
  Jun 27 19:25:37 nmail postfix/smtps/smtpd[5552]: warning: connect to 
Milter
  service unix:run/spamass/spamass.sock: No such file or directory

  Thanks for any update


  -----Original Message-----
  From: Reindl Harald 
  Sent: Saturday, 26 June 2021 12:15
  To: mau...@gmx.ch; users@spamassassin.apache.org
  Subject: Re: spamass.sock - No such file or directory

  why do you think "/run/spamass" and "unix:/spamass/" are the same path?

  On 26.06.21 at 09:37, mau...@gmx.ch wrote:
  > Run with Debian 10
  >
  > I don't see why this “spamass.sock: No such file or directory” message
  > appears
  >
  >>mail.log
  >
  > Jun 26 09:27:12 nmail postfix/smtps/smtpd[9509]: warning: connect to
  > Milter service unix:/spamass/spamass.sock: No such file or directory
  >
  >>main.cf
  >
  > smtpd_milters = unix:/spamass/spamass.sock,
  > unix:opendkim/opendkim.sock, unix:opendmarc/opendmarc.sock
  >
  >>/run/spamass# ls -la
  >
  > -rw-r--r--  1 spamass-milter spamass-milter 5 Jun 26 09:26
  spamass.pid
  > srw-rw  1   postfix  postfix 0
  Jun 26 09:26 spamass.sock
  >
  > or
  >
  > srw-rw  1   spamass-milter spamass-milter 0 Jun 26 09:26
  spamass.sock
  >
  >/etc/group
  > spamass-milter:x:128:postfix
  >
  > thanks for any help


--
Dave Funk   University of Iowa
 College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

Re: Scan Attachment Content Using Spamassassin

2021-06-03 Thread Dave Funk

On Thu, 3 Jun 2021, Henrik K wrote:


On Thu, Jun 03, 2021 at 09:32:28AM +0200, Matus UHLAR - fantomas wrote:

On 03.06.21 09:23, Henrik K wrote:

That's just outdated information.  It's fine to scan even 20MB+ messages, it
just requires some memory.


and CPU and time...


Those are affected very little by message size.  And all that is pretty much
negated by large messages being uncommon.


Be that as it may, the OP wanted to do DLP scanning of messages containing 
PPTX, XLSX, etc., and it's uncommon to see a small PPTX file; large files are 
more common with such media.


Also, spamassassin does not have a native built-in component for parsing such 
media attachments, it would need to be some kind of add-in (EG the "fuzzy ocr" 
plugin that was the rage a while ago).
As such it adds an additional complication that needs to be integrated/ 
managed/updated etc.


Probably better to use a whole different tool that comes with that kind of 
capability built-in (EG ClamAV).





Re: Scan Attachment Content Using Spamassassin

2021-06-02 Thread Dave Funk

On Thu, 3 Jun 2021, KADAM, SIDDHESH wrote:


Hello Folks,

Is there any way we can scan the content of an attachment, i.e. 
.doc/.pdf/.xls/.ppt etc.?

The plan is to have DLP-style protection with the help of SpamAssassin. 


Regards,
Siddhesh


spamassassin really isn't the best tool for this job. It's really designed for 
looking at text stuff, and how do you squeeze the text out of a ppt or xls in a 
meaningful way?
Even more limiting, spamassassin is designed for small to medium size messages, 
scanning anything over 500KB or so is going to be a resource hog.


What would be better is a tool that is already designed for scanning .doc / pdf/ 
.xls/ ppt etc.; an anti-virus program with custom rules for the kinds of info 
you want to detect.


ClamAV has builtin DLP rules for standard kinds of PII (EG CC#s, SSNs, etc) and 
comes with tools to help you craft custom rules if you have particular kinds of 
info you need DLP for.


Start with a mail scanning framework (EG amavis or mimedefang) and plug in 
spamassassin for spam and two instances of ClamAV, one with standard anti-virus 
rulesets and another with your DLP rules. Then you can use the framework 
to take whatever kinds of actions you want based on which components 'fired'.







Counting number of instances of a particular header

2021-05-03 Thread Dave Funk
I'm trying to create a rule to count the number of instances of a particular 
header.
IE in email messages there could be zero or more instances of a particular 
header and I want to know how many there are so I can use that info in a meta to 
detect a spam sign.


I first crafted a rule:
header   L_MY_HEADER  X-My-Header !~ /^UNSET$/ [if-unset: UNSET]
describe L_MY_HEADER  has X-My-Header
score    L_MY_HEADER  0.1

Which did correctly detect the existence of 'X-My-Header'. Then to count the 
number of them I added a 'tflags':

tflags L_MY_HEADER  multiple maxhits=10

But that would always fire 10 times if there were any instances of 'X-My-Header' 
(even if there was only one).


So I modified the pattern match part of the rule:
header L_MY_HEADER  X-My-Header =~ /./

Which had the same effect as the first form (IE either zero or 10 firings).

As the header would have at least 6 characters but less than 150 I then tried:
header L_MY_HEADER  X-My-Header =~ /^.{5,200}/

Which would fire only once, even if there were 5 or more instances of the 
header.


What am I doing wrong? How should I craft a rule to count the number of 
instances of that header?
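For reference, the number I'm after is simply the count of header instances. Outside of SpamAssassin that count can be sketched with Python's standard `email` module, which is handy for checking test messages against whatever a rule reports. This is not an SA rule, and the helper name `count_header` is only illustrative.

```python
from email import message_from_string


def count_header(raw_message, name):
    """Return how many times header `name` occurs in the raw message.

    Header names are matched case-insensitively; a missing header gives 0.
    """
    msg = message_from_string(raw_message)
    return len(msg.get_all(name, []))


sample = (
    "From: a@example.com\n"
    "X-My-Header: first\n"
    "X-My-Header: second\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(count_header(sample, "X-My-Header"))
```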


Thanks,
Dave



Re: Error "cannot open bayes databases" lock failed: File exists

2021-01-20 Thread Dave Funk

On Wed, 20 Jan 2021, Matus UHLAR - fantomas wrote:


On 20.01.21 11:07, Emanuel Gonzalez wrote:

Date: Wed, 20 Jan 2021 11:07:59 +
From: Emanuel Gonzalez 
To: SA Mailing list 
Subject: Re: Error "cannot open bayes databases" lock failed: File exists

Hello everyone, I'm back from my vacation. I tried to solve this problem but 
could not.


I still see the error mentioned above in the spamassassin error logs:

bayes_learn_to_journal 1
use_bayes yes
bayes_path /var/spamassassin/bayesdb/bayes
bayes_auto_learn 0
bayes_auto_expire 0



try:

ls -la /var/spamassassin/bayesdb/bayes
lsof /var/spamassassin/bayesdb/bayes_journal 
/var/spamassassin/bayesdb/bayes_seen /var/spamassassin/bayesdb/bayes_toks


Umm, the command:
  ls -la /var/spamassassin/bayesdb/bayes

should get you the error:

ls: cannot access /var/spamassassin/bayesdb/bayes : No such file or directory

On the other hand:

 ls -la /var/spamassassin/bayesdb/bayes*
(taken from the bayes_path parameter) should get you what you want.

even better:

 ls -la /var/spamassassin/bayesdb/
(to see if there's any leftover lock files in that directory)
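The point is that bayes_path is a filename prefix, not a file or directory. A minimal Python sketch of the same listing follows; the helper name `bayes_files` is hypothetical, and treating any name containing "lock" as a lock file is an assumption, so check the actual names in your directory.

```python
import glob
import os


def bayes_files(bayes_path):
    """List files that share the bayes_path prefix.

    Returns a sorted list of (filename, looks_like_lock) tuples. The
    "lock" substring test is only a heuristic for spotting leftovers.
    """
    result = []
    # glob.escape() protects any glob metacharacters in the prefix itself.
    for name in sorted(glob.glob(glob.escape(bayes_path) + "*")):
        looks_like_lock = "lock" in os.path.basename(name)
        result.append((name, looks_like_lock))
    return result


# Example call, using the bayes_path from the config quoted above:
#   bayes_files("/var/spamassassin/bayesdb/bayes")
```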




Re: BCC Rule and Subject change for specific rule

2021-01-05 Thread Dave Funk

On Tue, 5 Jan 2021, John Hardin wrote:


On Tue, 5 Jan 2021, Giovanni Bechis wrote:


On Mon, Jan 04, 2021 at 05:23:30PM -0800, John Hardin wrote:


I'm pretty sure SA only allows setting the subject tag by language, not
based on rule hits.


Starting from 3.4.3 you can add a prefix to the email subject like that:
header  FROM_ME From:name =~ /Me/
subjprefix  FROM_ME [From Me]


Cool, I missed that at the time. Thanks!

The documentation does mention it exists but does not give an example of 
using it...


Does this work if you're using a milter for your glue?

Is there some special status/command that spamd returns to the milter for this 
kind of modification? If so the milters may need to be recoded to implement it.





Re: Bypass RBL checks for specific address

2020-12-23 Thread Dave Funk

On Wed, 23 Dec 2020, Grant Taylor wrote:


Context is Sendmail, spamass-milter, and SpamAssassin (spamd).

I didn't see any way to have spamass-milter bypass, much less conditionally 
bypass.  Nor did I see a way to have Sendmail conditionally bypass a milter.


If all you want is for a particular class of recipients (at the envelope RCPT 
level) not to be passed to spamass-milter inside sendmail, that can be done 
with a bit of hacking of your sendmail config and the milter.


I run my own customized miltrassassin milter which has support for custom macros 
handed to it from sendmail and it takes special action based on what it gets 
handed.
For example, if the 'skip_check' macro is defined, the milter just returns an 
'OK' and doesn't call SA at all.
If the 'no_reject' macro is set then the milter will not generate a "550" SMTP 
status regardless of how high the SA score is. (needed for "postmaster" 
messages).


What version of sendmail are you using?




Re: Bypass RBL checks for specific address

2020-12-23 Thread Dave Funk

That may not work for what the OP wanted.

Because it's assumed that DNS related stuff may take some time those rules (if 
configured to run) are launched early in the processing of a message.


So if the OP wants to completely avoid running RBL checks (as opposed to just 
ignoring their scores/results) he may need to do some special tricks.


One thing would be to have a separate SA instance with its own configuration 
which has the RBL stuff removed and then configure his MTA to select that 
particular SA filter when the special user address is detected.


This raises the question: what is the need to completely avoid running RBL 
checks for that special recipient?
What is supposed to happen when a message comes in that is addressed to multiple 
recipients, including the special recipient?


This could get messy.

On Wed, 23 Dec 2020, Iulian Stan wrote:


Hello all,

You can create a meta rule with a very high priority (check that it is higher 
than that of your RBL rules), match what you need
from the email headers, and then use shortcircuit to skip additional tests.


Best regards,
Iulian Stan



Sent from my Galaxy


 Original message 
From: Grant Taylor 
Date: 12/23/20 20:59 (GMT+02:00)
To: users@spamassassin.apache.org
Subject: Re: Bypass RBL checks for specific address

On 12/22/20 11:56 PM, Axb wrote:
> whitelist_to ?

My understanding is that whitelist_to, more_spam_to, and all_spam_to
behave the same way and effectively just alter the scoring offset.

It seems as if the tests are still run, and it's just the score is
artificially offset based on which setting is used.

I'm wanting to not run RBL tests for the specific recipient email address.



--
Grant. . . .
unix || die







Re: adding AV scanning to working Postfix/SA system

2020-12-02 Thread Dave Funk

On Wed, 2 Dec 2020, Joe Acquisto-j4 wrote:


Hacking away, seem to have it working?,   Using CLAMAVPlugin. At least mail
does not appear "broken".

But EICAR is not detected.  I "think" it is being scanned as I see this:

*
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on auxilary
X-Spam-Level: *
X-Spam-Status: No, score=1.0 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
HTML_MESSAGE,SPOOFED_FREEMAIL_NO_RDNS,TVD_SPACE_RATIO autolearn=no
autolearn_force=no version=3.4.2
X-Spam-Virus: _CLAMAVRESULT
X-Spam-Report:
* -1.5 BAYES_00 BODY: Bayes spam probability is 0 to 1%
*  [score: 0.]
*  1.0 FREEMAIL_FROM Sender email is commonly abused enduser mail
*  provider (joe.acquisto[at]gmail.com)
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  0.0 TVD_SPACE_RATIO No description available.
*  1.5 SPOOFED_FREEMAIL_NO_RDNS From SPOOFED_FREEMAIL and no rDNS
*

Is that proof it is being scanned and the non detection issue lies elsewhere?

joe a.


What, specifically, is the config you're using to invoke CLAMAVPlugin?

You need to have at least two things set up in your spamassassin config files:
1) load the plugin in a "v*.pre"
2) invoke the check_clamav() procedure

EG:
in v320.pre

# AntiVirus - some simple anti-virus checks, this is not a replacement
# for an anti-virus filter like Clam AntiVirus
#
#loadplugin Mail::SpamAssassin::Plugin::AntiVirus
#
loadplugin ClamAV /usr/local/etc/mail/spamassassin/plugins/clamav.pm

Note that line depends on the path to where you've installed the plugin

In a ".cf" rules file (I call mine clamav.cf ):

#
# config file for using the ClamAV plugin "clamav.pm"
#
full L_CLAMAV   eval:check_clamav()
describe L_CLAMAV   Clam AntiVirus detected a virus
score L_CLAMAV  5
#
header T__MY_CLAMAV X-Spam-Virus =~ /Yes/i
header T__MY_CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,50}Sanesecurity/i
#





Re: adding AV scanning to working Postfix/SA system

2020-12-02 Thread Dave Funk

On Wed, 2 Dec 2020, Tom Hendrikx wrote:




On 02-12-2020 16:18, Joe Acquisto-j4 wrote:

X-Spam-Virus: _CLAMAVRESULT


I never integrated Clam using this plugin, but this seems to be a config 
typo: there should be a Yes/No in there, and optionally a virus name.




Yes, it looks like he's got a typo in there. The config line should be:
"add_header spam Clamav _CLAMAVRESULT_"
in a .cf someplace.
Then the plugin will add that 'X-Spam-Virus:' header with the text "Yes" 
followed by the name of the virus detected.


You can then use the value of that header in other rules to add points for 
various kinds of things detected or "meta"ed with other rules.







Re: amazonses.com doubble dkim sign

2020-11-09 Thread Dave Funk

On Tue, 10 Nov 2020, Benny Pedersen wrote:


DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;
s=n4atlko3yvgxyqpwp7palysab6occe3l; d=fing.com; t=1604971038;
h=From:To:Message-ID:Subject:MIME-Version:Content-Type:Date;
bh=0LT5Ztzk2B+Ecm2NPRzroGl6fTFNX9TpP6X0036qmf4=;
b=Rtc9ieWPMuaNZ9iRZPZMEfuGj7pnaXu6TPjT9px08NGKZt0+rbCLyz083FG3djhk
UTdHNgkEc6xGCCRN0JzbrdYaHWptG2U42qOYEajdE59uuR/Ucy+rGJA8Vr2roe/Ssvm
jYWosu47Ndl6M56u9m3aNpAuBOgNmQHWoMVyWXZU=
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;
s=shh3fegwg5fppqsuzphvschd53n6ihuv; d=amazonses.com; t=1604971038;
h=From:To:Message-ID:Subject:MIME-Version:Content-Type:Date:Feedback-ID;
bh=0LT5Ztzk2B+Ecm2NPRzroGl6fTFNX9TpP6X0036qmf4=;
b=lihzmRF2B+mUjB1E89LLJ8JkbpbQQIpnPd5JtQjAGB5uSurBWfv6VrGHgbCy2O1e
q7AWlXPTcwdca5K4iB0pormV/lgvfZV+kgwfSrLPlgWBwlB9hRi2TCsFhT9v9tbEm1b
dZBXrPRFO9r+uDtLfR6OgaOtXq7RjMiAUqcDBm0k=
From: Fing Alert 

why ?


Two signatures, one for the 'From:' address (message creator) and one for the 
issuing SMTP system.
Look at the signing domain (the 'd=D.N' part) to see who the creator of a given 
signature is.


There's nothing to prevent each system in the SMTP hand-off chain from adding 
their own signature, provided they do nothing to invalidate earlier signatures.

More than two is unusual/overkill, but it's not uncommon to see two.
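To see at a glance who signed a message, the 'd=' tags can be pulled out of the DKIM-Signature headers. A minimal sketch in Python follows; the helper name `signing_domains` is made up, and it only splits the tag list, it does not verify any signature.

```python
import re


def signing_domains(header_values):
    """Return the d= domain from each DKIM-Signature header value."""
    domains = []
    for value in header_values:
        # Unfold the header: collapse line breaks and runs of whitespace.
        unfolded = re.sub(r"\s+", " ", value)
        for tag in unfolded.split(";"):
            # Split at the first "=" only, since base64 values may end in "=".
            name, sep, val = tag.partition("=")
            if sep and name.strip() == "d":
                domains.append(val.strip())
    return domains


signatures = [
    "v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;\n"
    " s=n4atlko3yvgxyqpwp7palysab6occe3l; d=fing.com; t=1604971038;\n"
    " bh=0LT5Ztzk2B+Ecm2NPRzroGl6fTFNX9TpP6X0036qmf4=;",
    "v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple;\n"
    " s=shh3fegwg5fppqsuzphvschd53n6ihuv; d=amazonses.com; t=1604971038;\n"
    " bh=0LT5Ztzk2B+Ecm2NPRzroGl6fTFNX9TpP6X0036qmf4=;",
]
print(signing_domains(signatures))
```

For the two signatures quoted above this gives the 'From:' domain and the domain of the sending service.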




Re: questions on spamassassin

2020-09-05 Thread Dave Funk

On Sat, 5 Sep 2020, Rajesh M wrote:


dear friends,

had a few questions

1) what is the sequence based on which the rules are processed ?
is there any documentation on this ?
how is the rule number example 20_dnsbl_tests.cf  or 25_uribl.cf related to the 
sequence of rule processing ?


Are you asking about rule sequencing or configuration file sequencing?
"20_dnsbl_tests.cf" is a configuration file which contains zero or more rules.

During startup spamassassin reads all configuration files that are found in a 
list of specific directories (which are distro dependent). The directories are 
searched in list order for configuration files (name.cf), the files are read in 
lexical order.


So if you have a rule (EG: "MY_RULE_2") in file 20_my_rules.cf and another 
instance of "MY_RULE_2" in 99_my_rules.cf (in the same directory) the "99" file 
will be read after the "20" file and the latter definition of "MY_RULE_2" will 
over-ride (replace) the one from "20".
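The override behaviour within one directory can be modelled in a few lines of Python. This is only a toy model of "read in lexical order, last definition wins", not how spamassassin actually parses its files; the function name and the rule bodies are invented for the example.

```python
def effective_rules(files):
    """Model which definition of each rule survives.

    `files` maps a config filename to a list of (rule_name, definition)
    pairs. Files are processed in lexical filename order, and a later
    definition replaces an earlier one with the same rule name.
    """
    rules = {}
    for filename in sorted(files):
        for rule_name, definition in files[filename]:
            rules[rule_name] = (definition, filename)
    return rules


configs = {
    "99_my_rules.cf": [("MY_RULE_2", "body MY_RULE_2 /second version/")],
    "20_my_rules.cf": [
        ("MY_RULE_1", "body MY_RULE_1 /only version/"),
        ("MY_RULE_2", "body MY_RULE_2 /first version/"),
    ],
}
for name, (definition, source) in sorted(effective_rules(configs).items()):
    print(name, "from", source)
```

Here MY_RULE_2 ends up with the definition from the "99" file, and MY_RULE_1 is untouched because nothing later redefines it.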


Also the system provided rules directories are processed before the user 
supplied directories (intentionally) so a user can over-ride a system rule if 
they don't like how that particular rule works.
See: 
https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WhereDoLocalSettingsGo


Once all the rules are read and parsed spamassassin has an internal order to how 
specific rules get run.




Re: Thanks to Guardian Digital & LinuxSecurity for the nice post about SpamAssassin's upcoming change

2020-07-23 Thread Dave Funk

On Thu, 23 Jul 2020, Antony Stone wrote:


On Thursday 23 July 2020 at 04:36:41, Olivier wrote:


I am wondering what grey list should be renamed...


Why - has the zombie population started complaining about racial slurs?


You have just pissed off Oscar the gray geriatric grouch. ;)

This is the letter G brought to you by Oscar the grouch.



Re: score sender domains with 4+ chars in TLD?

2020-06-12 Thread Dave Funk

On Sat, 13 Jun 2020, RW wrote:


On Fri, 12 Jun 2020 09:22:40 -0400
AJ Weber wrote:


I want to try adding a score for a sender whose address uses a TLD
with  > 3 chars.

I realize there are some legit ones, but I'm going to test it with a
low score and see what it catches.



What I did was grep my mail for TLDs seen in ham and then create a
rule __NORMAL_TLD

I then score a point for:

__HAS_FROM  && ! __NORMAL_TLD


This probably won't scale well beyond a few users though.


If I were a bit more energetic I'd autogenerate the rule from cron.


This sounds like a perfect application for a custom DNS-bl lookup/list.

Create a local custom rbldnsd server "dnset" zone from a data file with your 
blessed TLDs, then a rule doing a rbl check using the hostname from the From 
address with custom scoring.


You can easily update the rbldnsd zone data (just write/update the data file, no 
need to restart spamd) and could create a custom scoring value based on the DNS 
data (EG 127.0.0.2 for really 'good' TLDs, 127.0.0.4 for 'so-so' and 127.0.0.8 
for truly spammy names).
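The scoring side of that idea can be sketched as a simple lookup from the returned address to a score. In the Python sketch below the score values, the default for unlisted TLDs and both function names are assumptions picked for illustration; the real lookup would be done by a DNS query against the local rbldnsd zone, which is not shown here.

```python
# Hypothetical mapping from the A record returned by the local zone to a
# score adjustment; the numbers are placeholders, not recommendations.
RETURN_CODE_SCORES = {
    "127.0.0.2": -0.5,  # really 'good' TLDs
    "127.0.0.4": 0.5,   # 'so-so' TLDs
    "127.0.0.8": 2.0,   # spammy TLDs
}


def sender_tld(from_addr):
    """Extract the TLD from an address such as 'user@mail.example.com'."""
    domain = from_addr.strip().rstrip(">").rsplit("@", 1)[-1]
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def score_for(answer, default=0.0):
    """Translate a lookup answer (or None for 'not listed') into a score."""
    return RETURN_CODE_SCORES.get(answer, default)


print(sender_tld("someone@shop.example.photography"))
print(score_for("127.0.0.8"))
```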







Re: Somewhat OT: DMARC and this list

2017-05-20 Thread Dave Funk

On Sat, 20 May 2017, David Jones wrote:


From: David B Funk 

[snip..]

The message from you that I'm replying to here (both the one that came directly
to me and the copy I got thru the  Apache list server) are -totally- devoid of
DKIM headers. (If you'd like to see it I can put it up in paste-bin.)


I figured out what was going on.  Microsoft must have recently (past few
months or so) started sending our outbound mail through another IP range.
I have updated my opendkim.conf to cover all Office 365 outbound servers.


This is one of the things that I dislike/fear about being dependent on 
cloud based services.
Many traditional system paradigms use the concept of trusted IP 
addresses (EG: internal_networks, trusted_networks, etc) for making 
operational decisions.


When using cloud based services you have no control over their IP 
addresses and have to worry about when they might change without notice, 
whom else they might be servicing using those same addrs, AND when they 
might abandon them only for somebody else to start using them.


It also reduces the usefulness of RBLs and can even adversely affect the 
performance of things such as Bayes.


When you get major amounts of Ham from O-365 most of the tokens derived 
from O-365 messages get 0.000 score. So when spammers use O-365 even 
blatant spam gets a Bayes score of 00%. (and this is after putting all the 
O-365 headers in bayes_ignore_header statements).
(Our institution recently moved the majority of users' mail to O-365 so 
this is a battle I'm fighting now).


Bottom line: in this brave new world, address-based auth(n/z) decisions are 
going to be increasingly problematic, and there will be an increasing reliance 
on things such as digital signatures.


Dave


Re: R: learn ham

2017-01-05 Thread Dave Funk

On Thu, 5 Jan 2017, Nicola Piazzi wrote:


Each minute it learns the messages of the last minute, so it reads and learns 
each message only once.
The messages are those sent from internal users, so it learns that those words are not spam

Internal messages are not spam


Until one of your users gets their account hacked/phished and spammers 
then use it to abuse your server to send out megabytes of spam.

(or they may have had an account on Yahoo that used the same password).

Careless users happen to the best of us. ;(

John's point is still valid; blind un-vetted automated Bayes learning is 
asking for trouble.




Re: Detecting Valid Message Replies

2017-01-03 Thread Dave Funk

On Tue, 3 Jan 2017, ma...@assembly.state.ny.us wrote:


On 1/3/2017 8:12 AM, Christoffer G. Thomsen wrote:

blacklist or increase score for mails that reply to unknown
message IDs.


Remember that someone out in the world might do a "Reply all" to a
message which was also Cc'd to one of your users.  This would show up as
an unknown message ID.  Of course, to remedy this, you could also keep
track of incoming message IDs.


That would make the wrong decision in the following scenario:

  A sends message to B
  B replies to A and also adds C to the "CC" list
  (as B thinks that C should be involved in the conversation)

In this case C would receive a "reply" to a message that she's never seen 
before, but is a legitimate communication.
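The failure can be shown with a toy version of the proposed check. The Python sketch below assumes the server keeps a set of message IDs it has seen (sent or received); the function name and the IDs are invented for the example.

```python
def replies_to_unknown(in_reply_to, known_ids):
    """True if the message claims to be a reply to an ID we never saw."""
    return bool(in_reply_to) and in_reply_to not in known_ids


# A and B are outside users, C is a local user. A's original message never
# passed through C's server, so its ID is not in the server's list.
known_ids = {"<local-1@c.example>"}

# B's reply to A, with C added on CC, arrives at C's server:
flagged = replies_to_unknown("<orig-1@a.example>", known_ids)
print(flagged)
```

The message to C is flagged even though it is a legitimate part of the conversation, so a rule built on this check would need a low score at best.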


This scenario may seem contrived but I've seen it happen around me with 
some regularity (both as a recipient & creator).


And then there's the case where somebody forwards to you a reply that they 
got so you get a message "Re: blah de blah (fwd)"




Re: DNS Terminology

2016-09-23 Thread Dave Funk

On Fri, 23 Sep 2016, Lindsay Haisley wrote:


On Fri, 2016-09-23 at 19:03 -0400, listsb-spamassas...@bitrate.net
wrote:

consider that, to do the work described as "forwarding" in many of
these references, the nameserver must perform a recursive query [e.g.
it must perform a query with the rd bit set].


"A forwarding DNS server offers the same advantage of maintaining a
cache to improve DNS resolution times for clients. However, it actually
does none of the recursive querying itself. Instead, it forwards all
requests to an outside resolving server and then caches the results to
use for later queries."

What am I missing?

Justin Ellingwood, who wrote the DigitalOcean piece, is a very
experienced documenter. From his rather impressive resume, I'd be
inclined to trust what he posts.


This is the difference between asking a question (formulating a query, 
potentially with the "want recursion" bit set) and then doing the work of 
chasing down all the different stake-holders necessary to answer the 
question (performing the recursive query), 
versus handing the query off to a third party and letting them do the dirty 
work (forwarding).




Re: Spam by IP-address? Spamassassin with geoiplookup?

2016-09-22 Thread Dave Funk

On Thu, 22 Sep 2016, Thomas Barth wrote:

And what about filter poisoning? In the last 10 hours my company address got 
43 mails classified as spam (even a virus mail detected today). And there was 
one mail classified as spam due to my rule (bad country, message-id).


X-Spam-Status: Yes, score=7.474 tag=2 tag2=6.31 kill=6.31
   tests=[MESSAGEID_LOCAL=3, RDNS_NONE=1.274, RELAYCOUNTRY_BAD=3.2]
   autolearn=no autolearn_force=no

The content of the mail is:


From: "Lupe Monroe" 
To: "my boss address"
Subject: Payment approved
MIME-Version: 1.0
Content-Type: multipart/related;
   boundary="boundary_af9c8db46eb73fca8b315aafef01"
Message-Id: <20160922063255.e11d3e5...@static.vnpt.vn.local>
Date: Thu, 22 Sep 2016 06:32:55 +0700

--boundary_af9c8db46eb73fca8b315aafef01
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Dear so,

Your payment has been approved. Your account will be debited within two days.

You can email us for any query regarding your account.

Thank you.

Lupe Monroe
Support

--boundary_af9c8db46eb73fca8b315aafef01
Content-Type: application/x-zip-compressed; 
name="e6dfa16bdb.zip.virus-scan-me.virus-scan-me"

Content-Transfer-Encoding: base64
Content-Disposition: attachment; 
filename="e6dfa16bdb.zip.virus-scan-me.virus-scan-me"



There is no spam content, am I right? Normal words and content that a normal 
person can use. I don't need spam learning for all the mails already 
classified as spam with a high score. Spam mails with a low score, like this 
one, are the interesting ones for spam learning. But when I use these mails for 
spam learning, is there a risk of false positives some day, because it has 
learned that normal mails are also spam?


You are missing the point that Bayes uses more than just body words from a 
message. It also looks at headers and meta-data. So those particular body 
words could become "neutral" (neither spam nor ham indicators) but the 
other components of that message (such as that '.vn.local' message ID) 
would be learned as spam signs.


This is why you MUST also train your Bayes with HAM messages (and train 
them with the --ham flag) so Bayes knows how to recognise 'hammy' or 
'neutral' tokens to prevent false-positives.





Re: Spam by IP-address? Spamassassin with geoiplookup?

2016-09-22 Thread Dave Funk

On Thu, 22 Sep 2016, Thomas Barth wrote:


Hi ho,

a virus was found: Sanesecurity.Malware.26327.JsHeur.UNOFFICIAL

Scanner detecting a virus: ClamAV-clamd

Content type: Virus
Internal reference code for the message is 35123-18/WRf_y9XIIOFq

First upstream SMTP client IP address: [103.230.105.6]
According to a 'Received:' trace, the message apparently originated at:
 [103.230.105.6], [103.230.107.6] unknown [103.230.105.6]


You REALLY should get your DNSBL problem fixed. Once you get DNSBLs 
working it will help a lot. That particular IP address hit almost a dozen 
different RBLs here, including some that I use at the SMTP level to 
block incoming traffic outright (such as cbl.abuseat.org, Spamhaus PBL, 
SBL).





Re: scan an HTML file, possible?

2016-08-03 Thread Dave Funk

On Wed, 3 Aug 2016, Robert Boyl wrote:


Hi, everyone

I have a very nice regex a friend passed me that catches those emails that have 
an HTML attached with a redirect html command to
some malefic website.

He has some tool in Exim that scans text in attachments. But I wanted to use a 
spamassassin rule.

Is there some plugin/way in Spamassassin to scan text of an html attachment?



You can write 'full' rules that will work with raw HTML in recognized html 
attachments. The problem is that SA has business logic that ignores 
non-textual attachments, and that can be fooled by mime-typing.


So if the attachment has a mime-type of "text/html" SA will scan it.
If it has a mime-type of "application/octet-stream" SA will ignore it but 
if the attachment has a filename ending in ".htm" most client programs 
will treat it as HTML and open it as such.


I once wrote a rule to detect such obfuscation but it had too many FPs.
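The mismatch itself (binary mime-type, HTML-looking filename) is easy to express outside of SA. The following Python sketch uses the standard `email` module; the function name is made up, it is not an SA plugin, and as noted above a check this simple can be expected to hit legitimate mail too.

```python
from email import message_from_string


def disguised_html_parts(raw_message):
    """Return filenames of parts typed as application/octet-stream whose
    filename ends in .htm or .html."""
    msg = message_from_string(raw_message)
    hits = []
    for part in msg.walk():
        filename = part.get_filename() or ""
        if (part.get_content_type() == "application/octet-stream"
                and filename.lower().endswith((".htm", ".html"))):
            hits.append(filename)
    return hits
```

Any real use would have to combine this with other signs before adding points.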



Re: Paragraph Length Limit (new rule)

2016-08-03 Thread Dave Funk

Use a 'full' not 'rawbody' rule.
IE:
  full B_PLL /(?:(?!<\/p>).){999,}<\/p>/msi

Why are you doing a "tflags __B_PLL multiple maxhits=1" ?
If you have "maxhits=1" what's the point of "multiple" at all?
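For testing the pattern away from SA, the same expression can be tried in Python, where the /s and /i modifiers correspond to re.DOTALL and re.IGNORECASE. This is only a sketch for experimenting with sample text; the function name is illustrative.

```python
import re

# At least 999 characters, none of which starts a "</p>", followed by "</p>".
LONG_PARAGRAPH = re.compile(r"(?:(?!</p>).){999,}</p>", re.DOTALL | re.IGNORECASE)


def has_long_paragraph(text):
    """True if the text contains a run of 999+ characters ending in </p>."""
    return LONG_PARAGRAPH.search(text) is not None


long_sample = "<p>" + "headline " * 150 + "</p>"
short_sample = "<p>short paragraph</p>\n" * 200
print(has_long_paragraph(long_sample), has_long_paragraph(short_sample))
```

Note that this only looks at whatever string it is handed; what SA passes to a 'full' or 'rawbody' rule is a separate matter.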

On Wed, 3 Aug 2016, Ruga wrote:


Hello,

We received a new type of spam, twice, and we are not willing to give them a 
third chance.
The body includes a long html paragraph (...) of headlines from the news.

The following works at the command line:
perl -p0e 's/((?:(?!<\/p>).){999,}<\/p>)/-->$1<--/msig' example.eml
perl -n0e  '/((?:(?!<\/p>).){999,}<\/p>)/msig and print "--->$1<---"' 
example.eml

The following SA rule, however, does not work at all:

rawbody   __B_PLL  /(?:(?!<\/p>).){999,}<\/p>/msi
tflags    __B_PLL  multiple maxhits=1
meta      B_PLL    __B_PLL
describe  B_PLL    Body: Paragraph Length Limit
score     B_PLL    1.0

I would be most grateful if you could spot the bug in the above rule.







Re: SA bayes file db permission issue

2016-06-11 Thread Dave Funk

On Sat, 11 Jun 2016, RW wrote:


On Fri, 10 Jun 2016 15:38:44 -0400
Joseph Brennan wrote:



This is a nice test I found:
echo -n I | od -to2 | awk '{ print substr($2,6,1); exit}'

1 little-endian
0 big-endian


I don't see how this can output anything other than 1.

Endianness is about the addressing of bytes within integer words. This
is looking at the ordering of human-readable octal digits displaying
the contents of a single byte.


On big-endian system:

  $ echo -n I | od -to2
  0000000 044400
  0000001

On little-endian system:

  # echo -n I | od -to2
  0000000 000111
  0000001

So it works.
It's a single data byte but since the display field is a two byte
object, where within that two byte object does that single byte show up?
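The same effect can be reproduced without two machines. The Python sketch below pads the single byte 'I' (octal 111) to a two-byte word and interprets it in each byte order, printing the word as six octal digits the way the od display does; the function name is just for illustration.

```python
import struct
import sys


def word_as_octal(data, byteorder):
    """Pad `data` to two bytes and show it as a 16-bit word in octal."""
    padded = data.ljust(2, b"\x00")
    fmt = "<H" if byteorder == "little" else ">H"
    (value,) = struct.unpack(fmt, padded)
    return format(value, "06o")


print("little-endian:", word_as_octal(b"I", "little"))
print("big-endian:   ", word_as_octal(b"I", "big"))
print("this host:    ", word_as_octal(b"I", sys.byteorder))
```

The sixth digit of the word is 1 in the little-endian case and 0 in the big-endian case, which is what the awk substr() in the one-liner picks out.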



Re: Spamassassin not capturing obvious Spam

2016-05-31 Thread Dave Funk

OK,

So you are testing to see how SA scores artificial mail messages.
However SA is designed to evaluate real mail messages, not botched
fabrications of them, so I don't understand what you are trying to achieve.

You have (either deliberately or unknowingly) omitted the necessary
information that SA needs to perform meaningful network based tests.

If you want to test SA with network based tests explicitly disabled there
are command line (or configuration) options to achieve that. When you use
those options it causes SA to "shift gears" and changes how various
remaining parts are utilized.

So in a way you are crippling SA: you withhold the info it needs for network
based tests but don't tell it that you are doing so, which means it doesn't
"know" to bring the full force of the non-network components to bear.
I'm not surprised that its performance is sub-par in this situation.

What are you trying to achieve with this artificial scenario?

On Mon, 30 May 2016, Shivram Krishnan wrote:


1) The message is indeed fabricated. I had to generate an RFC 2822 mail from
JSON. I am harvesting spam from mailinator.com (public emails), so that is an
error in my generation of the RFC 2822 message. I did not change it because
spamassassin did not assign a score.
2) I have set a threshold of -10 to see how spamassassin assigns a score for every mail.




On Mon, May 30, 2016 at 8:25 PM, Dave Funk  wrote:
  That message is either a fabrication or something from a messed up system.
  There's no sign of an IP address (neither IPv4 nor IPv6) in it.

  There are two identical 'Received:' headers which have '()' where
  there should be at least the IP address of the incoming connection.

  This indicates that the message has either been tampered with or is from
  a postfix system whose configuration somebody has messed up.




Re: Spamassassin not capturing obvious Spam

2016-05-30 Thread Dave Funk

That message is either a fabrication or something from a messed up system.
There's no sign of an IP address (neither IPv4 nor IPv6) in it.

There are two identical 'Received:' headers which have '()' where
there should be at least the IP address of the incoming connection.

This indicates that the message has either been tampered with or is from a
postfix system whose configuration somebody has messed up.



On Mon, 30 May 2016, Shivram Krishnan wrote:


Hey guys,

I am testing spamassassin on a SPAM/HAM corpus of mails. Spamassassin is not
picking up an obvious spam like in this case: http://pastebin.com/MbNRNFWy

I have followed the guidelines on
https://wiki.apache.org/spamassassin/ImproveAccuracy .

Let me know how to catch this type of spam. It would be interesting to know
what score your spamassassin assigns to this spam.

spamassassin assigned this score -

Content analysis details:   (3.9 points, -10.0 required)

   pts rule name              description
 ---- ---------------------- --------------------------------------------------
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.4292]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
 0.4 HTML_MIME_NO_HTML_TAG  HTML-only message, but there is no HTML tag
 0.0 UNPARSEABLE_RELAY      Informational: message has unparseable relay lines
 2.0 XPRIO                  Has X-Priority header



Notice that none of the other body tags are triggered.

Thanks,

Shivram





Re: malware campaign: javascript in ".tgz"

2016-04-21 Thread Dave Funk

On Thu, 21 Apr 2016, Reindl Harald wrote:





[snip..]

Content-Type: application/octet-stream; name="0005500922.tgz"

I wonder how common octet-stream is with legitimate .tgz files


sadly you need to expect "application/octet-stream" for nearly any filetype, 
learned the hard way by doing mime-checks on webservers


+1 for this, similar experience here.

I've seen "application/octet-stream" typing on ".htm" components of mail
messages created by major brand e-mail clients. The lazy authors assume
that the correct file extension is all that is needed.
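
A filter that wants to look past a generic declared type can fall back to the filename. Below is a rough Python sketch using the standard mimetypes module; effective_type is a hypothetical helper, not anything SpamAssassin provides, and trusting the extension is itself only a heuristic (the malware in this thread relies on exactly that kind of trust).

```python
import mimetypes

# Private lookup table that starts from Python's built-in extension defaults
_types = mimetypes.MimeTypes()

def effective_type(declared: str, filename: str) -> str:
    """Return the declared MIME type unless it is the generic octet-stream,
    in which case guess from the file extension (heuristic only)."""
    if declared.lower() != "application/octet-stream":
        return declared
    guessed, _encoding = _types.guess_type(filename)
    return guessed or declared

print(effective_type("application/octet-stream", "page.htm"))
print(effective_type("text/plain", "notes.txt"))
```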




Re: HEADER_HOST_IN_BLACKLIST

2016-03-12 Thread Dave Funk

On Sat, 12 Mar 2016, @lbutlr wrote:


Where is the blacklist for HEADER_HOST_IN_BLACKLIST?

I am hitting that on a non-spam mail from email.amctheatres.com


It is the result of somebody putting that hostname in an 'enlist_uri_host'
directive in your local SA configuration.

Look up enlist_uri_host in your SA Conf documentation.



Re: Missed spam, suggestions?

2016-03-11 Thread Dave Funk
RANK  RULE NAME            COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  HTML_MESSAGE         16473      9.13    50.51    87.85   90.80
   2  DKIM_SIGNED          13776      7.64    42.24    13.81   75.93
   3  TXREP                13228      7.33    40.56    91.00   72.91
   4  DKIM_VALID           12962      7.19    39.74    11.93   71.44
   5  RCVD_IN_DNSWL_NONE    9941      5.51    30.48     8.08   54.79
   6  DKIM_VALID_AU         8711      4.83    26.71     7.99   48.01
   7  BAYES_00              8390      4.65    25.72     1.84   46.24
   8  RCVD_IN_JMF_W         7369      4.09    22.59     2.54   40.62
   9  RCVD_IN_MSPIKE_WL     6713      3.72    20.58     4.39   37.00
  10  BAYES_50              6201      3.44    19.01    25.56   34.18


Based upon your stats it looks like you need more Bayes training. Your Bayes
00/99 hits should rank higher in the rules-fired stats and BAYES_50 shouldn't
be in the top-10 at all.
(Of course if you've only been training for a week that would explain it.)

For example, here's my top-10 hits (for a one month interval).

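TOP SPAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------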
  1    T__BOTNET_NOTRUST   114907   60.32   86.81   42.66  0.5755
  2    BAYES_99    109138   32.98   82.45    0.01  0.9998
  3    BAYES_999   104903   31.70   79.25    0.01  0.
  4    HTML_MESSAGE    90850    79.41   68.63   86.59  0.3456
  5    URIBL_BLACK 90845    27.61   68.63    0.27  0.9942
  6    T_QUARANTINE_1  90640    27.40   68.47    0.02  0.9996
  7    URIBL_DBL_SPAM  79152    24.02   59.79    0.17  0.9956
  8    KAM_VERY_BLACK_DBL  74301    22.45   56.13    0.00  1.
  9    L_FROM_SPAMMER1k    73667    22.26   55.65    0.00  1.
 10    T__RECEIVED_1   72413    42.60   54.70   34.54  0.5135

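TOP HAM RULES FIRED
----------------------------------------------------------------------
RANK    RULE NAME   COUNT  %OFMAIL %OFSPAM  %OFHAM  S/O
----------------------------------------------------------------------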
  1    BAYES_00    182674   56.03    2.11   91.97  0.0150
  2    HTML_MESSAGE    171992   79.41   68.63   86.59  0.3456
  3    SPF_PASS    136623   63.08   54.52   68.78  0.3457
  4    T_RP_MATCHES_RCVD   130879   53.75   35.54   65.89  0.2644
  5    T__RECEIVED_2   125492   53.76   39.62   63.18  0.2947
  6    DKIM_SIGNED 114808   38.57    9.72   57.80  0.1008
  7    DKIM_VALID  105385   34.70    7.16   53.06  0.0825
  8    RCVD_IN_DNSWL_NONE  92951    29.90    4.56   46.80  0.0609
  9    T__BOTNET_NOTRUST   84741    60.32   86.81   42.66  0.5755
 10    KHOP_RCVD_TRUST 84623    26.44    2.19   42.60  0.0331

Note how highly BAYES 00/99 ranked. What you don't see is that BAYES_50 is way
down in the mud (below 50 rank).

BTW, this is with a Bayes that is mostly fed via auto-learning. I occasionally
hand feed corner cases that get mis-classified (usually things like phishes, or
conference announcements that can look shaky).




Robert Chalmers
rob...@chalmers.com.au  Quantum Radio: http://tinyurl.com/lwwddov
Mac mini 6.2 - 2012, Intel Core i7,2.3 GHz, Memory:16 GB. El-Capitan 10.11.  
XCode 7.2.1
2TB: Drive 0:HGST HTS721010A9E630. Upper bay. Drive 1:ST1000LM024 HN-M101MBB. 
Lower Bay











Re: Interesting rule combo results

2016-03-09 Thread Dave Funk

On Tue, 8 Mar 2016, Marc Perkel wrote:


This is the for what it's worth department.

I've generated the following rules combination lists.

The ham list is rule combinations sorted by the number of ham hits that
have 0 spam hits.
The spam list is rule combinations sorted by the number of spam hits that
have 0 ham hits.


There are some of my personal rules mixed in.

Just posting this just to see if anyone sees any value in this.

SPAM RULES:

11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
 5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
 5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE

[snip..]


HAM RULES:

   132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
   132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
   131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC

[snip..]

80056 HTML_MESSAGE
78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
59189 DKIM_SIGNED
56792 DKIM_VALID

[snip..]

Marc,

Maybe I'm misunderstanding your list but it looks like you've got
HTML_MESSAGE by itself in the HAM RULES (IE zero spam hits on HTML_MESSAGE)

but you've also got a rule combo of HTML_MESSAGE RAZOR2_CF_RANGE_51_100
SUBJ_GROUP as the top SPAM RULES (which implies that there is SPAM that hits
HTML_MESSAGE too).

Similar situation for DKIM_SIGNED & DKIM_VALID

Also how can you have 132983 hits on the combo of
DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
but only 59189 hits on DKIM_SIGNED by itself?
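
The sanity check behind that question can be sketched in Python: if combinations and single rules are counted over the same set of messages, a combination can never be hit more often than any one of its member rules. How Marc actually generated his lists is not known; the function name and the sample data below are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def combo_counts(messages, max_size=3):
    """Count how many messages hit each combination of up to max_size rules."""
    counts = Counter()
    for rules in messages:
        unique = sorted(set(rules))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[combo] += 1
    return counts

# Made-up sample: each inner list is the set of rules one message hit
sample = [
    ["DKIM_SIGNED", "MAILTO_LINK", "RDNS_DYNAMIC"],
    ["DKIM_SIGNED", "MAILTO_LINK"],
    ["DKIM_SIGNED"],
]
counts = combo_counts(sample)
triple = ("DKIM_SIGNED", "MAILTO_LINK", "RDNS_DYNAMIC")
print(counts[triple], counts[("DKIM_SIGNED",)])
```

If a listing shows the opposite ordering, the two numbers were presumably counted under different conditions (for example, "this exact set and nothing else" versus "at least these rules").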



Re: URIBL/DNSBL from a database

2016-02-13 Thread Dave Funk

On Sat, 13 Feb 2016, Alex wrote:


I've now got rbldnsd implemented. I've also known for a while it's
faster/better than bind, but bind has always been in place.

I have rbldnsd running on port 530, alongside bind on 53. How do I
specify a urirhsbl in spamassassin to query the DNS server running on
530 instead of 53?


One way to do this is to set up a "forward only" zone in your bind config.

For example, assume you're authoritative for "example.com" and you've got
your rbldnsd set up to serve up your data as zone "mybl.example.com" and
it's bound to 192.168.124.23/530

Then in your bind config file create a zone:

zone "mybl.example.com" {
    type forward;
    forward only;
    forwarders {
        192.168.124.23 port 530;
    };
};

Then when your clients (spamd or regular dns tools) query
"blah.com.mybl.example.com" it will hit your bind and then
get passed on to your rbldnsd for an answer.
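
For anyone unsure how the looked-up name is put together, here is a minimal Python sketch. The helper name rhsbl_query_name is hypothetical; real SpamAssassin also decides which part of a hostname to look up, which is not shown here.

```python
def rhsbl_query_name(domain: str, zone: str) -> str:
    """Build the DNS name queried for a domain-based (right-hand-side) blocklist."""
    return f"{domain.strip('.').lower()}.{zone.strip('.').lower()}"

print(rhsbl_query_name("blah.com", "mybl.example.com"))
```

Because the query name ends in the forwarded zone, bind hands it to the forwarders listed for that zone rather than resolving it itself.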

If you want to hide that resource from the world put that zone
in a private 'view' in your bind. You could control access via an
ACL but by putting it inside a private view they'll never even see it
to try pounding on it.

To provide fault tolerance, you can set up rbldnsd's on multiple
machines and put multiple addresses in that 'forwarders' stanza.
You will need to put that zone definition in your primary bind and
each secondary.



Re: Question about spam report header

2016-02-02 Thread Dave Funk
You can do that but it requires editing all your rule files, although then
you see those matches in all your reports.


If you just want to test one particular message, just use the -D option to
spamassassin and grep for ' got hit: '


Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ==> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ==> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ==> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ==> got hit: ""

(Yes, Marc, you probably already know this, this is for the other people 
who might be following this thread ;)
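
If grep output is too noisy, the same lines can be boiled down to rule names. This is a small Python sketch that assumes the debug format is exactly as in the sample lines above; the names HIT_RE and rules_hit are made up.

```python
import re

# Matches the debug format shown above: "... rules: ran <type> rule <NAME> ==> got hit: ..."
HIT_RE = re.compile(r"rules: ran (\w+) rule (\S+) =+> got hit")

def rules_hit(debug_text: str):
    """Return (rule_type, rule_name) pairs for every 'got hit' debug line."""
    return HIT_RE.findall(debug_text)

sample = '''\
Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ==> got hit: ""
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ==> got hit: "negative match"
'''
for rule_type, name in rules_hit(sample):
    print(rule_type, name)
```

Feeding it the saved output of a -D run lists the hidden __ sub-rules along with everything else that fired.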


On Tue, 2 Feb 2016, Marc Perkel wrote:


Never mind 

I found that if I change __ to T_ that it does what I want.


On 02/02/16 18:05, Marc Perkel wrote:


On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules that 
matched. It skips the listing of hidden rules that start with __ .


Is there a command where I can easily tell SA to include the hidden rules 
in the report in the headers so I can see all of it?




I'm also - I suppose asking it to list rules that match that produce no 
scores.


body  __LATE_RICH_RELATIVE  /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i

body  __CT_CLICK            /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i

body  __BENEFICIARY         /\bbeneficiary\b/i

body  __CT_BEGGER           /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i

body  __CT_CONTACT          /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i

body  __CT_REPLY_TO_ME      /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i

body  __CT_DYING            /\b(diagnosed with|months to live|dying of|transplant)\b/i

body  __CT_UNITED_NATIONS   /\bUnited Nations?\b/i

meta  __CT_STRANGER         CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE

meta  __CT_MONEY            CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$

meta  __CT_VICTIM           __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING

meta  __CT_FORM             FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT

meta  __CT_CONFIDENTIAL     CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2

meta  __CT_NOW              CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND

meta      CT_GOD_BENEFICIARY   __CT_GOD && __CT_VICTIM
describe  CT_GOD_BENEFICIARY   God and Beneficiary
score     CT_GOD_BENEFICIARY   4

meta      CT_GOD_BEGGER        __CT_GOD && __CT_BEGGER
describe  CT_GOD_BEGGER        Begging in Religious Language
score     CT_GOD_BEGGER        3









Re: OUTPUT OF SPAMASSASSIN

2016-01-24 Thread Dave Funk

On Sun, 24 Jan 2016, Reindl Harald wrote:




Am 24.01.2016 um 20:45 schrieb Shawn Bakhtiar:

On Jan 24, 2016, at 11:29 AM, Martin Gregorie  wrote:

On Mon, 2016-01-25 at 00:07 +0530, Sarang Shrivastava wrote:

I am just a newbie who has started using SA. Someone on the mailing
list suggested me to use -D option. So if this option is for
debugging then how do we classify it ?


You don't classify it: that's SA's job. It only scores messages and
sets the Yes/No flag before adding the X-Spam-* headers to the message.
Nothing else. What you do with mail that SA has classified as spam is
the responsibility of your additional software and/or your users.

[snip..]


* the point is that he is analyzing *local* files
* so he needs to pass eml files to spamc/spamassassin
* SA adds a header "X-Spam-Flag: Yes" in case of it reached spam-score
* that output needs to be parsed
* that's it


Simpler yet, get spamd running and just use "spamc -c < mail.eml"
It emits a score and sets the exit code.
No "parsing" needed, just test the exit code.

EG, suppose I have two messages, one known ham "ham.eml" and one known 
spam "spam.eml"


Then:

  if (spamc -c < spam.eml ) ; then
    echo "is ham"
  else
    echo "is spam"
  fi

will execute the 'echo "is spam"' clause
and if you feed it the ham.eml will execute the 'echo "is ham"' clause.
(This presupposes a bash shell variant; coding for other shell types is
left as an exercise for the reader. ;)
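
For people driving this from a script rather than a shell, a rough Python equivalent follows. It assumes spamc is on the PATH and spamd is running; the function names are made up. Per the spamc manual, -c exits with status 1 for spam and 0 otherwise, and prints "score/threshold" on stdout.

```python
import subprocess

def verdict(returncode: int) -> bool:
    """Map the exit status of `spamc -c` to a spam flag (1 means spam)."""
    return returncode == 1

def parse_score(stdout: bytes):
    """Split the 'score/threshold' line that `spamc -c` prints into two floats."""
    score, threshold = stdout.decode().strip().split("/")
    return float(score), float(threshold)

def is_spam(path: str) -> bool:
    """Feed one message file to `spamc -c` and report whether it was judged spam."""
    with open(path, "rb") as msg:
        result = subprocess.run(["spamc", "-c"], stdin=msg,
                                stdout=subprocess.PIPE, check=False)
    return verdict(result.returncode)
```

Note that status 0 is also what spamc returns when processing failed (it then prints 0/0), so a careful caller would look at the printed score as well as the exit status.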




Re: Help with RegEx Rule

2015-09-19 Thread Dave Funk

On Sun, 20 Sep 2015, AK wrote:

[..snip..]
Still no joy after removal.  However, at least the rule now hits if I 
replace:


/(^\.\n){5,}/

with

/(^\.\n)*/

But that looks like it might bring about some FPs.  Any other suggestions?


Do you realize that rule will -always- fire on -any- message?
The '*' repeat operator is "zero or more" instances.
So that pattern degenerates to // which will match everything.

Guaranteed FP generator.
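
A quick way to convince yourself, sketched with Python's re module rather than Perl (the two behave the same for these constructs). The sample texts are made up.

```python
import re

always = re.compile(r"(^\.\n)*", re.MULTILINE)
five_plus = re.compile(r"(^\.\n){5,}", re.MULTILINE)

ham = "Hello,\nnothing odd here.\n"
junk = "intro\n" + ".\n" * 6 + "outro\n"

for name, text in (("ham", ham), ("junk", junk)):
    print(name,
          bool(always.search(text)),     # zero repetitions match the empty string
          bool(five_plus.search(text)))  # needs five real dot-only lines
```

The '*' version reports a match on both texts because an empty match at the very first position already satisfies it.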



Re: Help with RegEx Rule

2015-09-19 Thread Dave Funk

On Sun, 20 Sep 2015, AK wrote:


Hi all.

I'm getting hit with lots of JUNK mail that has multiple lines with just a 
'.' on several lines [0].  Most of the JUNK email has at least 5 and at most 
10 lines (so far) with just this '.' character somewhere in the middle of the 
message.


I've copied the message source to RegexBuddy [1] and have been able to come 
up with a regex that matches what I want using the Perl 5.20 engine:


(^\.\n){5,}

However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all 
when I run it against my test message as follows:


= Start Rule Block =
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
= End Rule Block =

= Begin Test Command =
spamassassin -L -t test.msg
= End Test Command =


Please help me understand what I'm doing wrong as this is my first attempt at 
creating a rule.  Previously I've just copied and pasted what I've found here 
in the forums, but this time I'm trying to do it myself but failing.



Regards,
ak.


SA does some interesting pre-processing on mail messages before applying 
rules, so you need to understand that.


Try this:

 rawbody  T__LOCAL_MANY_PERIODS   /\n(?:\.\n){5}?/
 describe T__LOCAL_MANY_PERIODS   Many lines with just a single "dot"

Notes:
1) Due to SA pre-processing collapsing the body into one long line, you cannot
match on '^' repeatedly; you need to look for '\n' as the line break indicator.
Find the start of a line and then the following repeats of ".\n".
2) Use '(?:' as a grouping optimization unless you care about capture.
3) For the terminal match clause use '{5}' not '{5,}' as we're done as soon
as we see at least 5 matches; we don't care if there are more.
4) Use the "non-greedy" match quantifier '}?' to look for the first hit on that
pattern and not try to go for more.


Un-optimised pattern: /\n(\.\n){5}/
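
Both forms can be tried outside SA. The Python sketch below assumes the text reaches the regex as one string with embedded newlines, as described in note 1; the sample bodies are made up, and Python's re stands in for Perl.

```python
import re

optimised = re.compile(r"\n(?:\.\n){5}?")
plain = re.compile(r"\n(\.\n){5}")

body = "Dear friend\n" + ".\n" * 7 + "Click here\n"   # made-up sample body
short = "Dear friend\n.\n.\nClick here\n"

for pat in (optimised, plain):
    m = pat.search(body)
    print(repr(m.group(0)), pat.search(short))
```

Both patterns stop after five dot lines even though the sample has seven. In Python's re, at least, the lazy '?' after a fixed '{5}' does not change what is matched, so it is harmless either way.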

Note the use of the "testing" rule name format, that "T_". Remove the leading 'T'
to make it into a silent rule for combining with metas.


Personal convention; I interpolate '_LOCAL_' ( or '_L_') in locally 
created rule names to distinguish them for debugging. And then when things 
don't work as expected (EG: FPs) it helps to determine if the problem is 
self-inflicted.


Final note; now that we've discussed this spam sign, it will probably 
become useless as spammers follow this list and mutate their crap 
accordingly to dodge our rules. ;(




Re: URIBL_BLOCKED while using local BIND

2015-09-15 Thread Dave Funk

However you did not empty your ISP's DNS server cache.
That 2 msec response time is from its cache; the 543 msec for
your server is when the answer is not in your server's cache.

So you're not making a fair comparison.

A response from a cache is always going to be faster; that's why people
use caching servers.
However with everybody & his cat using your ISP's server, its queries get
blocked and thus it is caching the bad (blocked) response.


So either you get bad data fast or good data slowly.

Once you get a second spam with similar contents, queries for that copy 
will be in your cache and be fast.


Given that a modern SA parallelizes DNS queries, a somewhat slow DNS
response (hundreds of msec) won't have too much overall effect on the
spam processing time.


On Tue, 15 Sep 2015, Marc Richter wrote:


Yes

Am 15.09.2015 um 13:30 schrieb Axb:

On 09/15/2015 01:23 PM, Marc Richter wrote:

Also, you shouldn't make assumptions without measuring something:

1. without forwarding:

;; Query time: 543 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)

2. with forwarding to my ISP's servers:

;; Query time: 2 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)

That's 271 times faster than root-servers's lookup.


did you EMPTY cache after each query?










Re: Bayes Filtering

2015-08-02 Thread Dave Funk

On Sun, 2 Aug 2015, Christian Jaeger wrote:


On August 2, 2015 6:40:10 PM CEST, Reindl Harald  wrote:

no idea what you are talking about by saying
"I can't find anything about this in the docs"


I'm talking about the bundled docs. The man / perldoc pages of 
Mail::SpamAssassin::Plugin::Bayes / Mail::SpamAssassin::*Bayes* and the default 
config files. That's where I expected this info to be. It's something simple 
and basic, i.e. something that the writer of the software can foresee the need 
for documentation, so it makes sense that it's in the same files that the 
programmers wrote. That's where I start looking. That's where qpsmtpd, which 
I'm configuring around the same time, has its basic docs.

Ch.


In the man page for the spamassassin config file there is a paragraph:

   bayes_min_ham_num (Default: 200)
   bayes_min_spam_num   (Default: 200)
   To be accurate, the Bayes system does not activate until a
   certain number of ham (non-spam) and spam have been learned.
   The default is 200 of each ham and spam, but you can tune
   these up or down with these two settings.

You might argue about the clarity, but the info is there.
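
Spelled out as code, the documented rule amounts to something like the Python sketch below. The function name is made up, and treating the thresholds as inclusive (exactly 200 counts as enough) is my assumption, not something the man page text states.

```python
def bayes_is_active(nham: int, nspam: int,
                    min_ham: int = 200, min_spam: int = 200) -> bool:
    """Bayes is consulted only once enough of BOTH classes have been learned.

    Assumption: the minimums are inclusive.
    """
    return nham >= min_ham and nspam >= min_spam

print(bayes_is_active(nham=500, nspam=150))
print(bayes_is_active(nham=250, nspam=250))
```

The common surprise is the first case: plenty of ham learned, but Bayes still stays silent because the spam count is short.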



Re: Classifying mail as unsolicited

2015-07-07 Thread Dave Funk

On Mon, 6 Jul 2015, Alex wrote:


Hi,

We have a system with a few hundred users, many of which forward their
mail off the server to their gmail or yahoo account. Lately I've
started to notice quite a few messages are being tagged by gmail and
delayed being received as unsolicited. I know the KAM rules contain a
marketing rule, and razor helps too, but too many of these marketing
messages are not being tagged.

I'm referring to warnings such as this:

Jul  6 22:54:20 bwipropemail postfix/smtp[25057]: C09F4885EA2BC:
to=<44...@gmail.com>, orig_to=<44...@example.com>,
relay=alt1.gmail-smtp-in.l.google.com[173.194.208.26]:25, delay=38223,
delays=38220/1.3/1/0.22, dsn=4.7.0, status=deferred (host
alt1.gmail-smtp-in.l.google.com[173.194.208.26] said: 421-4.7.0
[66.XXX.XXX.100  15] Our system has detected an unusual rate of
421-4.7.0 unsolicited mail originating from your IP address. To
protect our 421-4.7.0 users from spam, mail sent from your IP address
has been temporarily 421-4.7.0 rate limited. Please visit 421-4.7.0
https://support.google.com/mail/answer/81126 to review our Bulk Email
421 4.7.0 Senders Guidelines. 5si23309629qks.82 - gsmtp (in reply to
end of DATA command))


Yes, gmail does that to almost anything they decide is relayed spam.



Here is an example message:

http://pastebin.com/kaD3AQMz


It came from ymlpsv.net, black list them (and their other names such as
ymlpsv.com, ymlpsrv.net, ymlpserver.net, ymlpsrv.com) unless one of your
clients -really- wants crap from them, then selective whitelist.

They are a spammy MSP. I regularly find garbage from them in my spamtraps.


I realize bayes may be a problem on this one, but do you have any
suggestions for blocking these more effectively before they're
forwarded on to gmail?


As others have alluded to, forwarding opens up a whole can of worms,
but forwarding to gmail is the most problematic.



Re: local.cf, user_prefs etc

2015-05-21 Thread Dave Funk

On Thu, 21 May 2015, Dmitry Baronov wrote:


Hello folks!

I use 3.4.1 freebsd version with compiled rules.

Please, give me advice how I could use local config file to override 
downloaded default values?


All my attempts were unsuccessful.

I placed local.cf and user_prefs files in /root/spamassassin 
/etc/mail/spamassassin /usr/local/etc/mail/spamassassin
/usr/local/share/spamassassin - no way to replace default values like 
blacklist_from  or required_score.


I need help :)

Rgds,

db


First, how are you using spamassassin?

Are you using the 'spamd' daemon and feeding it spam via "spamc" (from a
procmail recipe or a postfix filter, or a milter)?

Are you using the spamassassin program itself from procmail?
Are you using some kind of dedicated mail filtering package such as 
mimedefang or amavis which instantiates an instance of spamassassin within 
its own process via the spamassassin APIs?


The first two methods use the standard spamassassin config files, the last
one may ignore standard spamassassin config files and use its own.

For the first two you need to determine which config files are being used
as it's possible that your SA kit was built with non-standard internal settings.
Invoke spamassassin with the "--lint -D" flags and it will tell you which
config files it's using. The 'local' variants of the config files that it
says it's reading are the ones you want to modify.

For the last method you'll have to consult the relevant documentation.




Re: Rejecting without backscatter (was Re: Spamassassin not catching spam (Follow-up))

2015-03-26 Thread Dave Funk

On Thu, 26 Mar 2015, Kris Deugau wrote:


David F. Skoll wrote:

On Thu, 26 Mar 2015 15:05:06 +0100
Reindl Harald  wrote:


* spamass-milter -r 8.0
* messages above 8.0 are *rejected*


Silently?  Or do you generate an NDR?  I'm genuinely curious as to how you:

1) Accept mail for some recipients

2) Reject mail for others

3) Without generating backscatter

4) Given that the messages are sent in the same SMTP session with
   multiple RCPTs and only one DATA.


For those of you still a little puzzled, here's an example of what David
is asking about.  In the following SMTP transaction, how do you reject
the message for recip1, while accepting the message for recip2?

$ telnet mx.example.org 25
<< 220 example.org, talk to me

helo sending.server

<< 250 Hello, friend!

mail from:imma.spam...@example.com

<< 250 OK, send this to who?

rcpt to:rec...@example.org

<< 250 OK

rcpt to:rec...@example.org

<< 250 OK

DATA

<< 354 Now for the message


.


At this point you have one message, scoring > 8 points.  Recipient 1
absolutely requires all mail to be delivered to their Inbox, with a
Subject tag in the case of mail considered spam.  Recipient 2 wants mail
scoring > 8 points to be rejected.

What SMTP response do you send?  You can only send one response, since
you only have one message, but you have two recipients with conflicting
filter policies.


At that stage you're stuck, there is no way out of that box.

To achieve the desired results you need business logic in your pre-queue
/ milter filter to do a triage during the 'rcpt' stage.

You need a database of recipient classes to indicate whether the recipient
is a spam-lover or a spam-hater.
At the first recipient you look up that address and set a state variable
for that session (call it love-hate). As each additional recipient comes in
you compare his class against the love-hate setting for the current
session. If they are compatible you respond with a 250, if not with a 452
(or other 45* type reply). This way the sender is responsible for queuing
those recipients and trying again in another SMTP session.
Then all the recipients in one session can be treated equally WRT the
handling of reject/accept based upon some future state (EG spammyness
of the message).

That logic can be extended to more than just spam love/hate status,
just need some kind of business logic that sets the compatibility
matrix at the beginning of a session and 452's any recipient that
isn't compatible.
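
As a rough illustration of that RCPT-stage triage, here is a Python sketch. It models only the decision logic, not any real milter API; the class table, the addresses and the default class for unknown recipients are all made up.

```python
# Hypothetical recipient-class table; a real milter would query a database
RECIPIENT_CLASS = {
    "recip1@example.org": "spam-lover",
    "recip2@example.org": "spam-hater",
    "recip3@example.org": "spam-hater",
}

class Session:
    """Tracks the policy class fixed by the first recipient of an SMTP session."""

    def __init__(self):
        self.love_hate = None      # unset until the first RCPT is seen
        self.accepted = []

    def rcpt(self, address: str) -> int:
        """Return the SMTP reply code for one RCPT TO command."""
        cls = RECIPIENT_CLASS.get(address.lower(), "spam-hater")  # default is an assumption
        if self.love_hate is None:
            self.love_hate = cls
        if cls != self.love_hate:
            return 452             # temp-fail: sender retries in a new session
        self.accepted.append(address)
        return 250

session = Session()
for rcpt in ("recip1@example.org", "recip2@example.org"):
    print(rcpt, session.rcpt(rcpt))
```

By the time DATA arrives, every accepted recipient shares one policy, so a single accept or reject response is correct for all of them.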

Note that Gmail is already doing something like this (the "multiple
destinations not supported in one transaction" status).



Re: Handling very large messages (was Re: Which milter do you prefer?)

2015-03-15 Thread Dave Funk

On Sun, 15 Mar 2015, Reindl Harald wrote:




Am 15.03.2015 um 19:15 schrieb Axb:

On 03/15/2015 07:09 PM, Reindl Harald wrote:



[snip..]

IMO, deciding what chunk of a msg should be scanned should be managed by
the glue and not by SA.


true but if the glue (spamass-milter) would truncate the message it
passes to spamc it would get back that truncated message with the added
headers (which are used to decide reject or pass) and so finally
*deliver* the truncated version


then spamass-milter is the wrong choice


how else should it work?

it hardly can invent the report-headers SA adds by itself which needs to land 
in the final message, spamc/spamd are doing the message work and the milter 
is just the glue to bring the MTA and SA together


However that glue can be intelligent and contain business logic.

If the author of the milter knows what they are doing (and cares) this is
a very straightforward thing to do (I know because I did it with
milterassassin).

In the milter you must take an explicit extra step if you want to mess
with the body of the message (smfi_replacebody). It's actually easier to
just add/replace headers (smfi_addheader/smfi_chgheader) than it is
to mess with the body (not to mention faster & more efficient).

So logic is; milter receives -copy- of message from sendmail, milter
passes 'REPORT' command & (optionally truncated) message to spamd, gets 
back a headers-only report. milter then tells sendmail to add the 
new/modified headers and doesn't mess with the body.
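
The "optionally truncated" step could be sketched roughly as below. This assumes the plain-text spamc/spamd framing (command line, Content-length header, blank line, then the message); the protocol version string, the function name and the 256 KB default are assumptions for illustration, not taken from any particular milter:

```python
def build_spamd_request(message: bytes, max_bytes: int = 256 * 1024) -> bytes:
    """Build a spamd REPORT request carrying at most max_bytes of the message.

    Only the copy sent to spamd is truncated; the original message held
    by the MTA is never touched.
    """
    chunk = message[:max_bytes]
    header = (
        b"REPORT SPAMC/1.2\r\n"
        + b"Content-length: " + str(len(chunk)).encode("ascii") + b"\r\n"
        + b"\r\n"
    )
    return header + chunk
```

The milter would then parse the reply and call smfi_addheader/smfi_chgheader with the results, leaving the body alone.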




Re: whitelist_from_rcvd not working, WAIDW

2015-02-28 Thread Dave Funk

On Fri, 27 Feb 2015, Ian Zimmerman wrote:


Header of test message, massaged for privacy, is here:

http://pastebin.com/EV6g15aN

I have this in user_prefs:

trusted_networks 198.1.2.3/32

[...lots snipped...]

whitelist_from_rcvd *@wetransfer.com *.wetransfer.com

Why is the whitelist not firing?


whitelist_from_rcvd can be a bit fragile because it depends upon
multiple factors (trust chain, full-circle-DNS) working correctly.

First thing, that second parameter is not an address but part
of a DNS name, so use 'wetransfer.com' instead of that *.wet...

second thing, check to see if your trust chain is working as you
expect. whitelist_from_rcvd is applied at the point of the
first trusted relay (IE where the last untrusted hands the
message to the first trusted relay). Add the 'X-Spam-Relays-Trusted'
and 'X-Spam-Relays-Untrusted' pseudo headers to your report
to see if things are working as expected.

Note that a DNS fubar (even temporary) will break whitelist_from_rcvd.
Also if the sender changes MSP it will break, so it is a maintenance
headache.

I see that message has a valid DKIM signature, so why not use
whitelist_auth? Same goodness with fewer headaches.
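
To illustrate the point about the second parameter being part of a DNS name rather than a glob, here is a simplified model of the comparison, assuming it behaves as a domain-suffix match against the relay's reverse DNS name. The function name is hypothetical and the real SA matching has more cases than this:

```python
def rdns_matches(relay_rdns: str, domain: str) -> bool:
    """True if the relay's rDNS name is the domain or a host under it."""
    relay = relay_rdns.lower().rstrip(".")
    dom = domain.lower().rstrip(".")
    return relay == dom or relay.endswith("." + dom)
```

With this model 'wetransfer.com' matches a relay such as mail3.wetransfer.com, while the literal string '*.wetransfer.com' matches nothing.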




Re: no BAYES checking

2015-02-25 Thread Dave Funk

On Wed, 25 Feb 2015, James wrote:


I don't think I have the Bayesian filter working.

This is some spam that wasn't marked as spam, shouldn't one of the tests be 
BAYES_00?

X-Spam-Status: No, score=4.5 required=5.0 tests=FREEMAIL_FROM,FREEMAIL_REPLYTO,
FSL_MY_NAME_IS,HTML_MESSAGE,RDNS_DYNAMIC,T_OBFU_JPG_ATTACH autolearn=no
version=3.3.2

$ sudo sa-learn --username=debian-spamd --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0   5902  0  non-token data: nspam
0.000  0   4985  0  non-token data: nham
0.000  0 422427  0  non-token data: ntokens
0.000  0 1159486049  0  non-token data: oldest atime
0.000  0 1424827990  0  non-token data: newest atime
0.000  0 1424843976  0  non-token data: last journal sync atime
0.000  0 1424830068  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count

Doesn't that show I have 5902 spam and 4985 ham messages?

/etc/spamassassin/local.cf
use_bayes 1
bayes_auto_learn 1

$ sudo -u debian-spamd spamassassin -D --lint 2>t
$ less t
$ grep bayes t
Feb 25 21:07:47.606 [27839] dbg: config: fixed relative path: 
/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Feb 25 21:07:47.607 [27839] dbg: config: using 
"/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf" for 
included file
Feb 25 21:07:47.607 [27839] dbg: config: read file 
/var/lib/spamassassin/3.003002/updates_spamassassin_org/23_bayes.cf
Feb 25 21:07:55.270 [27839] dbg: bayes: learner_new 
self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x1de4868), 
bayes_store_module=Mail::SpamAssassin::BayesStore::DBM
Feb 25 21:07:55.353 [27839] dbg: bayes: learner_new: got 
store=Mail::SpamAssassin::BayesStore::DBM=HASH(0x230eb58)
Feb 25 21:07:55.356 [27839] dbg: bayes: tie-ing to DB file R/O 
/var/lib/spamassassin/.spamassassin/bayes_toks
Feb 25 21:07:55.359 [27839] dbg: bayes: tie-ing to DB file R/O 
/var/lib/spamassassin/.spamassassin/bayes_seen
Feb 25 21:07:55.363 [27839] dbg: bayes: found bayes db version 3
Feb 25 21:07:55.365 [27839] dbg: bayes: DB journal sync: last sync: 0
Feb 25 21:07:55.366 [27839] dbg: bayes: not available for scanning, only 0 ham(s) 
in bayes DB < 200
Feb 25 21:07:55.367 [27839] dbg: bayes: untie-ing
Feb 25 21:07:55.379 [27839] dbg: bayes: tie-ing to DB file R/O 
/var/lib/spamassassin/.spamassassin/bayes_toks
Feb 25 21:07:55.382 [27839] dbg: bayes: tie-ing to DB file R/O 
/var/lib/spamassassin/.spamassassin/bayes_seen
Feb 25 21:07:55.385 [27839] dbg: bayes: found bayes db version 3
Feb 25 21:07:55.386 [27839] dbg: bayes: DB journal sync: last sync: 0
Feb 25 21:07:55.388 [27839] dbg: bayes: not available for scanning, only 0 ham(s) 
in bayes DB < 200
Feb 25 21:07:55.388 [27839] dbg: bayes: untie-ing

Why does it say not enough ham?


It looks like you either have a permissions problem or a confusion problem.
Your run of 'sa-learn --dump magic' is looking at some Bayes database which
has enough ham/spam, but whatever your spamassassin is looking at doesn't.

Your 'sudo' isn't running that sa-learn --dump magic as UID 'debian-spamd'.
It's running it as root but telling sa-learn to emulate user 'debian-spamd',
so there could be a permissions problem.
Try running sa-learn in the same way that you're running spamassassin:

 $ sudo -u debian-spamd sa-learn --dump magic
and see what you get.

Other possibility is that sa-learn is looking at a different bayes
database. Try running that "sa-learn --dump magic" with the "-D" option
to see what bayes database it's looking at.
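
If you want to compare the two runs mechanically, something like the following could pull the counts out of the "--dump magic" output. It is only a sketch: it assumes the column layout shown in the dump above (the third numeric field is the value), and the 200 figure is taken from the "< 200" in the debug log:

```python
def parse_dump_magic(text: str) -> dict:
    """Map each 'non-token data' label to its integer value."""
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) >= 3:
            counts[label.strip()] = int(parts[2])
    return counts


SAMPLE = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0   5902  0  non-token data: nspam
0.000  0   4985  0  non-token data: nham
"""

counts = parse_dump_magic(SAMPLE)
# Bayes is only used for scanning once both counts reach the minimum.
bayes_usable = counts["nspam"] >= 200 and counts["nham"] >= 200
```

Run it on the output of both invocations; if one reports 0 ham and the other thousands, they are not reading the same database.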





Re: Recent spate of Malicious VB attachments II

2015-02-19 Thread Dave Funk

On Thu, 19 Feb 2015, David F. Skoll wrote:


On Thu, 19 Feb 2015 07:46:16 -0600
Chad M Stewart  wrote:


I use amavis-new and block based on file type.  My users should never
get legit executables via email, so they are sent to a quarantine.


Unfortunately, we're finding those simple-minded rules are running out
of gas. :(  We've seen a zip file containing an Excel spreadsheet
with a macro virus in it.  ClamAV is essentially useless at detecting
viruses, so it's a real problem... any ideas?


I thought that ClamAV knew how to unpack zip/rar/tar/gzip/etc...
and scan the cruft inside them.

Are you saying that doesn't work or are you saying that the malware is
mutating fast enough that the ClamAV signatures aren't keeping up with it?
If the latter case, is there -any- AV kit that is?
Are the Sanesecurity add-in ClamAV signatures helpful?



Re: Recent spate of Malicious VB attachments II

2015-02-19 Thread Dave Funk

On Thu, 19 Feb 2015, Reindl Harald wrote:

well, you can achieve that directly on the MTA, but that won't help in the case of
"emails containing MS office attachments with a Malicious VB script"


cat /etc/postfix/mime_header_checks.cf
/^Content-(?:Disposition|Type):(?:.*?;)? \s*(?:file)?name \s* = 
\s*"?(.*?(\.|=2E)(386|acm|ade|adp|awx|ax|bas|bat|bin|cdf|chm|class|cmd|cnv|com|cpl|crt|csh|dll|dlo|drv|exe|hlp|hta|inf|ins|isp|jar|jse|lnk|mde|mdt|mdw|msc|msi|msp|mst|nws|ocx|ops|pcd|pif|pl|prf|rar|reg|scf|scr|script|sct|sh|shb|shm|shs|so|sys|tlb|vb|vbe|vbs|vbx|vxd|wiz|wll|wpc|wsc|wsf|wsh))(?:\?=)?"?\s*(;|$)/x 
REJECT Attachment Blocked (Executables And RAR-Files Not Allowed) "$1"


(.rar because ClamAV can't scan the content on Fedora)


Is that a politically inspired limitation? If you build ClamAV from source
it can scan RAR.



Re: regex: chars to escape bsides @

2015-01-03 Thread Dave Funk

On Sat, 3 Jan 2015, Reindl Harald wrote:

by writing some custom rules like the one below i found out that @ needs to be
escaped in addition to the characters listed at
http://php.net/manual/de/function.preg-quote.php


are there other chars which need special handling?

header    CUST_MANY_SPAM_TO  X-Local-Envelope-To =~ /^(\)$/i

score CUST_MANY_SPAM_TO  -4.0
describe  CUST_MANY_SPAM_TO  Custom Scoring


Umm, SA is written in Perl, not PHP. So you should look at Perl
regex documentation, not PHP docs.



Re: Gmail password reset FPs

2014-12-17 Thread Dave Funk

On Wed, 17 Dec 2014, Joe Quinn wrote:

We've been having password reset emails marked as spam by Gmail. We've tried 
rephrasing the email body/subject/from email, to no avail. We've even tried 
registering as a bulk sender 
(https://support.google.com/mail/contact/bulk_send_new?rd=1) and googling for 
anyone having similar issues. Has anyone else dealt with this before and 
managed to get it fixed?


I see that you've got SPF set up for your domain, do you have DKIM signing
enabled too? What about TLS transport on your outgoing MTA?
Not sure they'll make any difference but those are things that I've done
here to help improve deliverability.



Re: Honeypot email addresses

2014-12-04 Thread Dave Funk

On Thu, 4 Dec 2014, Noel Butler wrote:



On 04/12/2014 00:54, Christian Grunfeld wrote:

  "It would be very rare, and if so you would ever more rare CC the 
entire list of addresses on your spam message -
  sure this was a lot more common in years gone by, but I've not seen any 
such evidence of it in almost 10 years, and if
  you did, well, that's not my problem, it's the problem of your provider
  who obviously doesn't care enough to educate its users of the dangers of
  spam, period.."



  lol ! ! ! is it possible to educate users against spam?

  if that were the case this list would not be needed and we would free
  ourselves from reading your posts, period !



you must be doing it wrong, the users of today are far wiser than they were 5
years ago, even my almost 80yo dad knows how to handle spam. although it's hard
to do, get your users to *read* their welcome emails, and don't have a lawyer
write the stuff, write it so a 10yo kid can understand it. it's also rare that
spam gets past SA anyway with our myriad of custom rules, and we block at MTA
level from multiple DNSBLs amongst many other milter tricks which I'm not going
into in a public forum :)

So educate them well, and let SA do its job, and we wouldn't need to read your
posts either.


I have to agree with Dave, Christian, et al. It's not frequent but not
rare to see a reply-all "Take me off this list!!!".

Even if you've got the smartest, best educated users who will never make
that mistake and a totally perfect spam filtering system that never has a
FN, there are other people/systems in the world which may be on that
"shotgun" spam recipient list which may be less than perfect.





Re: Honeypot email addresses

2014-11-22 Thread Dave Funk

Another way to seed spamtrap addresses is to make up some and
then feed them into "unsubscribe" links in spam sent to regular
users. I've got some of those I started that way 15 years ago
and they're still going strong.


On Sat, 22 Nov 2014, Ted Mittelstaedt wrote:


That's a lot of work, there's a much easier way

Just search your /var/log/maillog for user unknown messages, and
create email addresses for the unknown users which are showing up
multiple times over multiple days.  It's a great trick because it gets
spammers who already have email addresses in their spamlists and who
are too lazy to remove them when they get a user unknown message from
the mailserver.
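
The log search described above might be sketched like this. The log line shape in the regex (date stamp first, then "<address>... User unknown" somewhere on the line) is an assumption, so adjust it to whatever your MTA actually writes; the function name and min_days parameter are hypothetical:

```python
import re
from collections import defaultdict

# Assumed line shape: "<Mon> <day> <time> <host> ... <address>... User unknown"
UNKNOWN = re.compile(r"^(\w{3}\s+\d+)\s.*<([^>]+)>\.\.\. User unknown")


def spamtrap_candidates(lines, min_days=2):
    """Return addresses rejected as unknown on at least min_days distinct days."""
    days_seen = defaultdict(set)
    for line in lines:
        m = UNKNOWN.search(line)
        if m:
            days_seen[m.group(2).lower()].add(m.group(1))
    return sorted(a for a, days in days_seen.items() if len(days) >= min_days)


LOG = [
    "Nov 20 08:10:01 mx sendmail[101]: x1: <olduser@example.com>... User unknown",
    "Nov 21 09:00:00 mx sendmail[102]: x2: <olduser@example.com>... User unknown",
    "Nov 21 09:05:00 mx sendmail[103]: x3: <typo@example.com>... User unknown",
]
candidates = spamtrap_candidates(LOG)
```

A one-off typo shows up on a single day and is skipped; an address that keeps getting hit day after day becomes a candidate.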

I have a pretty old domain - I've seen user unknown messages for
users who cancelled mailboxes on the domain over a decade ago.  I figure
10 years of getting user unknown messages is long enough for any real
humans and for legitimate mailing lists to remove those entries.

Ted

On 11/21/2014 8:10 AM, Joe Quinn wrote:

We are setting up some honeypot email addresses, and were wondering if
anyone here had tips on how to include those addresses on webpages and
other places.

We're currently going with a pretty simple 
HTML comment. Is that too obvious? Should we put it into a CSS invisible
div as well? Any other ideas?







Re: URIBL_RHS_DOB #fail

2014-11-09 Thread Dave Funk

On Sun, 9 Nov 2014, Axb wrote:


On 11/09/2014 09:51 PM, Alex Regan wrote:

Hi guys,

One of my user's hotel reservations almost got tagged incorrectly:

*  1.5 URIBL_RHS_DOB Contains an URI of a new domain (Day Old Bread)
 *  [URIs: bestwestern.com]

I looked around for a place to report an FP, but also thought everyone
else should know about this, since it's so obviously incorrect.

Their whois looks like the record was updated on the 31st. Not exactly a
day ago, but could that even have something to do with it?


DOB owner has been notified.


I think DOB was having a "bad hair day" this morning. I saw a number of
FP hits on DOB for stuff that hadn't changed in years (EG amtrak.com ).
It looks better now.



Re: yahoo rcvd bug?

2014-10-20 Thread Dave Funk

On Mon, 20 Oct 2014, Quinn Comendant wrote:


I'm getting FORGED_YAHOO_RCVD false positives for messages with yahoo received 
headers that do not match the search pattern defined in 
check_for_forged_yahoo_received_headers(). I'm using SpamAssassin 3.3.2 with 
latest rules as per `sa-update` rule channels `sought.rules.yerp.org` and 
`updates.spamassassin.org`.

The spamassassin rule that is firing:

*  1.6 FORGED_YAHOO_RCVD 'From' yahoo.com does not match 'Received' 
headers

The received-by header in question:

Received: from unknown (HELO nm46-vm10.bullet.mail.bf1.yahoo.com) 
(216.109.114.203)

Full mail headers available at https://cloudup.com/cbmG8tJF71k

And finally here's the `check_for_forged_yahoo_received_headers` function that 
parses this, which doesn't contain the correct regex for this hostname:

[snip..]

 return 1;
   }


You have two different rules that have fired there (FORGED_YAHOO_RCVD &
RDNS_NONE) because your MTA was not able to resolve that IP address to
its registered domain name.
The SA code correctly parsed the info that your MTA gave it, it's just
that info was incorrect either due to local DNS issues or a network issue.

Then because you (or somebody configuring your SA) have lowered the spam
threshold from 5.0 to 3.0 it caused a FP on this message.

I don't think that it is valid to declare a bug in SA because of an issue
local to your system (problematic MTA/DNS & local config choices).


I see that you also have a hit on URIBL_BLOCKED which tends to indicate
that you have local DNS issues that should be addressed.

suggestions:
1) work on improving your DNS system
2) put the spam threshold back to default to reduce FPs triggered by DNS
   issues.
3) create a meta rule that uses the DKIM_VALID detection to nullify the
   effect of that FORGED_YAHOO_RCVD (in case you cannot get your DNS to work
   correctly).

If you lowered that spam threshold because of too many FNs, I think that
getting the DNS fixed so RBL tests work will take care of that too.

There have been plenty of posts to this list about URIBL_BLOCKED and how
to fix it.



Re: .link TLD spammer haven?

2014-10-13 Thread Dave Funk

On Mon, 13 Oct 2014, Philip Prindeville wrote:


Every connection I’ve gotten from a hostname resolving to *.link or saying helo 
*.link has been spam (I block the connections with MIMEDefang).

Has anyone actually seen a legitimate email from a host in the .link TLD?

I’ve seen (last week alone):

bgo.blc-onlineconsumer140.link
ratio.allgiftcardsonlinefriendly.link
ratio.autodealersstarted.link

[snip..]


Is it worth having a rule that triggers on the relay’s hostname being *.link?

Also, I noticed that every message we saw was missing a Received: header…

-Philip


I'll second that and add a similar comment about ".link" URLs inside the
message. Last week I created a uri rule to fire on any ".link" hosted URL
and so far haven't seen a single FP.
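
The check such a uri rule performs amounts to looking at the host part of the URL only, not the path. A small sketch (the function name is hypothetical, and this is not the rule itself):

```python
from urllib.parse import urlsplit


def is_link_tld(url: str) -> bool:
    """True if the URL's host (not its path) is under the .link TLD."""
    host = urlsplit(url).hostname or ""
    return host.lower().rstrip(".").endswith(".link")
```

So http://ratio.autodealersstarted.link/offer is flagged, while a URL that merely has ".link" in its path is not.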


Re: punctuation in subjects

2014-09-01 Thread Dave Funk

On Mon, 1 Sep 2014, Martin Gregorie wrote:


On Mon, 2014-09-01 at 03:17 -0400, Jude DaShiell wrote:

Messages with question marks and spaces have been showing up in my inbox
on another account.  To blacklist these [? ] would take care of those
characters in a Subject: line.  Would such a regular expression
effectively blacklist any message having just those two kinds of
characters in its Subject: line in any combination?


No: a regex along these lines
  /[? ]/
will hit all subject lines containing either a space or a question mark,
i.e. just about every subject line you'll ever see.

This one
  /[? ].*[? ]/
will only hit subjects with both characters in any order, but is
probably also far too general to use by itself. Make it a subrule (name
starts double underscore) and use a metarule to combine it with another
subrule that fires on something that usually only appears in spam and
you may have the basis of something more useful.


Martin's proposed rule would hit a string that contained at least two
'?' or space characters as well as other characters. (EG: '?junk?' or
'this one hit').

If you want to be sure to hit subjects that contain ONLY question marks
and spaces (and at least one of each) it will take two sub-rules combined
into a metarule.
EG:
 header __SUBJECT_SPACE_QM            Subject =~ /(?:\? | \?)/
 header __SUBJECT_MORE_THAN_SP_QM     Subject =~ /[^? ]/
 meta   SUBJECT_SPACE_QM  __SUBJECT_SPACE_QM && ! __SUBJECT_MORE_THAN_SP_QM

(untested)
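
For anyone who wants to try the logic outside of SA first, the same two patterns can be exercised with Python's re module. This is only a stand-in for the metarule (the function name is made up), and it ignores anything SA does to the Subject header before matching:

```python
import re

# Same patterns as the two sub-rules in the metarule.
HAS_ADJACENT_QM_SPACE = re.compile(r"(?:\? | \?)")  # a '?' next to a space
HAS_OTHER_CHAR = re.compile(r"[^? ]")               # anything besides '?' and ' '


def subject_space_qm(subject: str) -> bool:
    """True only for subjects made up solely of '?' and spaces, with both present."""
    return (bool(HAS_ADJACENT_QM_SPACE.search(subject))
            and not HAS_OTHER_CHAR.search(subject))
```

A subject like "? ?? ?" hits; "?junk?", "this one hit" and "????" do not.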

FWIW, I would expect such a rule to have a limited useful life-span.
Now that it's been discussed here spammers will adapt their garbage to
avoid it (IE add one other kind of character to the subject, etc).

Spammers do monitor this list and just the act of discussing spam
characteristics can cause them to adapt their tactics.



users@spamassassin.apache.org

2014-08-15 Thread Dave Funk

On Sat, 16 Aug 2014, Rajesh M. wrote:


hi

we are getting spam with a lot of hashes &
Ꭼmа

i checked out KAM.cf but not able to trap such emails

any solution please ?

thanks
rajesh


Search the July archive of this list for postings with the subject of:
 "More text/plain questions"

There were a couple of possible solutions discussed, including new
features added to the latest version (trunk) of spamassassin.
I took one of them (new functions in MIMEEval) back-ported it to my SA
kit and it has been hitting pretty regularly on that kind of spam.




Re: Somewhat OT - how do I whitelist a host which is in a DNSBL in sendmail?

2014-07-24 Thread Dave Funk

On Thu, 24 Jul 2014, Thomas Cameron wrote:


Howdy -

I have two VMs at Digital Ocean, one on the east coast, one on the west.

I'm running Sendmail-8.14.8-2.fc20.x86_64. I have several DNSBLs listed:

FEATURE(`dnsbl',`in.dnsbl.org ')dnl
FEATURE(`dnsbl',`sbl-xbl.spamhaus.org')dnl
FEATURE(`dnsbl',`cbl.abuseat.org')dnl
FEATURE(`dnsbl',`dul.dnsbl.sorbs.net')dnl

Unfortunately, my home network is attached to a cable provider which
shows up in dul.dnsbl.sorbs.net.

Can I whitelist my IP address so that I can send mail through my mail
servers? Right now, it gets rejected.

Yeah, I know, I can always use my ISP's smtp server, I guess. But that
kind of sucks. I would rather use mine. Purely a pride thing, I know.

Thomas


Thomas.
Do you have the 'MSA' port enabled for your sendmail (IE port 587), and
SMTP-AUTH? Then just skip the dnsbl checks for auth'ed mail submissions.
You could whitelist your client IP address in your 'access' file but
what happens when that address changes? (I assume your ISP gives you
a DHCP address).




Re: Bayes, Manual and Auto Learning Strategies

2014-07-02 Thread Dave Funk

On Wed, 2 Jul 2014, Steve Bergman wrote:

Well... I just turned on autolearn for a moment, deleted the bayes_* files on 
the test account I use, and sent myself a message from my usual outside 
account. And new bayes_* files were created. So I was wrong, and I win. More 
options.


So now I can proceed to the "what does this mean?" phase.

If I leave things as they are, then training is perfect if the users are 
diligent. But if they are not, then... what? I see plenty of spams getting 
through with a 0.0 score. IIRC, the autolearn spam threshold is 7? Pretty 
much everything there is spam.


But I'm not sure I quite buy having the static rules of SA training Bayes. 
Isn't Bayes just learning to emulate the static rules, with all their 
imperfections?


Unless you've explicitly disabled them, the network based rules (razor,
pyzor, dcc, DNS based rules, RBLs, URIBLs, etc) constitute an external
'reputation' system to pass judgment on messages.
It's not uncommon to take a low-scoring spam and find that it gets a
higher score on retest as it has been added to various bad-boy lists.

This is also one way that gray-listing helps. If you stiff-arm the first
pass of a spam run a later check may hit it more accurately as it's been
added to block-lists in the mean-time.


If it starts going wrong, doesn't that mean the errors are going to spiral 
out of control?


That is a possible risk of relying solely on auto-learning.
The autolearn system has been carefully crafted and tuned over the years
to try to prevent a feed-back loop from throwing it into a tail-spin.
For example the internal scoring system used to determine if a message
is spam or ham WRT the choice for auto-learning explicitly excludes
the Bayes score (and other particular kinds of scores such as white/black
lists) to try to prevent tail-eating.
Occasional judicious manual learning can help to 'tweak' things when Bayes
looks like it's not in top shape. (IE manual learning of FPs & FNs).
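
A much simplified sketch of that exclusion, to show why Bayes can't feed on its own verdict. The rule names in the exclusion set are illustrative and the thresholds are passed in rather than assumed; the real autolearn logic has additional conditions that are not modeled here:

```python
# Illustrative exclusions only; the real list is longer.
EXCLUDED_PREFIXES = ("BAYES_",)
EXCLUDED_RULES = {"USER_IN_WHITELIST", "USER_IN_BLACKLIST"}


def autolearn_decision(hits, ham_threshold, spam_threshold):
    """Return 'spam', 'ham' or None from rule hits, ignoring excluded rules.

    hits maps rule name -> score contributed by that rule.
    """
    score = sum(points for rule, points in hits.items()
                if not rule.startswith(EXCLUDED_PREFIXES)
                and rule not in EXCLUDED_RULES)
    if score >= spam_threshold:
        return "spam"
    if score <= ham_threshold:
        return "ham"
    return None  # in between: don't learn from this message
```

Note that a BAYES_99 hit on its own never pushes a message over the learning threshold, since its points are left out of the sum.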

I've used site-wide Bayes with auto-learning at a site with ~3000 users
and have had to flush & restart our Bayes database twice in 10 years.

Dave



Re: Bayes, Manual and Auto Learning Strategies

2014-07-02 Thread Dave Funk

On Wed, 2 Jul 2014, Steve Bergman wrote:


On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:


Those do not tell you about using file or SQL based databases?


They do. But not specifically with respect to autolearn.

You never thought about googling for "spamassassin per user" and friends?
You never checked the SA wiki?


I have, indeed. No reference to autolearn and persistent storage. The lack of 
mention is notable.


I'd expect people to be lining up to tell me I'm mistaken if I absolutely 
were.


Can you point me to a change log somewhere documenting autolearn moving from 
in-memory and system-wide to per user and persistent?


I don't hold a strong opinion on this. It would be nice if I were wrong. It 
would open more options.


I'm just waiting for evidence that it's the case. My perception is that it's
not.


-Steve


Steve,
For some reason you seem to be hung-up on Bayes "autolearning". Is it
possible that you're confusing it with "Auto-White listing"? (which is now
deprecated and has -nothing- to do with Bayes).

SA's Bayesian scorer is a system based upon a method that parses a
message, extracts 'tokens' from it and uses an algorithm to calculate a
score for the message based upon a dictionary of previously seen tokens
and their relative merit.

The dictionary is created and updated by a process called 'learning'
wherein already-classified messages are tokenized and their tokens are
stored in the dictionary along with a merit value derived from their
instance count and a factor taken from being classified as spam or ham.
This learning process can be either externally driven (known as 'manual'
learning) or via an automated process from within SA as it scores messages
(known as 'auto' learning). So regardless of whether manual or auto
learning is utilized, tokens are added to the dictionary. It's also
possible to employ both auto & manual learning methods in the same
installation.
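
As a toy illustration of the dictionary idea (and nothing more: SA's tokenizer and scoring math are considerably more involved), here is a token store that is updated the same way whether the learning is driven manually or automatically. All names are hypothetical:

```python
from collections import Counter


class TokenDictionary:
    """Toy token store: counts how many spam/ham messages contained each token."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        """Tokenize an already-classified message and record its tokens."""
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_spamminess(self, token):
        """Estimate in [0, 1]; 0.5 means no evidence either way."""
        s = self.spam_tokens[token] / self.nspam if self.nspam else 0.0
        h = self.ham_tokens[token] / self.nham if self.nham else 0.0
        if s + h == 0:
            return 0.5
        return s / (s + h)


d = TokenDictionary()
d.learn("cheap pills now", is_spam=True)      # could come from sa-learn
d.learn("meeting notes now", is_spam=False)   # or from autolearn
```

Where the two Counters live (DBM files, SQL, memory) is a separate choice from how learn() gets called, which is the point of the matrix described below.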

There can be one dictionary used for scoring all messages processed (called
"site wide Bayes") or many separate dictionaries, one used for each
recognized user ("per user Bayes"). Either way, the dictionary(s) need to
be updated (and the update process could be either manual, auto, or both).

The Bayes dictionary(s) need to be stored somehow; the usual method is
via some kind of database. It could be a simple file based DB, some kind
of fancy SQL server based system or something else. This is a DBA'ish kind
of choice as to what particular technology is used to store the
dictionary DB (usually on disk in some way, could be in some kind of
memory resident set of tables, or something else).

So you have a multi-dimensional matrix WRT your Bayes system
configuration, and manual VS auto learning is just one factor.

It's been this way for the past 10+ years AFAIK (well, maybe 10 years
ago it didn't have as many options for back-end database storage, mostly
limited to Berkeley-DB type methods).

I hope this helps you.



Re: SA rule to detect prior SA pass?

2014-06-28 Thread Dave Funk

On Sat, 28 Jun 2014, RW wrote:


On Fri, 27 Jun 2014 20:43:19 -0500 (CDT)
David B Funk wrote:


Looking at my mail streams I see evidence that spammers sometimes
add faked "SpamAssassin" headers to their messages (I assume to try
to trick recipients into thinking that the message has already been
given a clean bill-of-health).

I wrote a few test rules to look for these pre-existing "X-Spam-"
headers to test to see if it could be used as a spam detector.
However I got no hits on these rules even on hand crafted test
messages that contained such stuff.

Checking the SA source I found in PerMsgStatus.pm a line of code:
   $self->{msg}->delete_header('X-Spam-.*');
that ran before any tests. So looking for SA headers inside of SA
is pointless.

So does anybody have any ideas how to test for evidence of a
prior SA pass?


You could simply rewrite "X-Spam-" to "X-Original-Spam-".


That's what I was afraid of. As I'm using a "milter" as my glue (so I
can SMTP reject high scoring spam) the usual MTA rewrite functions don't
do any good, so I'll have to hack the milter. I was hoping for something
more portable.


I doubt this is going to be very useful because too much legitimate
mail has X-Spam- headers. Most of the mailing lists I read have them.
Some servers add them to outgoing mail. You may have users that receive
scanned mail forwarded from ESPs etc.


I'm aware that by itself the presence of those headers isn't a definitive
spam sign, but I was hoping to combine that info with other clues to
create meta rules. However, I cannot test this hypothesis without the
ability to detect those headers.
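
The rename itself is simple wherever it ends up living. A sketch of what the glue would have to do to the header block before handing the message to SA (function name is hypothetical; this is not code from any existing milter):

```python
import re

# Match "X-Spam-" only at the start of a header line, so folded
# continuation lines and header values are left alone.
_SPAM_HEADER = re.compile(r"^X-Spam-", re.IGNORECASE | re.MULTILINE)


def preserve_prior_sa_headers(raw_headers: str) -> str:
    """Rename pre-existing X-Spam-* headers so SA's cleanup doesn't delete them."""
    return _SPAM_HEADER.sub("X-Original-Spam-", raw_headers)
```

Rules could then test for X-Original-Spam-* headers, since those survive the delete_header('X-Spam-.*') pass.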

--
Dave Funk  University of Iowa
College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_adminIowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: FYI - ahbl.org and BIND DNS errors

2014-06-10 Thread Dave Funk

On Tue, 10 Jun 2014, Andrew Daviel wrote:

Per http://ahbl.org/content/changes-ahbl, AHBL is going away (still used in 
spamassassin-3.3.1)


Meanwhile, AHBL is serving strange DNS responses, e.g.
(from wireshark)

 1   0.00 142.90.100.186 -> 162.243.209.249 DNS 93 Standard query 0xc828 
A zuz.rhsbl.ahbl.org
 2   0.072481 162.243.209.249 -> 142.90.100.186 DNS 246 Standard query 
response 0xc828

   Authoritative nameservers
   rhsbl.ahbl.org: type NS, class IN, ns invalid.ahbl.org
   rhsbl.ahbl.org: type NS, class IN, ns unresponsive.ahbl.org
   rhsbl.ahbl.org: type NS, class IN, ns unresponsive2.ahbl.org
   Name Server: unresponsive2.ahbl.org
   Additional records
   invalid.ahbl.org: type A, class IN, addr 244.254.254.254
   Addr: 244.254.254.254 (244.254.254.254)
   unresponsive.ahbl.org: type A, class IN, addr 10.230.230.230
   Addr: 10.230.230.230 (10.230.230.230)
   unresponsive2.ahbl.org: type A, class IN, addr 192.168.230.230
   Addr: 192.168.230.230 (192.168.230.230)
   invalid.ahbl.org: type , class IN, addr fe80::
   Addr: fe80::

This last one, fe80::, is an IPv6 link-local address that causes the BIND
nameserver to log a weird error

named[31365]: socket.c:4373: unexpected error:
named[31365]: 22/Invalid argument
Per http://www.mail-archive.com/bind-users@lists.isc.org/msg05240.html
connect() fails as it is missing scoping information.


Umm, with a name like "invalid.ahbl.org" what do you expect? That's
truth in advertising. It's 'invalid'; as a matter of fact none of those
addresses are usable, they're either RFC-1918 private, reserved
(244.254.254.254 is in the reserved 240.0.0.0/4 block) or link-local scope.
So none of those are valid for remote queries.
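
If you want to check addresses like these yourself, Python's standard ipaddress module will classify them. The classify() helper and its labels are just for illustration:

```python
import ipaddress


def classify(addr: str) -> str:
    """Rough scope label for an IPv4/IPv6 address."""
    ip = ipaddress.ip_address(addr)
    if ip.is_link_local:
        return "link-local"
    if ip.is_reserved:
        return "reserved"
    if ip.is_private:
        return "private"
    return "global" if ip.is_global else "other"


# The glue addresses from the AHBL response quoted above.
ahbl_glue = ["244.254.254.254", "10.230.230.230", "192.168.230.230", "fe80::"]
labels = {a: classify(a) for a in ahbl_glue}
```

None of the four comes back as "global", which is the whole point of those records.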

Do NOT use rhsbl.ahbl.org. period. end of song.




Re: some questions on sa-compile

2014-05-03 Thread Dave Funk

On Sat, 3 May 2014, RW wrote:


On Fri, 02 May 2014 21:51:02 +0200
Axb wrote:



2) The non-amenable rules are processed, but may be slower than if
they weren't compiled?


yep


It means they get processed as normal in perl, so they don't get
speeded-up, but they aren't slowed-down.



One thing: rules which cannot be compiled are often rules (such as ones that
use negative look-ahead/look-behind) that are potential major CPU hogs.
Used in limited scope they usually aren't a problem, but if not carefully
written they can eat a lot of CPU. That's not to say they shouldn't be used
at all, just with care.
So if you see that warning about uncompilable rules, take a second look
at those specific rules.
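
One way to take that second look is to scan your rule files for look-around constructs. The following is only a sketch: the function name and the sample rules are invented for illustration, and it does not reproduce whatever test sa-compile itself applies when deciding what it can compile.

```python
import re

# Perl-style look-around openers: (?=  (?!  (?<=  (?<!
LOOKAROUND = re.compile(r"\(\?<?[=!]")

# Rule types whose definition ends in a regular expression.
REGEX_RULE_TYPES = {"body", "rawbody", "full", "uri", "header", "mimeheader"}

def rules_with_lookaround(lines):
    """Return the names of rules whose pattern uses look-ahead/look-behind."""
    hits = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] in REGEX_RULE_TYPES:
            if LOOKAROUND.search(parts[2]):
                hits.append(parts[1])
    return hits

# Made-up rules, purely for demonstration.
sample = [
    r"body   __HAS_FREE   /\bfree\b/i",
    r"body   NOT_FREEDOM  /\bfree(?!dom)/i",
    r"header SUBJ_OFFER   Subject =~ /(?<!re: )offer/i",
    r"score  NOT_FREEDOM  0.5",
]
print(rules_with_lookaround(sample))
```

Anything it lists is worth reading by hand; a hit does not mean the rule is bad, only that it deserves a look.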




Re: Missing header when skipping mail

2014-04-18 Thread Dave Funk

On Fri, 18 Apr 2014, Kevin A. McGrail wrote:


On 4/18/2014 6:18 AM, Erik Logtenberg wrote:

The tool that hands the message to spamassassin (spampd in your case)
imposes the size limit. The message is never seen by spamassassin.
You're barking up the wrong tree ;)

Tom


Ah, I agree. This is where that happens:


$self->log(2, "skipped large message (". $size / 1024 ."KB)");

I'll send my question to Maxim Paperno, author of spampd, instead.



Spamassassin is a program AND an API.

Right now you are using the API so your wrapper (spampd) can do anything it 
wants but we could also consider this feature for spamc/spamd and for 
spamassassin as a parameter to enable.  Please open a bugzilla bug if you 
would like it considered.


Another way to deal with this problem is, via the glue agent, to truncate
large messages and send just the first X-kbytes of the message (for some
appropriate value of X, I'm using 256K). The idea is that a large spam message
is probably large because of some attachment and the payload is in the first
part.
I'm using a milter to connect to spamd so I can do SMTP rejections of high
scoring spam. I coded the truncation feature in the milter so no need to
modify the MTA nor spamd.
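
The truncation idea is simple enough to sketch. This is not the milter code mentioned above (that is not shown in this thread); the function name and the `(data, was_truncated)` return shape are assumptions made for the example, with the 256K figure taken from the text.

```python
MAX_SCAN_BYTES = 256 * 1024  # the 256K value mentioned above

def truncate_for_scan(raw_message: bytes, limit: int = MAX_SCAN_BYTES):
    """Return (data_to_scan, was_truncated), keeping only the first
    `limit` bytes so the headers and leading body parts are still scanned."""
    if len(raw_message) <= limit:
        return raw_message, False
    return raw_message[:limit], True

# A fake 1 MB message: short headers followed by a large body.
big = b"Subject: invoice\r\n\r\n" + b"A" * (1024 * 1024)
data, truncated = truncate_for_scan(big)
print(len(data), truncated)
```

The glue agent hands `data` to spamd for scoring but still delivers (or rejects) the original, untruncated message.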



Re: meta test HEXHASH_WORD has undefined dependency '__KAM_BODY_LENGTH_LT_512'

2014-04-06 Thread Dave Funk

On Sun, 6 Apr 2014, Helmut Schneider wrote:


Hi,

over the last weeks I constantly run into issues when I cannot get SA
up again because of "broken" rule sets. Today it's

Apr  6 17:06:01.960 [31092] dbg: rules: meta test HEXHASH_WORD has
undefined dependency '__KAM_BODY_LENGTH_LT_512'

Is something wrong in my process or do we have a problem with QA these
days.

Don't get me wrong, I appreciate your work very much.

Thanks, Helmut


What, exactly, do you mean by 'I cannot get SA up again because of "broken"
rule sets'?

That is effectively a warning, not a fatal error message. That one particular
kind of warning should not stop SA from running.

It means that you've got a meta rule that is a combination of other rules and
one of the other rules is missing.
For example:

 meta DIGEST_MULTIPLE   RAZOR2_CHECK + DCC_CHECK + PYZOR_CHECK > 1

If you don't have PYZOR installed you'll get a warning for that particular rule,
but it will just be lacking the input from that one potential component of
the rule; it will work just fine with the other two components.

Worst case, a given meta rule won't fire at all because it's missing some
necessary component and thus that rule will be effectively disabled but the
whole SA engine should still run.
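
If you want to find such dangling references before SA warns about them, something along these lines would do. It is a sketch only: the parsing is deliberately naive (one rule per line, no plugin or `ifplugin` awareness) and the function name is invented for the example.

```python
import re

RULE_TYPES = {"header", "body", "rawbody", "full", "uri", "meta", "mimeheader"}
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def undefined_meta_dependencies(lines):
    """Map each meta rule to the rule names it uses that are never defined."""
    defined = set()
    metas = {}
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3 or parts[0] not in RULE_TYPES:
            continue
        defined.add(parts[1])
        if parts[0] == "meta":
            metas[parts[1]] = set(NAME.findall(parts[2]))
    return {name: sorted(deps - defined)
            for name, deps in metas.items() if deps - defined}

# The DIGEST_MULTIPLE example from above, with Pyzor "not installed".
rules = [
    "full RAZOR2_CHECK eval:check_razor2()",
    "full DCC_CHECK    eval:check_dcc()",
    "meta DIGEST_MULTIPLE RAZOR2_CHECK + DCC_CHECK + PYZOR_CHECK > 1",
]
print(undefined_meta_dependencies(rules))
```

For the three lines above it reports PYZOR_CHECK as the missing piece of DIGEST_MULTIPLE, which matches the kind of warning in the debug output.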



Re: Remove spam results from mail header

2014-03-16 Thread Dave Funk

On Sun, 16 Mar 2014, Re@lබණ්ඩා™ wrote:


Hi All,

Is there a way to disable spam results from being published to the mail header?


Usually, depending upon how spamassassin is hooked into your mail system.
Looking at your example message I don't see any "Checker-Version" header which
normally SA systems add. This tends to indicate that your system is doing some
kind of custom header/results processing. Normal SA systems use configuration
options (see BASIC MESSAGE TAGGING OPTIONS section of SA documentation) to
control this. However, it appears that you're using SA-Exim, so you'll need to
check its documentation for how to configure that.


And I could see two sections of spam results in the mail header as follows. 
What could be the reason for that?


At a guess, it appears that this message has gone through some kind of list
processing system, so maybe SA processed it twice. Usual SA practice is to remove
(or overwrite) previously existing SA headers when processing a message.
However this, again, is dependent upon how SA is hooked into your mail system.

Bottom line, you've got an unusual SA kit there, time to do some code diving.


From: xxx
To: xxx
Thread-Topic: 
Thread-Index: Ac8/0magIx/1eKsLQWmx3TflU2Na0Q==
Date: Fri, 14 Mar 2014 22:25:53 +
Message-ID: 
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [x.x.x.x]
MIME-Version: 1.0
X-Spam_score: -1.9
X-Spam_score_int: -18
X-Spam_bar: -
X-Spam_report: Spam detection software,
    running on the system "xx.com", has
    identified this incoming email as possible spam. The original message
    has been attached to this so you can view it (if it isn't spam) or
    label similar future email.  If you have any questions, see
    @@CONTACT_ADDRESS@@ for details.
    Content preview: xx [...]
    Content analysis details:   (-1.9 points, 4.8 required)
    pts rule name  description
     --
    --
    -1.9 BAYES_00   BODY: Bayes spam probability is 0 to 1%
    [score: 0.]
    0.0 HTML_MESSAGE   BODY: HTML included in message
Subject: x
X-BeenThere: 
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: 
List-Unsubscribe: 
List-Archive: 
List-Post: <mailto:x>
List-Help: <mailto:xx>,
    <mailto:x>
Content-Type: multipart/mixed; boundary="===153982665457238=="
Sender: 
Errors-To: 
X-Spam_score: 1.6
X-Spam_score_int: 16
X-Spam_bar: +
X-Spam_report: Spam detection software, running on the system "xx.com", has
 identified this incoming email as possible spam.  The original message
 has been attached to this so you can view it (if it isn't spam) or label
 similar future email.  If you have any questions, see
 @@CONTACT_ADDRESS@@ for details.

 Content preview:   [...]

 Content analysis details:   (1.6 points, 4.8 required)

  pts rule name  description
  -- --
 -0.0 SPF_PASS   SPF: sender matches SPF record
  0.0 HTML_MESSAGE   BODY: HTML included in message
  0.8 BAYES_50   BODY: Bayes spam probability is 40 to 60%
 [score: 0.4901]
  0.8 RDNS_NONE  Delivered to internal network by a host with no 
rDNS
X-SA-Exim-Connect-IP: x.x.x.x
X-SA-Exim-Mail-From: xxx
X-SA-Exim-Scanned: No (on ); SAEximRunCond expanded to false
--
Re@lBanda



Re: tons of forged bills in german

2014-01-18 Thread Dave Funk

On Sat, 18 Jan 2014, Michael Monnerie wrote:

Dear list, since this week there are tons of very good forged bills that look 
like real, from big companies like telekom, vodafone, etc. They look like the 
original, and just the link in the middle, where it says "download your bill 
here", goes to a site containing trojans.



[snip..]
domain. Also, as Vodafone uses SPF, I'd like to check if I hit VODAFONEgood 
&& !SPF signature in the mail.


The problem with all this is, that there are MANY companies, so does someone 
have a better idea?


For companies who use SPF or DKIM, create a whitelist_auth entry for them,
then either blacklist them or create rules to hit on any sign of the
company's messages. The whitelist_auth will override those rules so real
messages will get through and the blacklist/targeted rules will hit the
impostors.




Re: dependency hell (completely off-topic...)

2013-11-15 Thread Dave Funk

On Fri, 15 Nov 2013, David F. Skoll wrote:


On Fri, 15 Nov 2013 16:25:30 +
RW  wrote:


Why not just email yourself the package files?


Or write an IP-over-email network driver that tunnels
to an exterior friendly machine...

(/me ducks...)

Regards,
David.


That would earn him a visit by the MiB who snoop all incoming & outgoing
emails (would perplex the c**p outta them, they'd assume he was
up to something ;).




Re: Explanation of message of RDNS_NONE??

2013-10-22 Thread Dave Funk

On Tue, 22 Oct 2013, Kai Schaetzl wrote:


Webmaster DKDB wrote on Tue, 22 Oct 2013 08:08:01 +0200:


dkdb.dk.37.66.77.in-addr.arpa


Probably because of this. This reverse DNS is not under an existing top-
level-domain and looks very much like a normal reverse lookup (and not the
result). Have them set it to a real public hostname.

Kai


Kai,
.in-addr.arpa. -is- the official top-level dns zone for reverse map data.

Webmaster,
That's because the reverse-map entry for 119 in the 37.66.77.in-addr.arpa
zone file is missing a period at its end, so the zone name gets appended to
the hostname. That's a DNS admin error.

send email to hostmas...@ngdc.net and ask them to fix that.
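
To see how a missing trailing period produces that odd name, here is a small sketch of the zone-file expansion rule: a name without a final dot is relative, so the zone origin is appended. The function name `qualify` is made up, and the PTR target "dkdb.dk" is inferred from the lookup result quoted above rather than taken from the actual zone file.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name: relative names (no trailing dot)
    get the zone origin appended, absolute names are left alone."""
    if name.endswith("."):
        return name
    return f"{name}.{origin}"

ORIGIN = "37.66.77.in-addr.arpa."

print(qualify("dkdb.dk", ORIGIN))    # entry as (presumably) written
print(qualify("dkdb.dk.", ORIGIN))   # entry with the trailing period
```

The first form yields exactly the kind of bogus reverse name seen in the lookup; the second yields the intended hostname.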




Re: How do I find a parent rule for a test?

2013-09-16 Thread Dave Funk
 score and affect the
overall message score. So, at the most basic level, any rule having a name
that starts with two underscores is _inherently_ a base for other rules.

In order to determine *which* rules it's a base for, you have to look for
that rule name in the config files. This isn't too easy to do online, you
pretty much have to grep the rules files in a local install.
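
A plain grep works, but it also matches longer rule names that merely contain the one you are after. As a sketch of a slightly stricter search (the function name and the sample rules are invented for illustration), this looks only at `meta` lines and requires the sub-rule name to appear as a whole word:

```python
import re

def find_parent_rules(subrule, lines):
    """Return names of meta rules whose expression mentions `subrule`."""
    whole_name = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(subrule) + r"(?![A-Za-z0-9_])"
    )
    parents = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "meta" and whole_name.search(parts[2]):
            parents.append(parts[1])
    return parents

# Made-up rules for demonstration.
lines = [
    "body __FOO /foo/",
    "meta PARENT_A __FOO && __BAR",
    "meta PARENT_B __FOO_EXTRA || __BAR",
    "meta PARENT_C !__FOO",
]
print(find_parent_rules("__FOO", lines))
```

Feed it the concatenated lines of the *.cf files in your local rules directories.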



--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.orgFALaholic #11174 pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  WSJ on the Financial Stimulus package: "...today there are 700,000
  fewer jobs than [the administration] predicted we would have if we
  had done nothing at all."

---
 Tomorrow: the 226th anniversary of the signing of the U.S. Constitution









Re: Rules not working

2013-09-08 Thread Dave Funk

On Sun, 8 Sep 2013, Raymond Jette wrote:


When I add custom rules to /etc/mail/spamassassin/local.cf the rules work 
as expected.  If I create any *.cf file and put the rules in they do not work.  
My test rule is:

body test_match_all /.*/
score test_match_all -0.01

Rules only work if they are in local.cf.  If I run the following command:

echo | spamassassin --debug

I can see my custom rules that are in files other than local.cf get called.  
Why would they work this way but never get called when spamd is called from 
exim?

Thanks for any help you can provide,
Ray


File system permissions issues? Are the new rules files readable by the 
"exim" user?

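A quick way to test the permissions theory is to look at the mode bits of the new file. The sketch below only checks the "other" read bit; it ignores group membership and the permissions of parent directories, both of which can also get in the way. The function name is invented for the example, and the demonstration runs on a throwaway temp file, not a real rules file.

```python
import os
import stat
import tempfile

def readable_by_others(path):
    """True if the mode bits grant read access to users who are neither
    the owner nor in the file's group."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

# Demonstration on a scratch file.
with tempfile.NamedTemporaryFile(suffix=".cf") as tmp:
    os.chmod(tmp.name, 0o600)
    before = readable_by_others(tmp.name)   # owner-only
    os.chmod(tmp.name, 0o644)
    after = readable_by_others(tmp.name)    # world-readable
print(before, after)
```

If local.cf is world-readable and the new file is not, that would explain rules that work from your own shell but not from the daemon.
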




Re: Catching fake LinkedIn invites

2013-08-28 Thread Dave Funk

On Thu, 29 Aug 2013, Michael Schaap wrote:


On 29-Aug-2013 00:30, John Hardin wrote:

On Wed, 28 Aug 2013, Michael Schaap wrote:


Hi,

I'm getting loads of fake LinkedIn invites, most of which aren't caught by 
SpamAssassin.
Does anyone have a good SpamAssassin rule to catch those, while letting 
real LinkedIn invites through?

Do they fail SPF or DKIM?

Unfortunately not, for the most part. (The "From:" header is at linkedin dot 
com, but the envelope sender is a random address, and I guess SPF and DKIM 
run on the envelope sender only.)


If they do, and the legit ones pass SPF or DKIM, then the standard solution 
is to add a header rule to detect that the message claims to be from that 
domain (e.g. using the domain part of the From or Reply-To headers), and 
then either give that rule some points and also define whitelist_from_auth 
for the domain, or meta that rule with (SPF_FAIL || DKIM_FAIL) and give the 
meta some points.


There were some examples of doing this for facebook within the last couple 
of weeks, check the list archives.



Hmm, legit ones have SPF_PASS.
So I guess I could set up a rule that punishes messages “From:” linkedin 
which don't have SPF_PASS. I might give that a try, once I find some time to 
figure out how...


Untested but try:

whitelist_auth *@bounce.linkedin.com
whitelist_auth *@linkedin.com
blacklist_from *@linkedin.com

The whitelist_auth will kick in on any message from @linkedin.com which 
passes SPF or DKIM and thus will null out the bad points from the
blacklist_from, so the message ends up neutral.
Any purported linkedin.com message not getting the whitelist_auth boost 
will be clobbered by the blacklist_from.

One caveat: a transient DNS failure might cause the SPF/DKIM check to not
verify, thus not boosting legit linkedin messages.

There is a low-power version of whitelist_auth called def_whitelist_auth 
which only shifts the score by 15 points (I use it for a lot of stuff).
However there isn't a def_blacklist_from, so you have to use the
"full strength" versions of both white/black list (-100/+100) to make them
balance each other out.
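
The arithmetic behind that balancing act, as a sketch: the 100-point magnitudes are the "full strength" figures mentioned above, with the usual SA sign convention (higher is spammier) assumed. The function name is invented, and the model ignores every other rule that might fire on the message.

```python
WHITELIST_AUTH = -100.0   # applied only when SPF or DKIM verifies
BLACKLIST_FROM = 100.0    # applied to every message claiming the domain

def list_adjustment(claims_linkedin: bool, auth_passes: bool) -> float:
    """Net score contribution from the whitelist/blacklist pair alone."""
    score = 0.0
    if claims_linkedin:
        score += BLACKLIST_FROM
        if auth_passes:
            score += WHITELIST_AUTH
    return score

print(list_adjustment(True, True))    # genuine mail: the two cancel out
print(list_adjustment(True, False))   # forged From: blacklist only
print(list_adjustment(False, False))  # unrelated mail: untouched
```

The transient-DNS caveat shows up here too: a genuine message whose check fails to verify lands in the second case.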


Re: Errors when processing mail.

2013-07-14 Thread Dave Funk

On Sun, 14 Jul 2013, Christian Dysthe wrote:


Hi,

I am very new to Spamassassin and trying to have it work with the
Citadel mail server which has support for spamassassin. I'm running
Spamassassin 3.3.2-2ubuntu1 on Ubuntu Server Edition 12.04 LTS x64. The
entries I get in the mail.log when mail is being delivered are:

Jul 14 16:52:21 concerto spamd[7687]: plugin: eval failed: bayes: (in
learn) locker: safe_lock: cannot create tmp lockfile
/nonexistent/.spamassassin/bayes.lock.concerto..com.7687 for
/nonexistent/.spamassassin/bayes.lock: No such file or directory
Jul 14 16:52:21 concerto spamd[7687]: spamd: clean message (-0.7/5.0)
for (unknown):65534 in 0.2 seconds, 1496 bytes.
Jul 14 16:52:21 concerto spamd[7687]: spamd: result: . 0 -
FREEMAIL_FROM,MSGID_FROM_MTA_HEADER,RCVD_IN_DNSWL_LOW,SPF_PASS,T_DKIM_INVALID
scantime=
0.2,size=1496,user=(unknown),uid=65534,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=43706,mid=,autolearn=unavailable
Jul 14 16:52:21 concerto spamd[7686]: prefork: child states: II

Could someone point me in the right direction so I can solve this
issue, or these issues even?


The error indicates that the Bayes component of your spamassassin
cannot create the lock file 
"/nonexistent/.spamassassin/bayes.lock.concerto..com.7687"

First order of business: what do you get when you do a "ls -ld 
/nonexistent/.spamassassin"?
Does that directory exist? What are its ownership and permissions?
Is it writable by the UID that your spamd is running under?

Bottom line, the spamassassin Bayes module needs a writable working
directory. Your error messages imply that the directory that your
spamassassin configuration is telling your Bayes to use (that 
"/nonexistent/.spamassassin" thing) has issues.


So either you need to fix that directory or fix your configuration to
tell it where the directory it -should- be using is.

I don't know that "Citadel" kit, you may be better off finding some
discussion list which is specifically about it. Just guessing by
that directory name ("/nonexistent/") it's something that you need
to explicitly create and change your configuration to point to.
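
The same three questions (does it exist, is it a directory, can I write to it) can be scripted. This is a sketch with an invented function name; note that `os.access` answers for the user running the script, so it only tells you something useful if you run it as the same UID as spamd. The demonstration uses a scratch directory rather than the real path.

```python
import os
import tempfile

def check_state_dir(path):
    """Classify a would-be Bayes working directory."""
    if not os.path.isdir(path):
        return "missing"
    if not os.access(path, os.W_OK | os.X_OK):
        return "not writable"
    return "ok"

# Demonstration: one directory that exists, one that does not.
with tempfile.TemporaryDirectory() as scratch:
    present = check_state_dir(scratch)
    absent = check_state_dir(os.path.join(scratch, ".spamassassin"))
print(present, absent)
```

Point it at whatever directory your configuration names; "missing" corresponds to the "No such file or directory" in the log above.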




Re: False negatives/positives on debian

2013-06-22 Thread Dave Funk

On Sat, 22 Jun 2013, Robert S wrote:


I've eliminated this problem by using openDNS servers:

# cat /etc/resolv.conf
domain mydomain.net.au
search mydomain.net.au
nameserver  192.168.0.33   #<--- My server IP
nameserver  208.67.220.220
nameserver  208.67.222.222

Is this likely to have untoward consequences?  I've also looked at using 
unbound - which looks quite straightforward.


Assuming that your dnsmasq (or other DNS-server) is running on the same
machine as your SA, use the loopback IP addr (127.0.0.1) instead of the
explicit IP addr of your server's ethernet interface.

IE, in your resolv.conf use:

  domain mydomain.net.au
  search mydomain.net.au
  nameserver  127.0.0.1
  nameserver  208...stuff
  nameserver  some.other.server..

This is for several reasons:
1) ease of maintenance, always works, even after changing your
   server's IP addr for whatever reason.
2) security, you can then change your DNS server to only listen
   for queries on the loopback addr and make it more immune to
   remote attacks.
3) performance, DNS queries work best if they fit in a single
   UDP packet. The loopback has a larger MTU than standard
   enet interfaces, so more likely to handle large DNS queries
   w/o fragmentation or TCP fallback.

Now if you're also using that DNS server to provide DNS service
for other client machines on your local LAN then you cannot do
the change in (2) (make DNS server only listen to loopback) but
it still simplifies configuration. (allow all queries on lo0 and
selected queries on eth*).
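
If you want a sanity check for this in a deployment script, here is a sketch that reads resolv.conf text and reports whether the first nameserver is a loopback address. The function names are invented; the sample text reuses the OpenDNS addresses from the original post, with inline '#' comments stripped the way that file used them.

```python
import ipaddress

def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

def first_nameserver_is_loopback(resolv_conf_text):
    servers = nameservers(resolv_conf_text)
    return bool(servers) and ipaddress.ip_address(servers[0]).is_loopback

ORIGINAL = """domain mydomain.net.au
nameserver  192.168.0.33   #<--- My server IP
nameserver  208.67.220.220
"""
SUGGESTED = """domain mydomain.net.au
nameserver  127.0.0.1
nameserver  208.67.220.220
"""
print(first_nameserver_is_loopback(ORIGINAL),
      first_nameserver_is_loopback(SUGGESTED))
```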




Re: False negatives/positives on debian

2013-06-21 Thread Dave Funk

On Sat, 22 Jun 2013, Robert S wrote:


I am running spamassassin_3.3.2-5 on debian Wheezy on a small business server 
(x86).  I am getting numerous complaints about mail
being falsely categorised as spam/ham.  I also use version 3.3.2 on my home 
server using gentoo (amd64) and don't have these
problems.  I have removed all customisations and have reinstalled spamassassin 
on my debian machine.  There still seem to be problems
- here's an example using the provided sample files.  Can anybody help?

This message seems to get blocked in a lot of blocklists (which also seem to 
happen to my users' messages).

Options for SA are:

# ps ax |grep spam
22408 ?    Ss 0:02 /usr/sbin/spamd --create-prefs --max-children 5 
--helper-home-dir -d --pidfile=/var/run/spamd.pid

/etc/procmailrc includes this:

* < 256000
| /usr/bin/spamc
$ spamc < sample-nonspam.txt

Received: from localhost by debian.myserver.net.au
 with SpamAssassin (version 3.3.2);
 Sat, 22 Jun 2013 12:06:12 +1000
From: Keith Dawson 
To: t...@world.std.com
Subject: TBTF ping for 2001-04-20: Reviving
Date: Fri, 20 Apr 2001 16:59:58 -0400
Message-Id: 
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on
 debian.myserver.net.au
X-Spam-Flag: YES
X-Spam-Level: 
X-Spam-Status: Yes, score=8.5 required=5.0 tests=RP_MATCHES_RCVD,SAGREY,
 URIBL_AB_SURBL,URIBL_BLOCKED,URIBL_GREY,URIBL_MW_SURBL,URIBL_PH_SURBL,
 URIBL_RED,URIBL_WS_SURBL autolearn=no version=3.3.2
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="--=_51C50694.B9FC2455"
This is a multi-part message in MIME format.
=_51C50694.B9FC2455
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
Spam detection software, running on the system "debian.myserver.net.au", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
the administrator of that system for details.
Content preview:  -BEGIN PGP SIGNED MESSAGE- TBTF ping for 2001-04-20:
   Reviving T a s t y B i t s f r o m t h e T e c h n o l o g y F r o n t [...]

Content analysis details:   (8.5 points, 5.0 required)
 pts rule name  description
 -- --
-1.5 RP_MATCHES_RCVD    Envelope sender domain matches handover relay domain
 0.0 URIBL_RED  Contains an URL listed in the URIBL redlist
    [URIs: tbtf.com]
 0.0 URIBL_BLOCKED  ADMINISTRATOR NOTICE: The query to URIBL was 
blocked.
    See
    
http://wiki.apache.org/spamassassin/DnsBlocklists#dnsbl-block
 for more information.
    [URIs: tbtf.com]
 1.1 URIBL_GREY Contains an URL listed in the URIBL greylist
    [URIs: tbtf.com]
 0.0 URIBL_PH_SURBL Contains an URL listed in the PH SURBL blocklist
    [URIs: tbtf.com]
 4.5 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blocklist
    [URIs: tbtf.com]
 1.7 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
    [URIs: tbtf.com]
 1.7 URIBL_MW_SURBL Contains a Malware Domain or IP listed in the MW 
SURBL
 blocklist
    [URIs: tbtf.com]
 1.0 SAGREY Adds 1.0 to spam from first-time senders

=_51C50694.B9FC2455
Content-Type: message/rfc822; x-spam-type=original
Content-Description: original message before SpamAssassin

[snip..]

Clearly the bulk of those points come from those URI-RBL type rules,
which look like FPs. At least that "tbtf.com" domain isn't listed right
now, it -might- have been when this message was processed. However given
that "URIBL_BLOCKED" rule fired, it looks more like there's something
wrong with your setup which is causing all those URI-RBLs to FP.

Have you looked at the web page that URIBL_BLOCKED rule references?
Have you investigated why it fired? Have you tried taking any
of the advice on that page as to how to deal with this problem?

To go beyond the advice on that page we'd need to know more details about
how your DNS/network is configured on your SA scanner machine (are you
running a local caching DNS server? Are you using some explicit DNS
forwarder? Does your ISP do anything special with DNS queries? ...).
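
To find out how often this is happening across a mailbox, you can pull the rule list out of each X-Spam-Status header and look for URIBL_BLOCKED. A sketch, with an invented function name and the header value copied from the sample above:

```python
import re

def fired_tests(status_header):
    """Extract the rule names from an X-Spam-Status header value,
    coping with the list being folded across lines."""
    unfolded = re.sub(r",\s+", ",", status_header)
    match = re.search(r"tests=(\S*)", unfolded)
    if not match or match.group(1) in ("", "none"):
        return []
    return match.group(1).split(",")

STATUS = (
    "Yes, score=8.5 required=5.0 tests=RP_MATCHES_RCVD,SAGREY,\n"
    " URIBL_AB_SURBL,URIBL_BLOCKED,URIBL_GREY,URIBL_MW_SURBL,URIBL_PH_SURBL,\n"
    " URIBL_RED,URIBL_WS_SURBL autolearn=no version=3.3.2"
)

tests = fired_tests(STATUS)
print("URIBL_BLOCKED" in tests, [t for t in tests if t.startswith("URIBL_")])
```

When URIBL_BLOCKED shows up next to a pile of other URIBL hits on mail that is plainly ham, suspect the DNS path rather than the message.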



Re: MariaDB instead of MySQL

2013-05-17 Thread Dave Funk

On Fri, 17 May 2013, David F. Skoll wrote:


On Fri, 17 May 2013 08:58:53 -0700
Quanah Gibson-Mount  wrote:


Personally I wish SA supported LMDB.


We've had very good luck with CDB, Dan Bernstein's "constant database"
format.  Reads are unbelievably fast.

The only downside to CDB is that you cannot update a CDB file.  You need
to generate a new one from scratch.  Still, even that is quick enough that
we use it.

One of my colleagues benchmarked CDB versus Berkeley DB and the difference
was dramatic: http://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/
CDB was about 6-7 times as fast on random reads as Berkeley DB.


If CDB is read-only, how do you store the a-time values on lookups so you
know which tokens aren't being used to facilitate expiry?




Re: .pw / Palau URL domains in spam

2013-05-05 Thread Dave Funk

Donesh,

Thanks for your prompt response.
Do you just want the domain names or do you also want copies of the spam?

Dave

On Sun, 5 May 2013, doneshlaher wrote:


Hello Dave Funk,

Thank you for providing us with the list of domain names. We are acting on
them and will be taken down within 24/48 hours.

We request you to report the domain names at abuse.al...@registry.pw and
also cc the same mail to abuse.al...@directi.com.

Regards

Donesh Laher
Cyber Security Analyst
.PW Registry




Re: .pw / Palau URL domains in spam

2013-05-04 Thread Dave Funk

On Wed, 1 May 2013, doneshlaher wrote:


Hello Axb,

Thank you for providing with the domain names. We will be suspending all
these reported domain names.

However, in the meantime may I know what kind of spam has been received?
Also, can you please forward us the email headers for a few of the reported
domain names?

This would help us to analyse the headers and understand whether the
account is compromised or not.

Regards

Donesh Laher
Cyber Security Analyst
.PW Registry


Donesh,
How many dozen spams a day would you like to receive?
Should I send them to your personal address or is there some
other reporting address I should use?

We are not a large site (only a few thousand users) but in the past few
weeks have been receiving hundreds of spams a day advertising ".pw" domains.
Here's a partial list of some of the past 3 days worth:
(this list would be much larger except that I've been black-listing the
IP addresses of their hosting providers as fast as I can identify them)

vision-virtuahosting1.pw
visionsvirtualwebhost4.pw
allsupremedeal.pw
alltopdeals.pw
amerivalues.pw
autopricefind.pw
autopricefinder.pw
banesgroup.pw
dallyhost.pw
dimehosts.pw
dursidis.pw
efulan.pw
efundess.pw
ekmsgroup.pw
ezhotdealz.pw
getgreatwins.pw
gethotdealz.pw
grevaluaqu.pw
igreatness.pw
imaginec1.pw
iradjead.pw
islity.pw
metagreatwins.pw
neathotdealz.pw
newgreatdealz.pw
progreatdealz.pw
servermaximum.pw
sharpgreatdealz.pw
sleekgreatdealz.pw
specialzhome.pw
specialzland.pw
specialztoday.pw
successtopdeals.pw
superbtopdeals.pw
supertopdeals.pw
usdirects1.pw
vision-virtualhosting12.pw
vision-virtualhosting14.pw
visionsvirtualwebhost2.pw
zbidnow.pw
avanheertyu.pw
getsuperiordeal.pw
sleeplessdaysnow.pw
gwampuer.pw
treelendnews.pw
getmatchednows.pw



Re: rule problem basing on X-Spam-ASN - not a rule problem

2013-04-25 Thread Dave Funk

On Thu, 25 Apr 2013, Frank Gadegast wrote:


And SA is doing it right, to remove all X-Spam-lines
before its starting, so that spammer cannot trick SA.

And whatever line is inserted by ASN.pm, it needs
to be stripped too, and thats why its programmed
like it is.

But I still have no idea how to get it done in a
perfect order, like the following

- SA strips the X-Spam-lines
- ASN.pm inserts its line including the AS
- SA runs its rules and triggers also on the X-Spam-ASN-line

Is it time to ask the developers or file a bug ?


This doesn't help unless the plugin adds a pseudo-header, but in the
case of plugins that do, you can change the priority of your
rules to get them to run after the plugin.
I ran into this issue when writing rules to check the results of the
ClamAV plugin.

EG, in my "clamav.cf" file I invoke the plugin with an "eval" then have
rules to trigger off the pseudo-header that it adds. In the rules I have
lines like:

 #loadplugin ClamAV /etc/mail/spamassassin/plugins/clamav.pm  now done in v310.pre
 #
 full L_CLAMAV   eval:check_clamav()
 describe L_CLAMAV   Clam AntiVirus detected a virus
 score L_CLAMAV  3
 #
 header T__MY_CLAMAV X-Spam-Virus =~ /Yes/i
 header T__MY_CLAMAV_SANE X-Spam-Virus =~ /Yes.{1,50}Sanesecurity/i
 header T__MY_CLAMAV_MSRBL X-Spam-Virus =~ /Yes.{1,50}(?:MSRBL|MBL)/
 header T__MY_CLAMAV_PHISH X-Spam-Virus =~ /Yes.{1,50}Phish/
 header L_UI_PHISHs X-Spam-Virus =~ /Yes.{1,50}Phishing/

 # Need to set the 'X-Spam-Virus' header rules to a "high" priority
 # so they run late and will be evaluated -after- the plugin runs
 priority T__MY_CLAMAV   
 priority T__MY_CLAMAV_SANE  
 priority T__MY_CLAMAV_MSRBL 
 priority T__MY_CLAMAV_PHISH 
 priority L_UI_PHISHs   
 #
 meta MY_CLAMAV_SANE (L_CLAMAV && T__MY_CLAMAV_SANE)
 meta MY_CLAMAV_MSRBL    (L_CLAMAV && T__MY_CLAMAV_MSRBL)
[snip..]





Re: re-learning ? was - bayes - large message

2013-04-21 Thread Dave Funk

On Sun, 21 Apr 2013, Joe Acquisto-j4 wrote:





Thanks.  This has cleared most of my fog.

I had chosen to forward as it seemed simpler at the time, given the SA
learning curve.  Still on the uphill part.

Just setup shared folders into which I can drag and drop spam and miss-caught 
spam,
unaltered.

And, I can access that folder using an imap client, from the SA box.  Its 
"alpine", which
came with the Distro I am using, opensuse 12.2.

Is there a linux imap client that can be scripted (bash preferred)?  Relatively 
easily?

Or perhaps someone knows of something already crafted for this purpose, that 
needs only
minor tweaking?

joe a.


Included in the UWash IMAP kit is a program called "mailutil" which may be
available ready-built for your OS distro (EG: 
http://linux.die.net/man/1/mailutil).

One of the things that mailutil can do is to transfer mailboxes (mail "folders")
from one mail server to another (EG from a traditional mbox into an IMAP 
server or from one IMAP server to another).


Use it to copy your IMAP spam/ham folders to local (on your SA server)
'mbox' format folders and then learn from them.

Dave



Re: spamass-milter rejecting messages because no score found in large emails

2013-03-23 Thread Dave Funk

On Sat, 23 Mar 2013, Matus UHLAR - fantomas wrote:


Am 22.03.2013 22:31, schrieb Benny Pedersen:

are spamass-milter using spamc ?


On 23.03.13 00:34, Robert Schetterer wrote:

at my knowledge
spamass-milter uses spamd, the deamon vers of spamc


no, no, spamd is the daemon and spamc is an utility that talks to spam
daemon :-)

and yes, spamass-milter uses spamc. you can pass extra flags to it, e.g.
-s to send all mail up to given size to spamd (default:500KB)


It is true that spamass-milter uses the spamc utility but not all
spamassassin connecting milters do. I'm using a customized version of
miltrassassin which speaks the 'SPAMC' network protocol directly to spamd,
no use of the "spamc" client program at all.

There are some milters that don't even use spamd, they directly 
instantiate the spamassassin engine within themselves.
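
For the curious, the wire protocol is a small text framing, which is why a milter can speak it without the spamc binary. The sketch below is written from memory of the spamd protocol notes (a `CHECK SPAMC/1.2` request line, a `Content-length` header, a `Spam: True ; score / threshold` reply line), so treat the exact framing as an assumption and check the PROTOCOL file shipped with SpamAssassin before relying on it. The function names are invented, and no socket code is included.

```python
import re

def build_check_request(message: bytes, user: str = "") -> bytes:
    """Frame a message as a CHECK request for spamd (assumed framing)."""
    headers = ["CHECK SPAMC/1.2", f"Content-length: {len(message)}"]
    if user:
        headers.append(f"User: {user}")
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + message

SPAM_LINE = re.compile(
    rb"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
    re.MULTILINE | re.IGNORECASE,
)

def parse_check_response(response: bytes):
    """Return (is_spam, score, threshold), or None if no Spam: line."""
    match = SPAM_LINE.search(response)
    if match is None:
        return None
    return (match.group(1).lower() == b"true",
            float(match.group(2)), float(match.group(3)))

msg = b"Subject: hi\r\n\r\nbody\r\n"
request = build_check_request(msg, user="dave")
reply = b"SPAMD/1.1 0 EX_OK\r\nSpam: True ; 15.0 / 5.0\r\n\r\n"  # canned reply
print(parse_check_response(reply))
```

A real client would write `request` to a TCP or Unix socket connected to spamd and read the reply until the connection closes.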




Re: Hot News

2013-03-15 Thread Dave Funk

On Fri, 15 Mar 2013, Kevin A. McGrail wrote:


On 3/15/2013 9:17 AM, Tom Kinghorn wrote:
  On 15/03/2013 15:11, Christopher Nido wrote:


http://www.naturalstonesinc-munged.com/aah/pabfjd/pgrezs


Now this is a guy with "cahona's grande' " for spamming the spamassassin list.

Poor sucker.


It's a compromised Yahoo! account.  One of the #1 spamming issues right now for 
us.

Regards,
KAM


Not only a compromised Yahoo! account but also a compromised website, so
listing the URLs in some kind of RBL will be problematic because of false
positives (FPs).



Re: Spamassassin not parsing email messages

2012-12-28 Thread Dave Funk

That implies that whatever mechanism you're using in the original process
is adding a blank line (or bare 'nl' or 'cr') to the beginning of the
message that you're then handing to SA.

Idiot question, are you doing (or not) a "chomp" in the initial read 
process?
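
To show why a stray leading blank line matters, here is a small sketch using
Python's standard email parser as a stand-in. It is not SA's own parser, but
it follows the same rule that the first blank line ends the header block. The
sample message and function names are made up for illustration.

```python
from email import message_from_string

# Hypothetical sample message: two headers, a blank line, then the body.
RAW = "Subject: test message\nFrom: sender@example.com\n\nbody text\n"


def subject_of(raw: str):
    """Return the Subject header as the parser sees it, or None if absent."""
    return message_from_string(raw)["Subject"]


def strip_leading_blank_lines(raw: str) -> str:
    """Drop any CR/LF characters that precede the first header line."""
    return raw.lstrip("\r\n")


clean = subject_of(RAW)            # headers are found as expected
broken = subject_of("\n" + RAW)    # the leading blank line ends the header block at once
fixed = subject_of(strip_leading_blank_lines("\n" + RAW))
```

With the extra newline in front, everything (including the Subject: line) is
treated as body text, which would explain header rules not firing.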



On Fri, 28 Dec 2012, Sean Tout wrote:


Hi Henrik & Jeff,

One more data point that might shed more light. I copied one of the emails
from the above 3 emails into its own file and ran spamassassin from the
command line in test mode against it, and it worked fine. The command is:
spamassassin --test-mode < /spamemails/singleemail.spam

where singleemail.spam contains a single spam email.

Regards,

-Sean.









Re: Scoring Yahoo mail from certain continents/countries ?

2012-12-09 Thread Dave Funk

On Sun, 9 Dec 2012, Frederic De Mees wrote:


Dear list,

Here is the context.
The French-speaking countries receive tons of e-mails, mostly fraud attempts, 
fake lotteries, originating from West-Africa and sent by Yahoomail users.
Often those messages contain big attachments. The payload (text of the 
message) is embedded in a 1MB jpeg with fake certificates of a lawyer, a 
logo, or whatever.


Spamassassin misses 100% of them because:
- the sender IP (Yahoo) is genuine and has a good reputation
- the analysis of the message text shows nothing bad, as the mill!ions of 
euros are in the picture attachment

- due to the message size, the analysis is skipped anyway.

If no customer of the mail server in question expects any mail from any Yahoo
user in Africa, a simple 'header_checks' Postfix directive like this will
match such messages if their sender IP starts with 41.

/^Received: from .41\..*web.*mail.*yahoo\.com via HTTP/i
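
The pattern can be tried out away from the mail server. Below is a minimal
sketch in Python. The sample Received: lines are invented for illustration,
and the trailing flag is left out because flag semantics differ between
Postfix tables and Python (the samples match the pattern's case exactly).

```python
import re

# Pattern body copied from the header_checks line; flag omitted on purpose.
PATTERN = re.compile(r"^Received: from .41\..*web.*mail.*yahoo\.com via HTTP")

# Hypothetical header lines, for illustration only.
BRACKETED_41 = "Received: from [41.0.0.1] by web00000.mail.example.yahoo.com via HTTP"
BRACKETED_196 = "Received: from [196.0.0.1] by web00000.mail.example.yahoo.com via HTTP"
BARE_141 = "Received: from 141.0.0.1 by web00000.mail.example.yahoo.com via HTTP"


def matches(header: str) -> bool:
    """True when the header line matches the pattern."""
    return PATTERN.search(header) is not None


hit_41 = matches(BRACKETED_41)      # the intended case
hit_196 = matches(BRACKETED_196)    # a different first octet
hit_141 = matches(BARE_141)         # the leading "." can swallow a digit
```

The last sample shows one loose end: because the character before "41" is an
unescaped ".", an unbracketed address such as 141.x.x.x would also match.
Writing that position as \[ would tie the pattern to the bracketed form.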

I admit this is rough, albeit effective. On one hand, not all of Africa is in
41. On the other, I do not want to block all of 41.


I would have loved to do it with SA.
This means that the line
"Received: from [ip.add.res.ss].*web.*mail.*yahoo\.com via HTTP" should be 
detected and analysed.

The IP address should be extracted.
The whois record of the address should be queried.
The country code of the IP address would return a certain number of SA points
from a list of "Yahoo users bad countries" I would manage.


Because of its size, your message didn't get processed by SA at all.
Try a test run with the max-size parameter bumped up high enough that
SA will take a crack at it. You might find that SA is already able to deal
with that garbage.

If that works then you just need to figure out how to deal with bloated
image spams. Recently there have already been a couple of different threads
on this list about exactly that issue (ranging from just increasing the
max-size for everything to making a special connector that truncates bloated
spams).
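
As a rough illustration of the truncation idea only (this is not any existing
connector, and the function name is made up), a sketch that keeps the whole
header block and cuts the body off at a size limit might look like this:

```python
def truncate_for_scan(raw: bytes, max_size: int) -> bytes:
    """Keep the full header block plus as much body as fits in max_size bytes."""
    if len(raw) <= max_size:
        return raw
    # Headers end at the first blank line (CRLF or bare LF line endings).
    ends = [raw.find(sep) + len(sep) for sep in (b"\r\n\r\n", b"\n\n") if sep in raw]
    header_end = min(ends) if ends else 0  # 0: no header/body separator found
    # Never cut inside the headers, even if they alone exceed max_size.
    return raw[:max(header_end, max_size)]
```

Note that a cut like this will usually land in the middle of a MIME part, so
SA would see an incomplete attachment. Whether that is acceptable depends on
which rules you are relying on.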

Until you get SA to actually process these messages, there's no point in
discussing added bells and whistles.



