Re: Really hard-to-filter spam

2023-08-02 Thread Thomas Cameron via users

On 8/2/23 15:52, David B Funk wrote:


Regardless, if a message has never been seen before and has little 
correlation to earlier messages its Bayes should hit someplace in the 
40% to 60% range.


The fact that it hit 00% indicates a strong correlation to lots of ham 
(or something is screwy with your Bayes).


OK, here's what I got just now:

[thomas.cameron@mail-east ~]$ sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  41449  0  non-token data: nspam
0.000  0  49720  0  non-token data: nham
0.000  0 162741  0  non-token data: ntokens
0.000  0 1689089541  0  non-token data: oldest atime
0.000  0 1691009577  0  non-token data: newest atime
0.000  0 1691007146  0  non-token data: last journal sync atime
0.000  0 1690991018  0  non-token data: last expiry atime
0.000  0 1382400  0  non-token data: last expire atime delta
0.000  0  13879  0  non-token data: last expire reduction count
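
For anyone who wants to read these counters mechanically, here is a minimal sketch in Python. It assumes the layout shown above (four numeric columns followed by a "non-token data:" label, with the value in the third column); the function name parse_dump_magic is made up for this example and is not part of SpamAssassin.

```python
from datetime import datetime, timezone


def parse_dump_magic(text):
    """Map each 'non-token data' label to the integer in the third column."""
    stats = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, label = line.split("non-token data:", 1)
        parts = fields.split()
        if len(parts) < 4:
            # Not the four-column layout this sketch assumes; skip it.
            continue
        stats[label.strip()] = int(parts[2])
    return stats


# Sample lines copied from the output quoted above.
SAMPLE = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  41449  0  non-token data: nspam
0.000  0  49720  0  non-token data: nham
0.000  0 162741  0  non-token data: ntokens
0.000  0 1689089541  0  non-token data: oldest atime
0.000  0 1691009577  0  non-token data: newest atime
"""

stats = parse_dump_magic(SAMPLE)
# atime values are Unix timestamps; convert the newest one to a UTC date.
newest = datetime.fromtimestamp(stats["newest atime"], tz=timezone.utc)
print("nspam:", stats["nspam"], "nham:", stats["nham"], "ntokens:", stats["ntokens"])
print("newest atime:", newest.isoformat())
```

A newest atime close to the current time suggests this database is the one being touched by recent scans or training runs.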


I can absolutely re-train Bayes. I am kind of an email pack-rat, so I 
have over a gig of saved known good emails in various folders. I have SA 
set up so that emails are scanned individually on a per user basis via 
procmail rule:


[thomas.cameron@mail-east ~]$ head .procmailrc
MAILDIR=$HOME/mail
LOGFILE=$MAILDIR/procmail.log

:0fw: spamassassin.lock
* < 512000
| spamassassin

I have the users move spam to an imap folder, and then run (via the 
user's cron job):


sa-learn --mbox --spam /home/[username]/mail/spam

If something is flagged as spam and it's not supposed to be, I have them 
copy it to the ham folder and I run (also via cron job):


sa-learn --mbox --ham /home/[username]/mail/spam

For my email account, I've used my inbox and various other folders to 
train Bayes in the past (although it's definitely been a while since I 
did Bayes maintenance), but I have zero issue nuking my personal Bayes 
data and starting over.


Thoughts?

--
Thomas


Re: My apologies

2023-08-02 Thread Benny Pedersen

Marc skrev den 2023-08-02 22:23:


I like Reindl! Is anyone training spamassassin on his emails??? ;P


Why? If it's good for Bayes, why should it be bad for humans at all?




Re: My apologies

2023-08-02 Thread Benny Pedersen

Thomas Cameron via users skrev den 2023-08-02 21:39:


I'm sorry for posting that.


I just made a Sieve auto-reader, so I don't need to read it myself. Good 
or bad, I don't know :)


No need to be sorry about losing mail, IMHO.




Re: Really hard-to-filter spam

2023-08-02 Thread David B Funk

On Wed, 2 Aug 2023, Thomas Cameron via users wrote:

Thank you very much. The message that slipped through today was NOT one of 
the ones being discussed in this thread, it was a different format and 
totally different message. I only included it to demonstrate that my server 
was not being rejected for queries as the blocked user intimated. I will dig 
deeper into the --magic and make sure I'm feeding Bayes with spam and ham.


Regardless, if a message has never been seen before and has little correlation 
to earlier messages its Bayes should hit someplace in the 40% to 60% range.


The fact that it hit 00% indicates a strong correlation to lots of ham (or 
something is screwy with your Bayes).



--
Dave Funk   University of Iowa
 College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


RE: My apologies

2023-08-02 Thread Marc
> 
> > I've blocked him on my mail server, as well.
> 
> Reindl now and then says something useful, but as you have noticed his
> people skills are somewhere in the negative 200 score level. I don't
> know that I'd block him, but you do need to take anything he says with
> a few horselicks of salt.

I like Reindl! Is anyone training spamassassin on his emails??? ;P


Re: My apologies

2023-08-02 Thread Loren Wilton

I've blocked him on my mail server, as well.


Reindl now and then says something useful, but as you have noticed his 
people skills are somewhere in the negative 200 score level. I don't know 
that I'd block him, but you do need to take anything he says with a few 
horselicks of salt.




Re: My apologies

2023-08-02 Thread Antony Stone
On Wednesday 02 August 2023 at 21:39:31, Thomas Cameron via users wrote:

> I was notified privately that Reindl Harald is blocked on this list. I 
> replied to him and accidentally polluted the list with more of his
> toxicity. I apologize, and I've blocked him on my mail server, as well.

We've all had to learn about him (sometimes on several lists) at some time or 
other.  Thanks for the apology, but his attitude is his own, and you've done 
nothing to cause that.  He responds to almost everybody in the same anti-
social (to put it mildly) manner.

Don't worry about it - just carry on with talking to reasonable people 
instead.


Antony.

-- 
If you were ploughing a field, which would you rather use - two strong oxen or 
1024 chickens?

 - Seymour Cray, pioneer of supercomputing

   Please reply to the list;
 please *don't* CC me.


Re: Really hard-to-filter spam

2023-08-02 Thread Thomas Cameron via users




On 8/2/23 14:32, Dave Funk wrote:

On Wed, 2 Aug 2023, Thomas Cameron via users wrote:

Wow! What a charming response! You must be a LOT of fun at parties, 
and have lots of friends! 


Please don't feed the troll. There's a reason that Reindl is blocked 
from this list.


I was not aware, and I apologize.



No, I did not get that response. I don't have any of those specific 
spam to sample, as I have not gotten one today. But the last spam I 
got that slipped through SA had this score:

X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no

So nothing about any tests not working, or queries being rejected. 
Nothing that looks like misconfiguration on my end. I am not saying 
there are no misconfigurations on my end, but if there are, it's not 
super obvious to me.


The fact that you're getting BAYES_00 on that message indicates that 
Bayes -really- thinks it's ham.
Given that you've trained multiple instances of this kind of message 
to Bayes as spam but it still gets BAYES_00 score means one of two 
things:
1) Either you've got thousands of instances of similar messages that 
were learned as 'ham'
2) or the database that Bayes in your running SA instance is using is 
not the same one that you were doing your training to.


This could be configuration issues or pilot error (using the wrong 
identity when doing the training, training on the wrong machine, etc).


On your SA machine what does the output of "sa-learn --dump magic" 
show you?
(I.e. how many nspam & nham messages, how many tokens, what is the 
newest "atime", etc.)

If careful config & log inspection doesn't give clues, try this 
brute-force test.
Shut down your SA, move the directory containing your Bayes database 
out of the way and create a new empty one.

("sa-learn --dump magic" should now show 0 tokens).

Then train a few ham & spam messages (only a dozen or so), recheck the 
--dump magic to see that there are now some tokens in the database but 
not too many.


Restart your SA and watch the log results. If there are fewer than 200 
ham or fewer than 200 spam messages in your Bayes database then SA won't 
use it, so make sure that's the case: your new database should be too 
empty for SA to be willing to use it.
So if you -are- getting Bayes scores then that indicates that SA is 
using some database other than what you think it has.


Now start manually training more messages (spam & ham). When you hit 
the 200 count threshold Bayes scores should start showing up in your 
logs.


Good luck.


Thank you very much. The message that slipped through today was NOT one 
of the ones being discussed in this thread, it was a different format 
and totally different message. I only included it to demonstrate that my 
server was not being rejected for queries as the blocked user intimated. 
I will dig deeper into the --magic and make sure I'm feeding Bayes with 
spam and ham.


Thanks for your response, and again, I apologize for leaking that user's 
garbage to the list. I was not aware that he was blocked.


--
Thomas


My apologies

2023-08-02 Thread Thomas Cameron via users
I was notified privately that Reindl Harald is blocked on this list. I 
replied to him and accidentally polluted the list with more of his 
toxicity. I apologize, and I've blocked him on my mail server, as well.


I'm sorry for posting that.

--
Thomas


Re: Really hard-to-filter spam

2023-08-02 Thread Dave Funk

On Wed, 2 Aug 2023, Thomas Cameron via users wrote:


Wow! What a charming response! You must be a LOT of fun at parties, and have lots of 
friends! 


Please don't feed the troll. There's a reason that Reindl is blocked from this 
list.



No, I did not get that response. I don't have any of those specific spam to 
sample, as I have not gotten one today. But the last spam I got that slipped 
through SA had this score:

X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no

So nothing about any tests not working, or queries being rejected. Nothing that 
looks like misconfiguration on my end. I am not saying there are no 
misconfigurations on my end, but if there are, it's not super obvious to me.


The fact that you're getting BAYES_00 on that message indicates that Bayes 
-really- thinks it's ham.
Given that you've trained multiple instances of this kind of message to Bayes as 
spam but it still gets BAYES_00 score means one of two things:
1) Either you've got thousands of instances of similar messages that were 
learned as 'ham'
2) or the database that Bayes in your running SA instance is using is not the 
same one that you were doing your training to.


This could be configuration issues or pilot error (using the wrong identity when 
doing the training, training on the wrong machine, etc).


On your SA machine what does the output of "sa-learn --dump magic" show you?
(I.e. how many nspam & nham messages, how many tokens, what is the newest 
"atime", etc.)

If careful config & log inspection doesn't give clues, try this brute-force 
test.
Shut down your SA, move the directory containing your Bayes database out of the 
way and create a new empty one.

("sa-learn --dump magic" should now show 0 tokens).

Then train a few ham & spam messages (only a dozen or so), recheck the --dump 
magic to see that there are now some tokens in the database but not too many.


Restart your SA and watch the log results. If there are fewer than 200 ham or 
fewer than 200 spam messages in your Bayes database then SA won't use it, so 
make sure that's the case: your new database should be too empty for SA to be 
willing to use it.
So if you -are- getting Bayes scores then that indicates that SA is using some 
database other than what you think it has.


Now start manually training more messages (spam & ham). When you hit the 200 
count threshold Bayes scores should start showing up in your logs.
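
The threshold logic in this test can be written down as a small check. This is only a sketch of the rule as described here, assuming the minimums are 200 learned spam and 200 learned ham (the usual defaults of the bayes_min_spam_num and bayes_min_ham_num settings; a local config may change them). The function name is invented for this example.

```python
def bayes_should_score(nspam, nham, min_spam=200, min_ham=200):
    """Return True if both learned-message counts reach their minimums.

    nspam and nham are the counts reported by "sa-learn --dump magic".
    The 200/200 defaults are an assumption; adjust them to match local config.
    """
    return nspam >= min_spam and nham >= min_ham


# A freshly created database with about a dozen messages of each kind:
print(bayes_should_score(12, 12))
# The counts reported elsewhere in this thread:
print(bayes_should_score(41449, 49720))
```

If BAYES_* rules show up in the logs while this check is False for the database you just rebuilt, that points at SA reading a different database than the one you trained.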


Good luck.

--
Dave Funk   University of Iowa
 College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: Really hard-to-filter spam

2023-08-02 Thread Thomas Cameron via users

On 8/2/23 13:28, Reindl Harald wrote:
then I bet you have the same "RCVD_IN_ZEN_BLOCKED_OPENDNS" as the OP, 
which means you are not capable of operating a mailserver


https://www.spamhaus.org/returnc/pub/

thrown against our spamfilter it would be blocked without any 
question - above 8.0 points the spamass-milter rejects


Content analysis details:   (32.3 points, 5.5 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.0 CUST_DNSBL_26_UCE2     RBL: dnsbl-uce-2.thelounge.net
                            (dnsbl-2.uceprotect.net)
                            [60.176.201.72 listed in dnsbl-uce-2.thelounge.net]
 6.5 CUST_DNSBL_4_ZEN_PBL   RBL: zen.spamhaus.org (pbl.spamhaus.org)
                            [60.176.201.72 listed in zen.spamhaus.org]
 5.5 CUST_DNSBL_6_ZEN_XBL   RBL: zen.spamhaus.org (xbl.spamhaus.org)
 1.0 CUST_DNSBL_25_NSZONES  RBL: bl.nszones.com
                            [60.176.201.72 listed in bl.nszones.com]
 5.5 BAYES_80               BODY: Bayes spam probability is 80 to 95%
                            [score: 0.9084]
 0.1 HK_RANDOM_ENVFROM      Envelope sender username looks random
 0.1 HK_RANDOM_FROM         From username looks random
 6.5 CUST_DNSBL_2_SORBS_DUL RBL: dnsbl.sorbs.net
                            (dul.dnsbl.sorbs.net)
                            [60.176.201.72 listed in dnsbl.sorbs.net]
 0.0 SPF_HELO_NONE          SPF: HELO does not publish an SPF Record
 0.1 SPF_NONE               SPF: sender does not publish an SPF Record
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.1 TVD_SPACE_RATIO        No description available.
 2.5 RDNS_NONE              Delivered to internal network by a host with no rDNS
-0.0 T_SCC_BODY_TEXT_LINE   No description available.
 0.5 INVALID_MSGID          Message-Id is not valid, according to RFC 2822
 2.5 TVD_SPACE_RATIO_MINFP  Space ratio (vertical text obfuscation?)
 0.5 BOGOFILTER_PROB_SPAM   BOGOFILTER: No description available.


Wow! What a charming response! You must be a LOT of fun at parties, and 
have lots of friends! 


No, I did not get that response. I don't have any of those specific spam 
to sample, as I have not gotten one today. But the last spam I got that 
slipped through SA had this score:


X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no

So nothing about any tests not working, or queries being rejected. 
Nothing that looks like misconfiguration on my end. I am not saying 
there are no misconfigurations on my end, but if there are, it's not 
super obvious to me.
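
When comparing many of these headers it can help to split them into fields. The following is a rough sketch in Python, assuming the "score=", "required=" and "tests=" layout of the header quoted above; parse_spam_status is a name made up for this example, not an SA tool.

```python
import re


def parse_spam_status(header):
    """Split a folded X-Spam-Status header into score, threshold and rule names."""
    # Collapse the folding (line breaks plus whitespace) into single spaces.
    unfolded = " ".join(header.split())
    value = unfolded.split(":", 1)[1]
    score = float(re.search(r"score=(-?[\d.]+)", value).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", value).group(1))
    # The rule list runs from "tests=" up to the next " key=" field or the end.
    tests = re.search(r"tests=(.*?)(?= \w+=|$)", value).group(1)
    names = [name.strip() for name in tests.split(",") if name.strip()]
    return {"score": score, "required": required, "tests": names}


# The header quoted above, folded the same way.
HEADER = """X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no"""

status = parse_spam_status(HEADER)
print(status["score"], status["required"])
print([name for name in status["tests"] if name.startswith("BAYES_")])
```

Filtering the rule names this way makes it quick to see which BAYES_* bucket each missed spam landed in.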


Cheers!
--
Thomas

Re: Really hard-to-filter spam

2023-08-02 Thread Thomas Cameron via users

On 7/28/23 00:23, Bill Cole wrote:
1. There are milters/content-filters that decode Base64 message parts 
(amavisd-new, mimedefang, etc) for processing by SA.
2.  There are still sufficiently unique items: First-Name-Only, 
Mixed-Case word in the Subject (NLP modeling), and a Base-64 encoded 
HTML attachment (w/ UTF-8 encoding no less).  Combined in a Meta 
rule, these innocuous items will likely hit with good accuracy even 
without Base64 decoding.


Umm, unless I'm really missing something here the usual SA processing 
decodes such body stuff (QP, Base64, etc) and feeds the "cleaned" 
text to the rule processing engine.


Correct. It has nothing to do with the calling glue.

You have to work hard to get matches done on the raw stuff if you 
want to do special rule matching on the un-decoded body.


Correct. That should only be needed in rare cases where you're looking 
for a pattern in a non-text part.


I'm not sure why the OP's rule didn't match the target message, but it 
is NOT because of the Base64 encoding of parts with the 'text' primary 
MIME type. If I had to guess, I'd look for invisible characters hidden 
in the text (e.g. Unicode "zero width non-joiner" marks and the like) 
that break the pattern and for lookalike non-ASCII characters (often 
Cyrillic or Greek) in the target string.
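
One way to look for such characters is to list every non-ASCII code point in the decoded text. This is a minimal sketch using only the Python standard library; the sample string is invented for illustration and treating Unicode category "Cf" (format characters) as "invisible" is a simplification.

```python
import unicodedata


def suspicious_chars(text):
    """List (index, code point, Unicode name, kind) for each non-ASCII character."""
    findings = []
    for index, ch in enumerate(text):
        if ord(ch) < 128:
            continue
        # Category "Cf" covers format characters such as zero-width joiners.
        kind = "invisible" if unicodedata.category(ch) == "Cf" else "non-ascii"
        name = unicodedata.name(ch, "UNKNOWN")
        findings.append((index, f"U+{ord(ch):04X}", name, kind))
    return findings


# Invented sample: a Cyrillic "a" and a zero width non-joiner hidden in a word.
sample = "P\u0430yP\u200cal"
for finding in suspicious_chars(sample):
    print(finding)
```

Any hit inside the string a rule is supposed to match would explain why a plain ASCII pattern misses it.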


I am seeing the same issue. I get those same emails, with that 
132.1532.1334 string or similar. SA is definitely not catching them, 
even though I dump them into my spam folder and run sa-learn --spam 
against them day after day. How can I check to see if it's actually 
decoding the base64? Or is that just a fact? It seems incredibly weird 
that I get these things every day, I mark them as spam every day, and 
they never hit more than a couple of points on the spam scale.
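
To see what the text parts of a saved message look like once the Base64 is undone, independently of SA, something like the following sketch could be used. It relies on Python's standard email package, not on SA's own decoder, so it only shows what a decoder should produce; the sample message and the function name are made up for this example.

```python
import base64
from email import message_from_string, policy


def decoded_text_parts(raw_message):
    """Return (content type, decoded text) for every text/* part of a message."""
    msg = message_from_string(raw_message, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding and the charset.
            parts.append((part.get_content_type(), part.get_content()))
    return parts


# Invented single-part message with a Base64-encoded HTML body.
encoded = base64.b64encode(b"<p>sample body 132.1532.1334</p>").decode("ascii")
RAW = (
    "From: sender@example.com\n"
    "Subject: sample\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + encoded + "\n"
)

for content_type, text in decoded_text_parts(RAW):
    print(content_type, repr(text))
```

Running this over one of the saved spam messages shows the decoded text that body rules and Bayes tokens would be expected to work from, which is also the place to look for hidden or lookalike characters.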


Thomas