Re: Really hard-to-filter spam
On 8/2/23 15:52, David B Funk wrote:
> Regardless, if a message has never been seen before and has little
> correlation to earlier messages its Bayes should hit someplace in the
> 40% to 60% range. The fact that it hit 00% indicates a strong
> correlation to lots of ham (or something is screwy with your Bayes).

OK, here's what I got just now:

[thomas.cameron@mail-east ~]$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      41449          0  non-token data: nspam
0.000          0      49720          0  non-token data: nham
0.000          0     162741          0  non-token data: ntokens
0.000          0 1689089541          0  non-token data: oldest atime
0.000          0 1691009577          0  non-token data: newest atime
0.000          0 1691007146          0  non-token data: last journal sync atime
0.000          0 1690991018          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0      13879          0  non-token data: last expire reduction count

I can absolutely re-train Bayes. I am kind of an email pack-rat, so I have over a gig of saved known good emails in various folders.

I have SA set up so that emails are scanned individually on a per user basis via procmail rule:

[thomas.cameron@mail-east ~]$ head .procmailrc
MAILDIR=$HOME/mail
LOGFILE=$MAILDIR/procmail.log

:0fw: spamassassin.lock
* < 512000
| spamassassin

I have the users move spam to an imap folder, and then run (via the user's cron job):

sa-learn --mbox --spam /home/[username]/mail/spam

If something is flagged as spam and it's not supposed to be, I have them copy it to the ham folder and I run (also via cron job):

sa-learn --mbox --ham /home/[username]/mail/spam

For my email account, I've used my inbox and various other folders to train Bayes in the past (although it's definitely been a while since I did Bayes maintenance), but I have zero issue nuking my personal Bayes data and starting over.

Thoughts?

--
Thomas
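[Editor's sketch] The `sa-learn --dump magic` counters in the message above are easier to compare once the atime values are turned into dates. The following is a minimal sketch, assuming the output has the four-column layout shown in this thread and that the atime fields are Unix epoch seconds; `parse_dump_magic` and `as_utc` are hypothetical helper names, not part of SpamAssassin.

```python
from datetime import datetime, timezone

# A few lines copied from the output posted in this thread.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0      41449          0  non-token data: nspam
0.000          0      49720          0  non-token data: nham
0.000          0     162741          0  non-token data: ntokens
0.000          0 1689089541          0  non-token data: oldest atime
0.000          0 1691009577          0  non-token data: newest atime
"""


def parse_dump_magic(text):
    """Return {label: int} for every 'non-token data' line (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        head, sep, label = line.partition("non-token data:")
        fields = head.split()
        if not sep or len(fields) < 3:
            continue  # not a magic line, skip it
        # Assumption: the third column carries the value of interest.
        values[label.strip()] = int(fields[2])
    return values


def as_utc(epoch_seconds):
    """Render an epoch timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


magic = parse_dump_magic(SAMPLE)
print("nspam:", magic["nspam"], "nham:", magic["nham"], "ntokens:", magic["ntokens"])
print("oldest atime:", as_utc(magic["oldest atime"]))
print("newest atime:", as_utc(magic["newest atime"]))
```

A newest atime that lags well behind the last training run would suggest that training is landing in a different database than the one being dumped.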
Re: My apologies
Marc wrote on 2023-08-02 22:23:
> I like Reindl! Is anyone training spamassassin on his emails??? ;P

Why? If it's good for Bayes, why should it be bad at all for humans, then?
Re: My apologies
Thomas Cameron via users wrote on 2023-08-02 21:39:
> I'm sorry for posting that.

I just made a Sieve autoreader, so I don't need to read it myself. Good or bad, I don't know :)

No need to be sorry about losing mail, IMHO.
Re: Really hard-to-filter spam
On Wed, 2 Aug 2023, Thomas Cameron via users wrote:
> Thank you very much. The message that slipped through today was NOT one
> of the ones being discussed in this thread, it was a different format
> and totally different message. I only included it to demonstrate that my
> server was not being rejected for queries as the blocked user intimated.
>
> I will dig deeper into the --magic and make sure I'm feeding Bayes with
> spam and ham.

Regardless, if a message has never been seen before and has little correlation to earlier messages its Bayes should hit someplace in the 40% to 60% range. The fact that it hit 00% indicates a strong correlation to lots of ham (or something is screwy with your Bayes).

--
Dave Funk                          University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
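[Editor's sketch] To make the "40% to 60%" versus "00%" remark concrete, here is a minimal sketch of how a Bayes probability maps to a BAYES_NN rule name. The bucket boundaries are an assumption based on the stock ruleset as commonly documented (the report quoted elsewhere in this thread shows BAYES_80 described as "80 to 95%" for a score of 0.9084); exact edge handling in SpamAssassin may differ, and `bayes_rule` is a hypothetical helper.

```python
import bisect

# Assumed upper bounds of each bucket and the matching rule names.
BOUNDS = [0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99]
NAMES = ["BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40", "BAYES_50",
         "BAYES_60", "BAYES_80", "BAYES_95", "BAYES_99"]


def bayes_rule(probability):
    """Map a spam probability in [0, 1] to its assumed BAYES_NN bucket."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return NAMES[bisect.bisect_right(BOUNDS, probability)]


for p in (0.0, 0.5, 0.9084):
    print(p, bayes_rule(p))
```

Under these assumptions, a message with no strong evidence either way lands in BAYES_50, while BAYES_00 requires a probability below 1%, which is why it points to heavy ham correlation.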
RE: My apologies
> > I've blocked him on my mail server, as well.
>
> Reindl now and then says something useful, but as you have noticed his
> people skills are somewhere in the negative 200 score level. I don't
> know that I'd block him, but you do need to take anything he says with
> a few horselicks of salt.

I like Reindl! Is anyone training spamassassin on his emails??? ;P
Re: My apologies
> I've blocked him on my mail server, as well.

Reindl now and then says something useful, but as you have noticed his people skills are somewhere in the negative 200 score level. I don't know that I'd block him, but you do need to take anything he says with a few horselicks of salt.
Re: My apologies
On Wednesday 02 August 2023 at 21:39:31, Thomas Cameron via users wrote:
> I was notified privately that Reindl Harald is blocked on this list. I
> replied to him and accidentally polluted the list with more of his
> toxicity. I apologize, and I've blocked him on my mail server, as well.

We've all had to learn about him (sometimes on several lists) at some time or other.

Thanks for the apology, but his attitude is his own, and you've done nothing to cause that. He responds to almost everybody in the same anti-social (to put it mildly) manner.

Don't worry about it - just carry on with talking to reasonable people instead.

Antony.

--
If you were ploughing a field, which would you rather use - two strong oxen or 1024 chickens?
 - Seymour Cray, pioneer of supercomputing

Please reply to the list; please *don't* CC me.
Re: Really hard-to-filter spam
On 8/2/23 14:32, Dave Funk wrote:
> On Wed, 2 Aug 2023, Thomas Cameron via users wrote:
>
> > Wow! What a charming response! You must be a LOT of fun at parties,
> > and have lots of friends!
>
> Please don't feed the troll. There's a reason that Reindl is blocked
> from this list.

I was not aware, and I apologize.

> > No, I did not get that response. I don't have any of those specific
> > spam to sample, as I have not gotten one today. But the last spam I
> > got that slipped through SA had this score:
> >
> > X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
> >     DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
> >     HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
> >     SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no
> >
> > So nothing about any tests not working, or queries being rejected.
> > Nothing that looks like misconfiguration on my end. I am not saying
> > there are no misconfigurations on my end, but if there are, it's not
> > super obvious to me.
>
> The fact that you're getting BAYES_00 on that message indicates that
> Bayes -really- thinks it's ham.
>
> Given that you've trained multiple instances of this kind of message to
> Bayes as spam but it still gets BAYES_00 score means one of two things:
> 1) Either you've got thousands of instances of similar messages that
>    were learned as 'ham'
> 2) or the database that Bayes in your running SA instance is using is
>    not the same one that you were doing your training to.
>
> This could be configuration issues or pilot error (using the wrong
> identity when doing the training, training on the wrong machine, etc).
>
> On your SA machine what does the output of "sa-learn --dump magic" show
> you? (IE how many nspam & nham tokens, what is the newest "atime", etc).
>
> If careful config & log inspection doesn't give clues, try this
> brute-force test. Shut down your SA, move the directory containing your
> Bayes database out of the way and create a new empty one.
> ("sa-learn --dump magic" should now show 0 tokens).
>
> Then train a few ham & spam messages (only a dozen or so), recheck the
> --dump magic to see that there are now some tokens in the database but
> not too many. Restart your SA and watch the log results.
>
> If there are fewer than 200 messages (both ham & spam) in your Bayes
> database then SA won't use it, so make sure that's the case, your new
> database should be too empty for SA to be willing to use it. So if you
> -are- getting Bayes scores then that indicates that SA is using some
> database other than what you think it has.
>
> Now start manually training more messages (spam & ham). When you hit
> the 200 count threshold Bayes scores should start showing up in your
> logs.
>
> Good luck.

Thank you very much. The message that slipped through today was NOT one of the ones being discussed in this thread, it was a different format and totally different message. I only included it to demonstrate that my server was not being rejected for queries as the blocked user intimated.

I will dig deeper into the --magic and make sure I'm feeding Bayes with spam and ham.

Thanks for your response, and again, I apologize for leaking that user's garbage to the list. I was not aware that he was blocked.

--
Thomas
My apologies
I was notified privately that Reindl Harald is blocked on this list. I replied to him and accidentally polluted the list with more of his toxicity. I apologize, and I've blocked him on my mail server, as well.

I'm sorry for posting that.

--
Thomas
Re: Really hard-to-filter spam
On Wed, 2 Aug 2023, Thomas Cameron via users wrote:
> Wow! What a charming response! You must be a LOT of fun at parties, and
> have lots of friends!

Please don't feed the troll. There's a reason that Reindl is blocked from this list.

> No, I did not get that response. I don't have any of those specific
> spam to sample, as I have not gotten one today. But the last spam I got
> that slipped through SA had this score:
>
> X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
>     DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
>     HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
>     SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no
>
> So nothing about any tests not working, or queries being rejected.
> Nothing that looks like misconfiguration on my end. I am not saying
> there are no misconfigurations on my end, but if there are, it's not
> super obvious to me.

The fact that you're getting BAYES_00 on that message indicates that Bayes -really- thinks it's ham.

Given that you've trained multiple instances of this kind of message to Bayes as spam but it still gets BAYES_00 score means one of two things:
1) Either you've got thousands of instances of similar messages that were learned as 'ham'
2) or the database that Bayes in your running SA instance is using is not the same one that you were doing your training to.

This could be configuration issues or pilot error (using the wrong identity when doing the training, training on the wrong machine, etc).

On your SA machine what does the output of "sa-learn --dump magic" show you? (IE how many nspam & nham tokens, what is the newest "atime", etc).

If careful config & log inspection doesn't give clues, try this brute-force test. Shut down your SA, move the directory containing your Bayes database out of the way and create a new empty one. ("sa-learn --dump magic" should now show 0 tokens).

Then train a few ham & spam messages (only a dozen or so), recheck the --dump magic to see that there are now some tokens in the database but not too many. Restart your SA and watch the log results.

If there are fewer than 200 messages (both ham & spam) in your Bayes database then SA won't use it, so make sure that's the case, your new database should be too empty for SA to be willing to use it. So if you -are- getting Bayes scores then that indicates that SA is using some database other than what you think it has.

Now start manually training more messages (spam & ham). When you hit the 200 count threshold Bayes scores should start showing up in your logs.

Good luck.

--
Dave Funk                          University of Iowa College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include
Better is not better, 'standard' is better. B{
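[Editor's sketch] The logic of the brute-force test above can be written down as a small decision helper. This is a minimal sketch, assuming the 200-message minimum applies separately to spam and to ham (as far as I know these are the `bayes_min_spam_num` and `bayes_min_ham_num` settings, both defaulting to 200); `bayes_would_run` and `interpret` are hypothetical names, and the counts would come from `sa-learn --dump magic`.

```python
def bayes_would_run(nspam, nham, min_spam=200, min_ham=200):
    """True when both learned-message counts reach the assumed minimums."""
    return nspam >= min_spam and nham >= min_ham


def interpret(nspam, nham, saw_bayes_rule):
    """Compare what the inspected database allows with what the logs show.

    saw_bayes_rule: whether a BAYES_* rule appeared in a scanned message.
    """
    usable = bayes_would_run(nspam, nham)
    if saw_bayes_rule and not usable:
        return "mismatch: SA is scoring from a different Bayes database"
    if not saw_bayes_rule and usable:
        return "mismatch: database is large enough but SA is not using it"
    return "consistent"


# Fresh database with a dozen messages each, yet BAYES_00 still shows up:
print(interpret(12, 12, saw_bayes_rule=True))
# Counts reported earlier in this thread, with Bayes rules hitting:
print(interpret(41449, 49720, saw_bayes_rule=True))
```

The first case is the one the test is designed to expose: the database that was trained is not the database the running scanner reads.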
Re: Really hard-to-filter spam
On 8/2/23 13:28, Reindl Harald wrote:
> then i bet you have the same "RCVD_IN_ZEN_BLOCKED_OPENDNS" as the OP
> which means you are not capable to operate a mailserver
>
> https://www.spamhaus.org/returnc/pub/
>
> thrown against our spamfilter it would be blocked without any question -
> above 8.0 points the spamass-milter rejects
>
> Content analysis details:   (32.3 points, 5.5 required)
>
>  pts rule name               description
> ---- ----------------------  ------------------------------------------
>  1.0 CUST_DNSBL_26_UCE2      RBL: dnsbl-uce-2.thelounge.net (dnsbl-2.uceprotect.net)
>                              [60.176.201.72 listed in dnsbl-uce-2.thelounge.net]
>  6.5 CUST_DNSBL_4_ZEN_PBL    RBL: zen.spamhaus.org (pbl.spamhaus.org)
>                              [60.176.201.72 listed in zen.spamhaus.org]
>  5.5 CUST_DNSBL_6_ZEN_XBL    RBL: zen.spamhaus.org (xbl.spamhaus.org)
>  1.0 CUST_DNSBL_25_NSZONES   RBL: bl.nszones.com
>                              [60.176.201.72 listed in bl.nszones.com]
>  5.5 BAYES_80                BODY: Bayes spam probability is 80 to 95%
>                              [score: 0.9084]
>  0.1 HK_RANDOM_ENVFROM       Envelope sender username looks random
>  0.1 HK_RANDOM_FROM          From username looks random
>  6.5 CUST_DNSBL_2_SORBS_DUL  RBL: dnsbl.sorbs.net (dul.dnsbl.sorbs.net)
>                              [60.176.201.72 listed in dnsbl.sorbs.net]
>  0.0 SPF_HELO_NONE           SPF: HELO does not publish an SPF Record
>  0.1 SPF_NONE                SPF: sender does not publish an SPF Record
>  0.0 HTML_MESSAGE            BODY: HTML included in message
>  0.1 TVD_SPACE_RATIO         No description available.
>  2.5 RDNS_NONE               Delivered to internal network by a host with no rDNS
> -0.0 T_SCC_BODY_TEXT_LINE    No description available.
>  0.5 INVALID_MSGID           Message-Id is not valid, according to RFC 2822
>  2.5 TVD_SPACE_RATIO_MINFP   Space ratio (vertical text obfuscation?)
>  0.5 BOGOFILTER_PROB_SPAM    BOGOFILTER: No description available.

Wow! What a charming response! You must be a LOT of fun at parties, and have lots of friends!

No, I did not get that response. I don't have any of those specific spam to sample, as I have not gotten one today. But the last spam I got that slipped through SA had this score:

X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,
    DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,
    HTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,
    SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no

So nothing about any tests not working, or queries being rejected. Nothing that looks like misconfiguration on my end. I am not saying there are no misconfigurations on my end, but if there are, it's not super obvious to me.

Cheers!

--
Thomas
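[Editor's sketch] When comparing headers like the X-Spam-Status line above across many messages, it helps to split them into score, threshold, and rule list. This is a minimal sketch, assuming the header follows the `score=… required=… tests=…` layout shown in this thread; `parse_spam_status` is a hypothetical helper, not a SpamAssassin API.

```python
import re

# Header value copied from the message above, including the folded lines.
HEADER = (
    "No, score=-5.1 required=5.0 tests=BAYES_00,DEAR_SOMETHING,\n"
    "\tDKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,\n"
    "\tHTML_MESSAGE,RCVD_IN_DNSWL_HI,RCVD_IN_MSPIKE_H2,RCVD_IN_PBL,\n"
    "\tSPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no"
)


def parse_spam_status(value):
    """Split an X-Spam-Status value into score, required, and rule names."""
    unfolded = re.sub(r"\s+", " ", value.strip())  # undo header folding
    score = float(re.search(r"score=(-?[\d.]+)", unfolded).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", unfolded).group(1))
    # The rule list runs until the next " key=" pair or the end of the value.
    tests_raw = re.search(r"tests=(.*?)(?=\s+\w+=|$)", unfolded).group(1)
    tests = [name.strip() for name in tests_raw.split(",") if name.strip()]
    return {"score": score, "required": required, "tests": tests}


status = parse_spam_status(HEADER)
print(status["score"], status["required"])
print([t for t in status["tests"] if t.startswith(("BAYES_", "RCVD_IN_"))])
```

Filtering for the BAYES_ and RCVD_IN_ rules makes it quick to see which reputation and Bayes hits accompanied a message that slipped through.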
Re: Really hard-to-filter spam
On 7/28/23 00:23, Bill Cole wrote:
> > > 1. There are milters/content-filters that decode Base64 message
> > > parts (amavisd-new, mimedefang, etc) for processing by SA.
> > >
> > > 2. There are still sufficiently unique items: First-Name-Only,
> > > Mixed-Case word in the Subject (NLP modeling), and a Base-64 encoded
> > > HTML attachment (w/ UTF-8 encoding no less). Combined in a Meta
> > > rule, these innocuous items will likely hit with good accuracy even
> > > without Base64 decoding.
> >
> > Umm, unless I'm really missing something here the usual SA processing
> > decodes such body stuff (QP, Base64, etc) and feeds the "cleaned" text
> > to the rule processing engine.
>
> Correct. It has nothing to do with the calling glue.
>
> > You have to work hard to get matches done on the raw stuff if you want
> > to do special rule matching on the un-decoded body.
>
> Correct. That should only be needed in rare cases where you're looking
> for a pattern in a non-text part.
>
> I'm not sure why the OP's rule didn't match the target message, but it
> is NOT because of the Base64 encoding of parts with the 'text' primary
> MIME type.
>
> If I had to guess, I'd look for invisible characters hidden in the text
> (e.g. Unicode "zero width non-joiner" marks and the like) that break the
> pattern and for lookalike non-ASCII characters (often Cyrillic or Greek)
> in the target string.

I am seeing the same issue. I get those same emails, with that 132.1532.1334 string or similar. SA is definitely not catching them, even though I dump them into my spam folder and run sa-learn --spam against them day after day.

How can I check to see if it's actually decoding the base64? Or is that just a fact? It seems incredibly weird that I get these things every day, I mark them as spam every day, and they never hit more than a couple of points on the spam scale.

Thomas
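[Editor's sketch] One way to look for the hidden characters suggested above is to decode the text parts of a saved message outside SpamAssassin and list anything invisible or non-ASCII. This is a minimal sketch using Python's standard `email` and `unicodedata` modules; it shows what a generic MIME decoder yields, not what SA does internally, and the sample message (with a zero width non-joiner and a Cyrillic "а" planted in it) is invented for illustration. `decoded_text_parts` and `suspicious_chars` are hypothetical helper names.

```python
import base64
import email
import unicodedata
from email import policy


def decoded_text_parts(raw_bytes):
    """Yield (content_type, decoded_text) for every text/* MIME part."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            # get_content() undoes Base64/QP and applies the declared charset.
            yield part.get_content_type(), part.get_content()


def suspicious_chars(text):
    """List invisible format characters and non-ASCII letters with positions."""
    hits = []
    for pos, ch in enumerate(text):
        name = unicodedata.name(ch, "UNNAMED")
        if unicodedata.category(ch) == "Cf":  # e.g. zero width (non-)joiner
            hits.append((pos, f"U+{ord(ch):04X}", name, "invisible"))
        elif ch.isalpha() and ord(ch) > 127:  # possible lookalike letter
            hits.append((pos, f"U+{ord(ch):04X}", name, "non-ASCII letter"))
    return hits


# Invented sample: Base64 text/html body with two planted characters.
html = "<p>Ref 132.15\u200c32.1334, de\u0430r friend</p>"
raw = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n" + base64.b64encode(html.encode("utf-8")) + b"\r\n"
)

for content_type, text in decoded_text_parts(raw):
    print(content_type)
    for hit in suspicious_chars(text):
        print("  ", hit)
```

To use it on a real sample, read the saved message in binary mode and pass the bytes to `decoded_text_parts`; any hit inside the string a rule is meant to match would explain why a plain-ASCII pattern never fires.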