Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On Tue, 8 Feb 2022, Loren Wilton wrote: Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. I think you may have a very hard time removing the m matching delimiters from SA. I suspect there are at least hundreds of rules like that in the release database. I have about a hundred local rules of my own that use that.

Indeed.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Journalism is about covering important stories. With a pillow, until they stop moving. -- David Burge
---
74 more days working to pay your (average) annual US tax bill before you're finally working for yourself.
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. I think you may have a very hard time removing the m matching delimiters from SA. I suspect there are at least hundreds of rules like that in the release database. I have about a hundred local rules of my own that use that. Any time I have more than one backslash in a pattern, I use an alternate delimiter (usually single quote) so that I don't have to escape all the backslashes in the rule body. I'm not a fan of obfuscated rule bodies where it is impossible to tell what it is intended to match. My experience is that any time you have to write or \\ multiple times in a rule body, you are almost guaranteed to get the number of backslashes wrong, and the rule won't work. But of course it may work in some cases (like the one you used to test it) while not working in general. I don't have time in my life to deal with that sort of thing. It caused me enough grief when I started writing rules 20 years ago, which is why I started using m'. BTW, that particular rule dates from RulesEmporium days, which was what, 2005 or so? Loren
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On 2022-02-08 at 13:14:06 UTC-0500 (Tue, 8 Feb 2022 13:14:06 -0500) Kris Deugau is rumored to have said: [...] > Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. It isn't beyond the realm of possibility that someday we'll tighten up syntax checking. We've had security issues in the past which involved the hypothetical potential to sneak in malicious code via rules. I don't expect that we'll have another one bad enough to make a rewrite of the config parser justified, but it could happen, and I don't think we'd design it today as it was done 20 years ago. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
Bill Cole wrote:
> On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
> Loren Wilton is rumored to have said:
>
>>> No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique.
>>>
>>> I'll tighten it up.
>>
>> FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.
>>
>> full __GOODEHTML1 m''i
>>
>> full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary
>
> TANGENTIAL: I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

Maybe it's just inexperience with deep regex voodoo, but I'm not seeing anything odd in those. Are you talking about the use of m'' as the regex delimiter?

-kgd
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
Loren Wilton is rumored to have said:

>> No, I added that after observing multiple spams with random garbage after
>> the closing HTML tag in the HTML body part. Presumably it was an attempt at
>> Bayes poison, checksum avoidance, or some other filter evasion technique.
>>
>> I'll tighten it up.
>
> FWIW, here is the rule I use. It obviously could be better, but I haven't
> noticed that it misfires.
>
> full __GOODEHTML1 m''i
>
> full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

TANGENTIAL: I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

-- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire
Re: FROM header obfuscation
Frido Otten wrote:
> Hi All,
>
> Recently we're seeing more spam passing our spamfilters using text obfuscating in the FROM header. The problem mainly targets users which are using mail clients like iPhone Mail which are only displaying the display name of the FROM header and not the actual email address which was used, bypassing DKIM measures.
>
> For example:
>
> From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?=
>
> This is base64 encoded "Рostnl.nl Рakket" and pretends to come from Postnl, a dutch snailmail company. However the hexadecimal representation of this base64 decoded text differs from that of normal ASCII:
>
> Obfuscated:
> $ printf "Рostnl.nl Рakket" | od -A n -t x1
> d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b 65 74
>
> Plain ASCII:
> $ printf "Postnl.nl Pakket" | od -A n -t x1
> 50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74
>
> There is no way to tell the difference with the naked eye.

That depends on the font. Many variations do in fact look different, and from some of the FP-approaching "ham" I've seen that abuses this I can only conclude that some marketing person has decided that this is Necessary and Required and the tech folks can Go Suck It.

As far as I'm concerned, formatting outside of language accents on characters absolutely does NOT belong in either the From: name or Subject. An "a" in the From: name or Subject absolutely MUST be presented as a US-ASCII "a", and not some extended UTF8 lookalike that's... oo! in *italics*!

Naturally the spammers go to various amounts of effort to avoid the ones that are clearly different.

> Is there any way to detect this type of obfuscation with a spamassassin rule?

I have a longish list of rule groups similar to below for different extended UTF8 ASCII-lookalike characters and words. Some are derived from rules discussed on this list within the past year or so.
header __SUSP_NAME_CHAR_01 From:name =~ /(?:\xd0[\xa0-\xbf])/
tflags __SUSP_NAME_CHAR_01 multiple maxhits 10
header __SUSP_NAME_CHAR_02 From:name =~ /(?:\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0])/
tflags __SUSP_NAME_CHAR_02 multiple maxhits 10
meta __SUSP_NAME_CHAR __SUSP_NAME_CHAR_01 + __SUSP_NAME_CHAR_02

meta SUSP_NAME_CHAR_5 __SUSP_NAME_CHAR >= 5
describe SUSP_NAME_CHAR_5 5 or more lookalike characters in the From: name
score SUSP_NAME_CHAR_5 1.5

meta SUSP_NAME_CHAR_10 __SUSP_NAME_CHAR >= 10
describe SUSP_NAME_CHAR_10 10 or more lookalike characters in the From: name
score SUSP_NAME_CHAR_10 1.75

I've used this tool: https://www.utf8-chartable.de/ with a bit of effort to take an example character and locate the full a-z list of entries for these rules. (Convert individual characters to hex, then flip pages until you've found the fakes. There are many groups.)

Single characters are trickier; depending on context I've added rules for individual lookalike characters, or whole words with mixed variants (and an exclusion for pure ASCII) as I see new runs of FNs.

-kgd
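To check how many hits a given From: display name would collect under the rule group above, here is a minimal Python sketch of the same counting logic. It assumes the rules see the decoded name as UTF-8 bytes (the patterns are written as byte ranges); decoded_name_bytes, susp_name_char and fired_rules are hypothetical helpers, and this is not how SpamAssassin evaluates rules internally. Each sub-rule's count is capped at 10 to mirror "maxhits 10", and the two counts are summed like the __SUSP_NAME_CHAR meta.

```python
import re
from email.header import decode_header

# Byte patterns copied from __SUSP_NAME_CHAR_01 and __SUSP_NAME_CHAR_02.
SUSP_01 = re.compile(rb"\xd0[\xa0-\xbf]")  # U+0420..U+043F, part of the Cyrillic block
SUSP_02 = re.compile(rb"\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0]")  # U+FF00..U+FF60, fullwidth forms
MAXHITS = 10  # mirrors "tflags ... multiple maxhits 10"


def decoded_name_bytes(raw_name: str) -> bytes:
    """Hypothetical helper: decode RFC 2047 encoded-words into UTF-8 bytes."""
    out = b""
    for part, charset in decode_header(raw_name):
        if isinstance(part, str):  # header contained no encoded-words at all
            part = part.encode("utf-8")
        elif charset and charset.lower() not in ("utf-8", "utf8"):
            part = part.decode(charset, errors="replace").encode("utf-8")
        out += part
    return out


def susp_name_char(name: bytes) -> int:
    """Sum of the two capped hit counts, like the __SUSP_NAME_CHAR meta."""
    hits_01 = min(len(SUSP_01.findall(name)), MAXHITS)
    hits_02 = min(len(SUSP_02.findall(name)), MAXHITS)
    return hits_01 + hits_02


def fired_rules(raw_name: str) -> list:
    """Which of the two scored metas would fire for this display name."""
    total = susp_name_char(decoded_name_bytes(raw_name))
    fired = []
    if total >= 5:
        fired.append("SUSP_NAME_CHAR_5")
    if total >= 10:
        fired.append("SUSP_NAME_CHAR_10")
    return fired


print(susp_name_char(decoded_name_bytes("=?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?=")))  # 2
print(fired_rules("\uff30\uff2f\uff33\uff34\uff2e\uff2c"))  # fullwidth "POSTNL": ['SUSP_NAME_CHAR_5']
```

Note that the Postnl sample from the original report only scores 2 hits (two Cyrillic "Р"), below both thresholds, which is why single-character and mixed-word rules are still needed for that kind of name.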
Re: Emails from gmail.com bypassing Spamassassin scoring
On 2022-02-07 at 13:43:31 UTC-0500 (Mon, 07 Feb 2022 13:43:31 -0500)
Chad is rumored to have said:

> I have been getting numerous emails lately from various gmail.com accounts.
> They are spam or phishing emails and today I got one that had a subject of
> RECEIPT 5454 and only a JPG image of an invoice. There was no content in
> the email.
>
> It bypassed Spamassassin scoring. Do you know why or what setting I need
> to set so EVERY email goes through Spamassassin scoring procedures?
>
> My email server is:mercury2022.mercuryemail.net
[...]
> Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com
> [209.85.214.172])
>
> by mercury2022.mercuryemail.net (Postfix) with ESMTPS id
> A5F7E8043D4A
>
> for ; Mon, 7 Feb 2022 10:44:18 -0500
> (EST)

OK, so we know that your mail server is running Postfix but not how you've integrated SpamAssassin. There are many possibilities, with 2 independent attributes:

1. Interface to Postfix:
   a. content_filter setting to pipe mail to a bespoke script (maybe distro-provided)
   b. milter (amavis, spamass-milter, mimedefang, etc.)
   c. SMTP Proxy (usually amavis)
   d. FILTER action in an access map to a bespoke script.
   e. NONE: Integrated with a downstream delivery agent (e.g. Dovecot LMTP) or MUA.

2. Interface to SA:
   a. Load Mail::SpamAssassin Perl modules and use them directly
   b. Use a spamc binary built from the SA distribution to contact a local spamd instance
   c. Use a spamc binary built from the SA distribution to contact a remote spamd instance
   d. Use a custom implementation of the spamc protocol to contact a local spamd instance
   e. Use a custom implementation of the spamc protocol to contact a remote spamd instance
   f. Run the spamassassin script and handle its output.

So, yeah: 30 possible combinations. It is hard to say what is broken without knowing how you have SA working when it works.
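The figure of 30 follows from the two attributes being independent: 5 ways to hook into Postfix times 6 ways to reach SpamAssassin. A trivial Python enumeration of the two lists, with the labels abbreviated, makes the arithmetic explicit:

```python
from itertools import product

# The two independent attributes listed above (labels abbreviated).
postfix_side = [
    "a. content_filter -> bespoke script",
    "b. milter",
    "c. SMTP proxy",
    "d. FILTER action in an access map",
    "e. none (downstream delivery agent or MUA)",
]
sa_side = [
    "a. Mail::SpamAssassin modules directly",
    "b. spamc -> local spamd",
    "c. spamc -> remote spamd",
    "d. custom spamc protocol -> local spamd",
    "e. custom spamc protocol -> remote spamd",
    "f. run the spamassassin script",
]

# Every Postfix-side choice can be paired with every SA-side choice.
combinations = list(product(postfix_side, sa_side))
print(len(combinations))  # 30 = 5 x 6
```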
This sort of problem is never technically in SpamAssassin itself, as SpamAssassin itself doesn't include any software that could act as a gatekeeper. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
John Hardin writes:

> On Mon, 7 Feb 2022, Greg Troxel wrote:
>
>> and then I got a reply back with the content he was trying to send etc.
>> But, it had:
>>
>> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong. reading
>> 72_active.cf I found:
>>
>> rawbody __CONTENT_AFTER_HTML /<\/html>\s*[a-z0-9]/i
>> which fires on a text/plain part that discusses html formatting!
>
> Ah, I'll see if I can add something to that so it only fires when
> there's an actual HTML body part. Thanks for the report.
>
> Pity there's not an "htmlbody" rule type...

Agreed - I think the way you are trying to tighten is correct.
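To see the false positive Greg describes, and the kind of tightening John mentions, here is a minimal Python sketch. It applies the quoted pattern to every decoded leaf MIME part (a rough stand-in for what a rawbody rule sees) and, optionally, only to text/html parts. The html_parts_only switch is a hypothetical illustration, not the change actually made to the rule, and the sample messages are made up.

```python
import re
from email import message_from_string

# Pattern from the __CONTENT_AFTER_HTML rule quoted above; Perl's /i becomes re.I.
# The escaped slash in the Perl version is not needed in Python.
CONTENT_AFTER_HTML = re.compile(r"</html>\s*[a-z0-9]", re.I)


def rule_hits(raw_message: str, html_parts_only: bool) -> bool:
    """Scan each decoded leaf part; optionally skip anything that is not text/html."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue
        if html_parts_only and part.get_content_type() != "text/html":
            continue
        body = part.get_payload(decode=True).decode("utf-8", errors="replace")
        if CONTENT_AFTER_HTML.search(body):
            return True
    return False


PLAIN_DISCUSSION = (
    "Content-Type: text/plain\n"
    "\n"
    "Put your footer before </html> and not after it.\n"
)
HTML_WITH_TRAILER = (
    "Content-Type: text/html\n"
    "\n"
    "<html><body>Hi</body></html>\nxqzv random trailing words\n"
)

print(rule_hits(PLAIN_DISCUSSION, html_parts_only=False))  # True: the false positive
print(rule_hits(PLAIN_DISCUSSION, html_parts_only=True))   # False
print(rule_hits(HTML_WITH_TRAILER, html_parts_only=True))  # True
```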
Re: Errors running SpamAssassin
> I'd run "sh -x /etc/cron.daily/spamassassin" to see what command in that file failed. I assume it is the sa-compile command.

I got some more results. Here are the steps I took:

1. Remove everything from /var/lib/spamassassin
2. Reinstall the spamassassin package
3. Recreate the /var/lib/spamassassin/compiled directory with debian-spamd:debian-spamd ownership
4. Run sudo sh -x /etc/cron.daily/spamassassin

Here is the output I collected:

+ CRON=0
+ test -f /etc/default/spamassassin
+ . /etc/default/spamassassin
+ SPAMD_HOME=/run/spamd/
+ OPTIONS=--create-prefs --max-children 5 --helper-home-dir /run/spamd/ --listen /run/spamd/spamd.sock --username debian-spamd -s spamd --allow-tell --timeout-child=30
+ PIDFILE=/run/spamd/spamd.pid
+ NICE=--nicelevel 15
+ CRON=1
+ test -x /usr/bin/sa-update
+ test -x /etc/init.d/spamassassin
+ command -v gpg
+ [ 1 = 0 ]
+ [ ! -t 0 ]
+ umask 022
+ env -i LANG=en_GB.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin http_proxy= start-stop-daemon --chuid debian-spamd:debian-spamd --start --exec /usr/bin/sa-update -- --gpghomedir /var/lib/spamassassin/sa-update-keys
+ env -i LANG=en_GB.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin start-stop-daemon --chuid debian-spamd:debian-spamd --start --exec /usr/bin/spamassassin -- --lint
+ do_compile
+ [ -x /usr/bin/re2c -a -x /usr/bin/sa-compile ]
+ env -i LANG=en_GB.UTF-8 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin start-stop-daemon --chuid debian-spamd:debian-spamd --start --exec /usr/bin/sa-compile -- --quiet
chmod: cannot access 'body_0.bs': No such file or directory
make: *** [Makefile:465: body_0.bs] Error 1
command 'make PREFIX=/tmp/.spamassassin11033rJy6vLtmp/ignored INSTALLSITEARCH=/var/lib/spamassassin/compiled/5.032/3.004006 >>/tmp/.spamassassin11033rJy6vLtmp/log' failed: exit 2
+ runuser -u debian-spamd -- chmod -R go-w,go+rX /var/lib/spamassassin/compiled
+ reload
+ which invoke-rc.d
+ invoke-rc.d --quiet spamassassin status
+ invoke-rc.d spamassassin reload
+ [ -d /etc/spamassassin/sa-update-hooks.d ]
+ run-parts --lsbsysinit /etc/spamassassin/sa-update-hooks.d

The error in the middle is the one reported daily when the cron job runs. Could there be something wrong with my environment? If not, what could be wrong or need checking?

Bernard
FROM header obfuscation
Hi All,

Recently we're seeing more spam getting past our spam filters by obfuscating text in the FROM header. The problem mainly targets users of mail clients like iPhone Mail, which display only the display name from the FROM header and not the actual email address that was used, bypassing DKIM measures.

For example:

From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?=

This is base64-encoded "Рostnl.nl Рakket" and pretends to come from Postnl, a Dutch snail-mail company. However, the hexadecimal representation of this base64-decoded text differs from that of plain ASCII:

Obfuscated:
$ printf "Рostnl.nl Рakket" | od -A n -t x1
d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b 65 74

Plain ASCII:
$ printf "Postnl.nl Pakket" | od -A n -t x1
50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74

The two-byte Cyrillic "Р" (d0 a0) stands in for the one-byte Latin "P" (50), yet there is no way to tell the difference with the naked eye. You can obfuscate text using this online tool: https://obfuscator.uo1.net/

Is there any way to detect this type of obfuscation with a SpamAssassin rule?

Best regards,
Frido Otten
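The comparison above can also be reproduced in a script rather than a shell. Here is a minimal Python sketch using only the standard library: it decodes the encoded-word, prints the same bytes as od, and flags words that mix letters from different scripts. Treating the first word of a character's Unicode name as its "script" is a crude heuristic assumed for illustration; it is not a SpamAssassin feature, and the helper names are hypothetical.

```python
import unicodedata
from email.header import decode_header

RAW_FROM_NAME = "=?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?="


def decode_display_name(raw: str) -> str:
    """Turn RFC 2047 encoded-words back into a Python string."""
    pieces = []
    for part, charset in decode_header(raw):
        if isinstance(part, bytes):
            part = part.decode(charset or "ascii", errors="replace")
        pieces.append(part)
    return "".join(pieces)


def mixed_script_words(text: str) -> list:
    """Words whose letters come from more than one 'script' (rough heuristic)."""
    flagged = []
    for word in text.split():
        # First word of the Unicode name, e.g. LATIN, CYRILLIC, FULLWIDTH.
        scripts = {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            flagged.append(word)
    return flagged


name = decode_display_name(RAW_FROM_NAME)
print(name.encode("utf-8").hex(" "))  # same bytes as the od output above
print(unicodedata.name(name[0]))      # CYRILLIC CAPITAL LETTER ER
print(mixed_script_words(name))       # both words are flagged
```

A check like this will also flag legitimate names that mix scripts, so in practice it would need an exclusion list or a low score.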
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
> No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique.
>
> I'll tighten it up.

FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.

full __GOODEHTML1 m''i
full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary
meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)
describe LW_BADEHTML1 Bad ending - something after
score LW_BADEHTML1 1
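Loren's meta rule reads: flag the message when the closing-tag pattern appears (__GOODEHTML1) but is not followed, within 50 whitespace characters or quoted-printable "=0A" sequences, by end of text, "--" or "=" (__GOODEHTML2). The archived copy lost whatever sat between the quotes of m''i, so this minimal Python sketch takes that pattern as a parameter. The </html> value used in the example is an assumption for illustration only, not a recovered part of Loren's rule.

```python
import re

# Tail of the __GOODEHTML2 pattern as quoted: up to 50 whitespace characters or
# quoted-printable "=0A" (an encoded newline), followed by end of text, "--"
# (the MIME boundary, per the rule's comment) or "=".
GOOD_TAIL = r"(?:\s|=0A){0,50}(?:$|--|=)"


def lw_badehtml1(full_text: str, close_pattern: str) -> bool:
    """meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2), sketched in Python.

    close_pattern is a hypothetical stand-in for the pattern lost from the
    archived rule text; the caller must supply it.
    """
    good1 = re.search(close_pattern, full_text, re.I) is not None
    # re.I | re.S mirror the rule's "is" flags.
    good2 = re.search(close_pattern + GOOD_TAIL, full_text, re.I | re.S) is not None
    return good1 and not good2


ASSUMED_CLOSE = r"</html>"  # assumption for illustration only

print(lw_badehtml1("<html>ok</html>\n\n--b1--\n", ASSUMED_CLOSE))      # False
print(lw_badehtml1("<html>ok</html>\nbuy now cheap\n", ASSUMED_CLOSE))  # True
```

Like the rule itself, this scans the whole raw text (a "full" rule), so quoted-printable artifacts such as "=0A" are seen literally rather than decoded.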