Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On Tue, 8 Feb 2022, Loren Wilton wrote: Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. I think you may have a very hard time removing the m matching delimiters from SA. I suspect there are at least hundreds of rules like that in the release database. I have about a hundred local rules of my own that use that. Indeed. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Journalism is about covering important stories. With a pillow, until they stop moving. -- David Burge --- 74 more days working to pay your (average) annual US tax bill before you're finally working for yourself.
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. I think you may have a very hard time removing the m matching delimiters from SA. I suspect there are at least hundreds of rules like that in the release database. I have about a hundred local rules of my own that use that. Any time I have more than one backslash in a pattern, I use an alternate delimiter (usually single quote) so that I don't have to escape all the backslashes in the rule body. I'm not a fan of obfuscated rule bodies where it is impossible to tell what it is intended to match. My experience is that any time you have to write or \\ multiple times in a rule body, you are almost guaranteed to get the number of backslashes wrong, and the rule won't work. But of course it may work in some cases (like the one you used to test it) while not working in general. I don't have time in my life to deal with that sort of thing. It caused me enough grief when I started writing rules 20 years ago, which is why I started using m'. BTW, that particular rule dates from RulesEmporium days, which was what, 2005 or so? Loren
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On 2022-02-08 at 13:14:06 UTC-0500 (Tue, 8 Feb 2022 13:14:06 -0500) Kris Deugau is rumored to have said: [...] > Are you talking about the use of m'' as the regex delimiter? Yes. It will probably work just fine for the foreseeable future, as long as the input validation of rules files is lenient. It isn't beyond the realm of possibility that someday we'll tighten up syntax checking. We've had security issues in the past which involved the hypothetical potential to sneak in malicious code via rules. I don't expect that we'll have another one bad enough to make a rewrite of the config parser justified, but it could happen, and I don't think we'd design it today as it was done 20 years ago. -- Bill Cole b...@scconsult.com or billc...@apache.org (AKA @grumpybozo and many *@billmail.scconsult.com addresses) Not Currently Available For Hire
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
Bill Cole wrote: On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800) Loren Wilton is rumored to have said: No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique. I'll tighten it up. FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.

full __GOODEHTML1 m''i
full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

TANGENTIAL: I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

Maybe it's just inexperience with deep regex voodoo, but I'm not seeing anything odd in those. Are you talking about the use of m'' as the regex delimiter? -kgd
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800) Loren Wilton is rumored to have said:
>> No, I added that after observing multiple spams with random garbage after
>> the closing HTML tag in the HTML body part. Presumably it was an attempt at
>> Bayes poison, checksum avoidance, or some other filter evasion technique.
>>
>> I'll tighten it up.
>
> FWIW, here is the rule I use. It obviously could be better, but I haven't
> noticed that it misfires.
>
> full __GOODEHTML1 m''i
>
> full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

TANGENTIAL: I would advise against using such alternative regex syntax in rules. As you obviously figured out, you CAN (for now...) use any valid Perl syntax for writing a regex match, but I do not believe that we want to bless that as something which will never break.

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
John Hardin writes:
> On Mon, 7 Feb 2022, Greg Troxel wrote:
>
>> and then I got a reply back with the content he was trying to send etc.
>> But, it had:
>>
>> * 2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong. reading
>> 72_active.cf I found:
>>
>> rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>> which fires on a text/plain part that discusses html formatting!
>
> Ah, I'll see if I can add something to that so it only fires when
> there's an actual HTML body part. Thanks for the report.
>
> Pity there's not an "htmlbody" rule type...

Agreed - I think the way you are trying to tighten is correct.
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique. I'll tighten it up. FWIW, here is the rule I use. It obviously could be better, but I haven't noticed that it misfires.

full __GOODEHTML1 m''i
full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary
meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)
describe LW_BADEHTML1 Bad ending - something after
score LW_BADEHTML1 1
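For readers who want to experiment with the logic of that meta rule outside SpamAssassin, here is a rough Python sketch. It rests on a loud assumption: the archive appears to have stripped the tag text out of both patterns, so the `</html>` literal below is a guess based on the rule's description, not the author's actual rule body. The names `END_TAG`, `GOOD_END_1`, `GOOD_END_2` and `bad_ending` are made up for the illustration.

```python
import re

# ASSUMPTION: the closing tag is not visible in the archived rules;
# "</html>" is a guess taken from the surrounding discussion.
END_TAG = r"</html>"

# Counterpart of __GOODEHTML1: a closing tag exists somewhere.
GOOD_END_1 = re.compile(END_TAG, re.IGNORECASE)

# Counterpart of __GOODEHTML2: after the closing tag come at most 50
# whitespace characters or "=0A" sequences, then end of data, a MIME
# boundary ("--"), or "=" (quoted-printable).
GOOD_END_2 = re.compile(
    END_TAG + r"(?:\s|=0A){0,50}(?:$|--|=)",
    re.IGNORECASE | re.DOTALL,
)

def bad_ending(raw: str) -> bool:
    """Mimic the meta: closing tag present, but no clean ending after it."""
    return bool(GOOD_END_1.search(raw)) and not GOOD_END_2.search(raw)
```

With these definitions, a body that ends right after the closing tag (or runs into a MIME boundary) is not flagged, while trailing text after the tag is.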
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On Mon, 7 Feb 2022, Loren Wilton wrote: But, it had: * 2.5 CONTENT_AFTER_HTML More content after HTML close tag but one was only text/plain and I could see nothing wrong. reading 72_active.cf I found: rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i which fires on a text/plain part that discusses html formatting! Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the same rule. I suspect the meta for CONTENT_AFTER_HTML contains some other things that should in theory make it not hit in this case. I've personally never seen this rule hit, and didn't know it existed. Are you sure it isn't a local rule? I have a rule of my own that gives 1 point for extra trash after the /html end tag. I see it frequently on spam and UCE that has a tracking tag in the HTML section after the official end of the html. No, I added that after observing multiple spams with random garbage after the closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes poison, checksum avoidance, or some other filter evasion technique. I'll tighten it up. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You do not examine legislation in the light of the benefits it will convey if properly administered, but in the light of the wrongs it would do and the harms it would cause if improperly administered. -- Lyndon B. Johnson --- 5 days until Abraham Lincoln's and Charles Darwin's 213th Birthdays
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
On Mon, 7 Feb 2022, Greg Troxel wrote: and then I got a reply back with the content he was trying to send etc. But, it had: * 2.5 CONTENT_AFTER_HTML More content after HTML close tag but one was only text/plain and I could see nothing wrong. reading 72_active.cf I found: rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i which fires on a text/plain part that discusses html formatting! Ah, I'll see if I can add something to that so it only fires when there's an actual HTML body part. Thanks for the report. Pity there's not an "htmlbody" rule type... -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- USMC Rules of Gunfighting #2: Anything worth shooting is worth shooting twice. Ammo is cheap. Your life is expensive. --- 5 days until Abraham Lincoln's and Charles Darwin's 213th Birthdays
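To illustrate what "only fires when there's an actual HTML body part" could mean, here is a small Python sketch using the standard `email` package. It is not how SpamAssassin evaluates rules and not the fix that was eventually committed; it only shows the idea of checking each MIME part's content type before applying the pattern. The function name `content_after_html` is hypothetical, and the pattern keeps the `htnl` spelling used in the quoted rule.

```python
import re
from email import message_from_string

# Pattern from the quoted rule, keeping the thread's "htnl" spelling.
CLOSE_TAG_THEN_TEXT = re.compile(r"</htnl>\s*[a-z0-9]", re.IGNORECASE)

def content_after_html(raw_message: str) -> bool:
    """Return True only if a text/html part has text after its close tag."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # Skip everything that is not declared as HTML, e.g. text/plain
        # parts that merely talk about markup.
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        if CLOSE_TAG_THEN_TEXT.search(text):
            return True
    return False
```

A text/plain message that mentions the closing tag is ignored by this check, while a text/html part with trailing text after the tag is still caught.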
Re: CONTENT_AFTER_HTML: better not discuss formatting!!
But, it had: * 2.5 CONTENT_AFTER_HTML More content after HTML close tag but one was only text/plain and I could see nothing wrong. reading 72_active.cf I found: rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i which fires on a text/plain part that discusses html formatting! Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the same rule. I suspect the meta for CONTENT_AFTER_HTML contains some other things that should in theory make it not hit in this case. I've personally never seen this rule hit, and didn't know it existed. Are you sure it isn't a local rule? I have a rule of my own that gives 1 point for extra trash after the /html end tag. I see it frequently on spam and UCE that has a tracking tag in the HTML section after the official end of the html. Loren
CONTENT_AFTER_HTML: better not discuss formatting!!
(Instances of html have been changed to htnl in this message to avoid tripping the rule I'm talking about.) A legit message arrived at my server, for me and another user, and it scored 8 for them and I think about 11 for me. This is really unusual. The big issues were: Sent by sendgrid: points from KAM and from URIBL_GREY both, each reasonable separately and I think URIBL_GREY newly lists sendgrid. From: was someone's (class teacher) gmail address, but it got sent out via sendgrid via a school, and there was no DKIM, so it lit up all sorts of FREEMAIL_FORGED, From:/env mismatch with freemail, ought to have DKIM from google and doesn't. So I wrote to the person because they probably had no idea, and explained the above and added some other "deliverability hygiene" :-) comments: > with more minor issues: > >The message is html only, rather than also having text/plain. > >The message body doesn't have enclosing tags, so it is >malformed. and then I got a reply back with the content he was trying to send etc. But, it had: * 2.5 CONTENT_AFTER_HTML More content after HTML close tag but one was only text/plain and I could see nothing wrong. reading 72_active.cf I found: rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i which fires on a text/plain part that discusses html formatting! So I'll be reducing that score...
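The false positive is easy to reproduce outside SpamAssassin. A minimal Python sketch, assuming the pattern behaves the same under Python's `re` module (it uses no Perl-specific constructs) and keeping the `htnl` spelling used in this message. The sample strings and the helper `fires` are made up for illustration.

```python
import re

# Pattern copied from the quoted rule (with the "htnl" spelling this
# message uses); the /i flag becomes re.IGNORECASE.
CONTENT_AFTER = re.compile(r"<\/htnl>\s*[a-z0-9]", re.IGNORECASE)

# Hypothetical text/plain reply that merely talks about markup.
plain_reply = "The body should end with </htnl> and nothing else.\n"

# Hypothetical well-formed ending: only a newline after the close tag.
clean_ending = "<htnl><body>hi</body></htnl>\n"

def fires(text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return CONTENT_AFTER.search(text) is not None
```

Here `fires(plain_reply)` is true because the close tag is followed by a space and a letter, even though the text is ordinary prose, while `fires(clean_ending)` is false.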