Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread John Hardin

On Tue, 8 Feb 2022, Loren Wilton wrote:


>>> Are you talking about the use of m'' as the regex delimiter?
>>
>> Yes.
>>
>> It will probably work just fine for the foreseeable future, as long as the
>> input validation of rules files is lenient.
>
> I think you may have a very hard time removing the m matching
> delimiters from SA. I suspect there are at least hundreds of rules like that
> in the release database. I have about a hundred local rules of my own that
> use that.


Indeed.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Journalism is about covering important stories.
  With a pillow, until they stop moving.   -- David Burge
---
 74 more days working to pay your (average) annual US tax bill
 before you're finally working for yourself.


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Loren Wilton

>> Are you talking about the use of m'' as the regex delimiter?
>
> Yes.
>
> It will probably work just fine for the foreseeable future, as long as the
> input validation of rules files is lenient.


I think you may have a very hard time removing the m matching 
delimiters from SA. I suspect there are at least hundreds of rules like that 
in the release database. I have about a hundred local rules of my own that 
use that.


Any time I have more than one backslash in a pattern, I use an alternate 
delimiter (usually single quote) so that I don't have to escape all the 
backslashes in the rule body. I'm not a fan of obfuscated rule bodies where 
it is impossible to tell what it is intended to match. My experience is that 
any time you have to write  or \\ multiple times in a rule body, you 
are almost guaranteed to get the number of backslashes wrong, and the rule 
won't work. But of course it may work in some cases (like the one you used 
to test it) while not working in general.


I don't have time in my life to deal with that sort of thing. It caused me 
enough grief when I started writing rules 20 years ago, which is why I 
started using m'.


BTW, that particular rule dates from RulesEmporium days, which was what, 
2005 or so?


   Loren



Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Bill Cole
On 2022-02-08 at 13:14:06 UTC-0500 (Tue, 8 Feb 2022 13:14:06 -0500)
Kris Deugau 
is rumored to have said:
[...]
> Are you talking about the use of m'' as the regex delimiter?

Yes.

It will probably work just fine for the foreseeable future, as long as the 
input validation of rules files is lenient.

It isn't beyond the realm of possibility that someday we'll tighten up syntax 
checking. We've had security issues in the past which involved the hypothetical 
potential to sneak in malicious code via rules. I don't expect that we'll have 
another one bad enough to make a rewrite of the config parser justified, but it 
could happen, and I don't think we'd design it today as it was done 20 years 
ago.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Kris Deugau

Bill Cole wrote:

On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
Loren Wilton 
is rumored to have said:


>>> No, I added that after observing multiple spams with random garbage after the
>>> closing HTML tag in the HTML body part. Presumably it was an attempt at Bayes
>>> poison, checksum avoidance, or some other filter evasion technique.
>>>
>>> I'll tighten it up.
>>
>> FWIW, here is the rule I use. It obviously could be better, but I haven't
>> noticed that it misfires.
>>
>> full __GOODEHTML1 m''i
>>
>> full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary
>
> TANGENTIAL:
>
> I would advise against using such alternative regex syntax in rules. As you
> obviously figured out, you CAN (for now...) use any valid Perl syntax for
> writing a regex match, but I do not believe that we want to bless that as
> something which will never break.


Maybe it's just inexperience with deep regex voodoo, but I'm not seeing 
anything odd in those.


Are you talking about the use of m'' as the regex delimiter?

-kgd


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Bill Cole
On 2022-02-08 at 04:28:16 UTC-0500 (Tue, 8 Feb 2022 01:28:16 -0800)
Loren Wilton 
is rumored to have said:

>> No, I added that after observing multiple spams with random garbage after 
>> the closing HTML tag in the HTML body part. Presumably it was an attempt at 
>> Bayes poison, checksum avoidance, or some other filter evasion technique.
>>
>> I'll tighten it up.
>
> FWIW, here is the rule I use. It obviously could be better, but I haven't 
> noticed that it misfires.
>
> full __GOODEHTML1 m''i
>
> full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary

TANGENTIAL:

I would advise against using such alternative regex syntax in rules. As you 
obviously figured out, you CAN (for now...) use any valid Perl syntax for 
writing a regex match, but I do not believe that we want to bless that as 
something which will never break.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Greg Troxel

John Hardin  writes:

> On Mon, 7 Feb 2022, Greg Troxel wrote:
>
>> and then I got a reply back with the content he was trying to send etc.
>> But, it had:
>>
>>  *  2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong.   reading
>> 72_active.cf I found:
>>
>>  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>> which fires on a text/plain part that discusses html formatting!
>
> Ah, I'll see if I can add something to that so it only fires when
> there's an actual HTML body part. Thanks for the report.
>
> Pity there's not an "htmlbody" rule type...

Agreed - I think the way you are trying to tighten is correct.





Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-08 Thread Loren Wilton
> No, I added that after observing multiple spams with random garbage after
> the closing HTML tag in the HTML body part. Presumably it was an attempt
> at Bayes poison, checksum avoidance, or some other filter evasion
> technique.
>
> I'll tighten it up.


FWIW, here is the rule I use. It obviously could be better, but I haven't 
noticed that it misfires.


full __GOODEHTML1 m''i

full __GOODEHTML2 m'(?:\s|=0A){0,50}(?:$|--|=)'is # stop on mime ending boundary


meta LW_BADEHTML1 (__GOODEHTML1 && !__GOODEHTML2)

describe LW_BADEHTML1 Bad ending - something after 

score LW_BADEHTML1 1
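
A rough sketch of the logic of the three rules above, written in Python rather than SpamAssassin syntax. The archive shows the first pattern as empty, so what it originally matched is not known here. The sketch ASSUMES it matched the closing html tag and that the second pattern began with the same tag. Both are assumptions, as are the names CLOSE_TAG and bad_ending and the sample inputs.

```python
import re

# ASSUMPTION: the archived copy of the first pattern is empty; a bare
# closing tag is used here purely for illustration.
CLOSE_TAG = r"</html>"

good1 = re.compile(CLOSE_TAG, re.IGNORECASE)

# Second pattern as posted, prefixed with the ASSUMED closing tag:
# up to 50 whitespace or "=0A" tokens, then end of input, "--" or "=".
good2 = re.compile(
    CLOSE_TAG + r"(?:\s|=0A){0,50}(?:$|--|=)",
    re.IGNORECASE | re.DOTALL,
)


def bad_ending(raw: str) -> bool:
    """Mirror the meta: a closing tag was seen, but no clean ending follows it."""
    return bool(good1.search(raw)) and not good2.search(raw)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from real mail.
    print(bad_ending("<html>hi</html>\nxk39 junk"))
    print(bad_ending("<html>hi</html>\n--boundary--"))
```

Splitting the check into a "tag present" test and a "clean ending" test keeps each pattern simple, and the meta only fires when the first holds and the second does not.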





Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread John Hardin

On Mon, 7 Feb 2022, Loren Wilton wrote:


>> But, it had:
>>
>>  *  2.5 CONTENT_AFTER_HTML More content after HTML close tag
>>
>> but one was only text/plain and I could see nothing wrong.   reading
>> 72_active.cf I found:
>>
>>  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>>
>> which fires on a text/plain part that discusses html formatting!
>
> Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the
> same rule. I suspect the meta for CONTENT_AFTER_HTML contains some other
> things that should in theory make it not hit in this case.
>
> I've personally never seen this rule hit, and didn't know it existed. Are you
> sure it isn't a local rule? I have a rule of my own that gives 1 point for
> extra trash after the /html end tag. I see it frequently on spam and UCE that
> has a tracking tag in the HTML section after the official end of the html.


No, I added that after observing multiple spams with random garbage after 
the closing HTML tag in the HTML body part. Presumably it was an attempt 
at Bayes poison, checksum avoidance, or some other filter evasion 
technique.


I'll tighten it up.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  You do not examine legislation in the light of the benefits it
  will convey if properly administered, but in the light of the
  wrongs it would do and the harms it would cause if improperly
  administered.  -- Lyndon B. Johnson
---
 5 days until Abraham Lincoln's and Charles Darwin's 213th Birthdays


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread John Hardin

On Mon, 7 Feb 2022, Greg Troxel wrote:


> and then I got a reply back with the content he was trying to send etc.
> But, it had:
>
> *  2.5 CONTENT_AFTER_HTML More content after HTML close tag
>
> but one was only text/plain and I could see nothing wrong.   reading
> 72_active.cf I found:
>
>  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>
> which fires on a text/plain part that discusses html formatting!


Ah, I'll see if I can add something to that so it only fires when there's 
an actual HTML body part. Thanks for the report.


Pity there's not an "htmlbody" rule type...


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  USMC Rules of Gunfighting #2: Anything worth shooting
  is worth shooting twice. Ammo is cheap. Your life is expensive.
---
 5 days until Abraham Lincoln's and Charles Darwin's 213th Birthdays


Re: CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread Loren Wilton

> But, it had:
>
>  *  2.5 CONTENT_AFTER_HTML More content after HTML close tag
>
> but one was only text/plain and I could see nothing wrong.   reading
> 72_active.cf I found:
>
>  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i
>
> which fires on a text/plain part that discusses html formatting!


Note you show __CONTENT_AFTER_HTML and CONTENT_AFTER_HTML, which are not the 
same rule. I suspect the meta for CONTENT_AFTER_HTML  contains some other 
things that should in theory make it not hit in this case.


I've personally never seen this rule hit, and didn't know it existed. Are 
you sure it isn't a local rule? I have a rule of my own that gives 1 point 
for extra trash after the /html end tag. I see it frequently on spam and UCE 
that has a tracking tag in the HTML section after the official end of the 
html.


   Loren



CONTENT_AFTER_HTML: better not discuss formatting!!

2022-02-07 Thread Greg Troxel

(Instances of html have been changed to htnl in this message to
avoid tripping the rule I'm talking about.)

A legit message arrived at my server, for me and another user, and it
scored 8 for them and I think about 11 for me.  This is really unusual.
The big issues were:

  Sent by sendgrid: points from KAM and from URIBL_GREY both, each
  reasonable separately and I think URIBL_GREY newly lists sendgrid.

  From: was someone's (class teacher) gmail address, but it got sent out
  via sendgrid via a school, and there was no DKIM, so it lit up all
  sorts of FREEMAIL_FORGED, From:/env mismatch with freemail, ought to
  have DKIM from google and doesn't.

So I wrote to the person because they probably had no idea, and explained
the above and added some other "deliverability hygiene" :-) comments:

> with more minor issues:
>
>The message is html only, rather than also having text/plain.
>
>The message body doesn't have enclosing   tags, so it is
>malformed.

and then I got a reply back with the content he was trying to send etc.
But, it had:

*  2.5 CONTENT_AFTER_HTML More content after HTML close tag

but one was only text/plain and I could see nothing wrong.   reading
72_active.cf I found:

  rawbody __CONTENT_AFTER_HTML /<\/htnl>\s*[a-z0-9]/i

which fires on a text/plain part that discusses html formatting!

So I'll be reducing that score...
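
The false positive described above can be reproduced outside SpamAssassin. The following is a minimal sketch in Python, not the SpamAssassin implementation: it applies the pattern quoted from 72_active.cf (with the same htnl spelling used in this message) to made-up inputs. The sample strings and the helper name fires are assumptions for illustration only.

```python
import re

# Pattern as quoted in the message, including the deliberate "htnl" spelling.
CONTENT_AFTER = re.compile(r"<\/htnl>\s*[a-z0-9]", re.IGNORECASE)

# Hypothetical text/plain sentence that only talks about markup.
plain_text = "The body has no enclosing <htnl> </htnl> tags, so it is malformed."

# Hypothetical HTML part with junk after the closing tag.
html_part = "<htnl><body>Hi</body></htnl>\nxk39 tracking junk"


def fires(part: str) -> bool:
    """Return True when the pattern matches anywhere in the given part."""
    return CONTENT_AFTER.search(part) is not None


if __name__ == "__main__":
    print(fires(plain_text))  # the prose mention is enough to match
    print(fires(html_part))   # the case the rule was written for
```

The pattern has no way to tell which MIME part it is looking at, so any closing tag followed by a letter or digit matches, whether it appears in markup or in prose about markup.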

