Re: A few noob questions

2020-12-20 Thread Bill Cole

On 20 Dec 2020, at 0:38, Alan wrote:

Thanks Bill. I know very little about Perl, so while I saw the 
reference to Mail::SpamAssassin::Conf without the "perldoc" in front 
of it, I had no clue what to do with that information.


Sorry about that. The same info is at 
https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html




On 2020-12-20 00:18, Bill Cole wrote:

On 19 Dec 2020, at 23:39, Alan wrote:

Please forgive me if these are easy/common questions. I have done 
some searching and haven't found any clear answers.


I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some 
indications that it's 0.1, others seem to say it is 0.01. Can I go 
to 0.001? Lower?


Any number that Perl understands will work but very small scores are 
pointless.  So if you really want to score a rule at 12.34e-56 you 
can.


The reason for asking is that I want to use SpamAssassin to flag 
some things that are suspicious but only when other conditions are 
met for specific users. I'd like to have SA insert the rule text, 
eg. LOCAL_SOME_RULE so that I can have an exim filter check for a 
specific form of to address plus this rule match before removing the 
message. But at the same time I don't want messages that match this 
rule to generate false positives for other users.


Generally 0.01 or -0.01 is adequately small for such purposes.

2. I would like to match against some suspicious URLs that contain 
long sequences of random characters, but only have the rule match if 
I find multiple URLs that follow the same pattern. Normally I would 
use /(some-regex){5}/ but it seems that the rawbody command only 
looks at smaller chunks of the message (in this case the spammer is 
sending messages that are in the 11KB range and I have adjusted exim 
to pass enough in $message_body to capture enough URLs to fire a 
rule).


Is it possible to configure SA to look at bigger chunks? 8 KB or 
even 16 KB would work. If not, is there a way to write a rule that 
counts the total number of matches of a regex against the raw body?


A rule can be allowed to match multiple times, as described in the 
documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example 
provided there:


  uri      __KAM_COUNT_URIS /^./
  tflags   __KAM_COUNT_URIS multiple maxhits=16
  describe __KAM_COUNT_URIS A multiple match used to count URIs in a message

  meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
  meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
  meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
  meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
  meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
  meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
  meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
  meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)








--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: A few noob questions

2020-12-20 Thread Alan

On 2020-12-20 21:11, John Hardin wrote:

On Sun, 20 Dec 2020, Alan wrote:

n.b.: you're not subscribed to the list from 
netbeans.5zc...@ambitonline.com but I pushed it through moderation. If 
you're going to post regularly from that address you should register 
it as an alternate.


Oh nuts. I always set up a forwarder per list with random suffix, just 
so that if it ever leaks out I can change the suffix and beat the 
harvesters. I picked the wrong identity to send from. Guess my Netbeans 
address now needs an update. Self-inflicted wounds. :(


I do a lot of rule dev so I have a dedicated test environment. I can't 
say whether --cf would work, I've never tried it. Seems plausible.


You'll also want "--debug area=all,rules,rules-all,message,uri" to see 
the hits in the log output.



Perfect. Thanks!


Re: A few noob questions

2020-12-20 Thread John Hardin

On Sun, 20 Dec 2020, Alan wrote:

n.b.: you're not subscribed to the list from 
netbeans.5zc...@ambitonline.com but I pushed it through moderation. If 
you're going to post regularly from that address you should register it as 
an alternate.



From the mailing list help:


You can start a subscription for an alternate address,
for example "john@host.domain", just add a hyphen and your
address (with '=' instead of '@') after the command word:




Many thanks for your help.

On 2020-12-20 15:26, John Hardin wrote:

On Sat, 19 Dec 2020, Alan wrote:

The reason for asking is that I want to use SpamAssassin to flag some 
things that are suspicious but only when other conditions are met for 
specific users. I'd like to have SA insert the rule text, eg. 
LOCAL_SOME_RULE so that I can have an exim filter check for a specific 
form of to address plus this rule match before removing the message.


You should be able to do that purely in SA; it's a tad more difficult if 
you want to match the envelope to address rather than the To: header. If 
you want to reliably match the envelope to address you'd need to have it 
recorded in a Received header (either the one that your MTA generates or 
the one that some trusted MTA prior to your MTA generates).


Agreed, ideally this is something I can stick into a KB article and have 
afflicted users implement on their own. I'd like to keep system-wide 
modifications to a minimum. A user's exim filters also move when we transfer 
an account to another server, so as long as there's a common rule set, not 
having to adjust SA configuration is a benefit.


Ah, ok. That makes sense.


Basically what I have now is this:

uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5


No, it doesn't need to be that complex. This is all you need:

meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 > 4 && __LCL_SUSPECT_LINK2 > 4

Treat the rule names as variables having their value = # hits. Mostly 
you're doing logical comparisons (R1 && R2 && !R3) but math is totally 
acceptable as well, e.g. (R1 + R2 + R3 > 1) for an "any two out of three" 
meta rule.


...so, if you want to count multiple hits across several rules, perhaps:

meta LCL_MANY_SUSPECT_LINKS (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) > 4

Also note that with "maxhits=5" the number of times the rule will hit will 
be at most 5, so "> 5" will never match.
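
Putting the corrections above together, the counting rules might look like the sketch below. The two patterns are the same placeholders used in the quoted attempt, and the 0.001 score comes from that attempt as well; the combined threshold of five hits is an assumption to tune against real mail.

```
# Subrules (leading double underscore) are unscored and never shown in the hits output.
# "multiple" lets each rule count every matching URI, up to maxhits.
uri      __LCL_SUSPECT_LINK1  /target_pattern_1/i
tflags   __LCL_SUSPECT_LINK1  multiple maxhits=5
uri      __LCL_SUSPECT_LINK2  /target_pattern_2/i
tflags   __LCL_SUSPECT_LINK2  multiple maxhits=5

# Require at least one hit of each pattern and five or more hits in total.
meta     LCL_MANY_SUSPECT_LINKS  __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2 > 4)
score    LCL_MANY_SUSPECT_LINKS  0.001
describe LCL_MANY_SUSPECT_LINKS  Five or more links match a suspected spam pattern
```

With maxhits=5 on each subrule the sum tops out at 10, so any total threshold up to 10 stays reachable.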


One more noob question. Can I test a rule without messing with the production 
environment by using


spamassassin -t --cf='include myrule.cf' path

or should I build a test environment?


I do a lot of rule dev so I have a dedicated test environment. I can't say 
whether --cf would work, I've never tried it. Seems plausible.


You'll also want "--debug area=all,rules,rules-all,message,uri" to see 
the hits in the log output.



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
  does quite what I want. I wish Christopher Robin was here."
   -- Peter da Silva in a.s.r
---
 5 days until Christmas


Re: A few noob questions

2020-12-20 Thread Alan

Many thanks for your help.

On 2020-12-20 15:26, John Hardin wrote:

On Sat, 19 Dec 2020, Alan wrote:

The reason for asking is that I want to use SpamAssassin to flag some 
things that are suspicious but only when other conditions are met for 
specific users. I'd like to have SA insert the rule text, eg. 
LOCAL_SOME_RULE so that I can have an exim filter check for a 
specific form of to address plus this rule match before removing the 
message.


You should be able to do that purely in SA; it's a tad more difficult 
if you want to match the envelope to address rather than the To: 
header. If you want to reliably match the envelope to address you'd 
need to have it recorded in a Received header (either the one that 
your MTA generates or the one that some trusted MTA prior to your MTA 
generates).


Agreed, ideally this is something I can stick into a KB article and have 
afflicted users implement on their own. I'd like to keep system-wide 
modifications to a minimum. A user's exim filters also move when we 
transfer an account to another server, so as long as there's a common 
rule set, not having to adjust SA configuration is a benefit.


Basically what I have now is this:

uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5

score LCL_MANY_SUSPECT_LINKS 0.001
describe LCL_MANY_SUSPECT_LINKS More than 5 links match a suspected spam pattern

As for long sequences of random characters - that's FP-prone. It's 
difficult to detect *random* in a simple RE. A long string of 
characters from a given set, easy. Characteristics about that string? 
complicated. A rule like that might potentially hit on legitimate (for 
values of "legitimate") tracking analysis URIs or caching URIs, unless 
there is some kind of uncommon pattern to it that you can discern and 
look for in the RE.


No kidding. I've seen this specific pattern in many a spam message over 
the years so I suspect it's particularly FP vulnerable. If there was a 
regex rule for "matches English word" I could nail them with ease. OTOH 
my regex skills are pretty decent. Finding the two common patterns and 
checking that at least one of each is there will hopefully eliminate 
messages that consistently only use one form, eliminating a range of FPs.


If I can use the "many suspect links" match along with a few other 
indicators, including that this particular [expletive] makes the message 
look like it comes from a mailing list, I think I can kill their spew. 
I'm seeing upwards of 20 messages per day per user from this source, but 
they're rotating through junk data center IP addresses and disposable 
mail server identities daily. This is war.


One more noob question. Can I test a rule without messing with the 
production environment by using


spamassassin -t --cf='include myrule.cf' path

or should I build a test environment?



Re: A few noob questions

2020-12-20 Thread John Hardin

On Sat, 19 Dec 2020, Alan wrote:

1. What is the smallest increment for a rule score? I see some indications 
that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?


As Bill said, anything works. Zero does disable the rule; a score of 0.001 
is generally termed "informative" - you want to include it in the hits 
output so that you know that the rule hits, but you don't want it (by 
itself) to affect the score. See, for example, LOTSA_MONEY.
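
As a minimal sketch of an informative rule, assuming a hypothetical rule name and a placeholder pattern (neither comes from this thread):

```
# Hypothetical rule: LCL_INFO_EXAMPLE and its pattern are placeholders.
body     LCL_INFO_EXAMPLE  /placeholder phrase/i
describe LCL_INFO_EXAMPLE  Informational only, listed in the hits without moving the score
# 0.001 keeps the rule enabled and visible; a score of 0 would disable it.
score    LCL_INFO_EXAMPLE  0.001
```

The rule name should then show up in the list of tests that hit, which is what an external filter would look for.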


The reason for asking is that I want to use SpamAssassin to flag some things 
that are suspicious but only when other conditions are met for specific 
users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that 
I can have an exim filter check for a specific form of to address plus this 
rule match before removing the message.


You should be able to do that purely in SA; it's a tad more difficult if 
you want to match the envelope to address rather than the To: header. If 
you want to reliably match the envelope to address you'd need to have it 
recorded in a Received header (either the one that your MTA generates or 
the one that some trusted MTA prior to your MTA generates).


You'd make LOCAL_SOME_RULE an unscored subrule by prepending two 
underscores: __LCL_SOME_RULE, and then you'd develop some subrule(s) to 
hit on the specific form of to address(es) you're interested in. Then 
these can be combined in a scored meta rule:


  meta  LCL_POISON_01  __LCL_SOME_RULE && (__LCL_SUSP_TO_01 || __LCL_SUSP_TO_02)
  score LCL_POISON_01  10.000

But at the same time I don't want messages that match this rule to generate 
false positives for other users.


If you've done the __LCL_SUSP_TO_* rule(s) properly that shouldn't happen. 
You can set the score to informative while testing it.
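
A rough sketch of how the pieces could fit together while testing. The address patterns and the URI pattern are hypothetical placeholders, `ToCc` is SpamAssassin's pseudo-header covering both To: and Cc:, and the Received variant assumes your MTA records the envelope recipient as `for <address>`, which is not guaranteed (for instance on multi-recipient mail):

```
# Content subrule (placeholder pattern).
uri      __LCL_SOME_RULE   /target_pattern/i

# Match on the visible recipient headers (hypothetical address form).
header   __LCL_SUSP_TO_01  ToCc =~ /\buser-[a-z0-9]+\@example\.com\b/i

# Match on the envelope recipient as recorded in a Received header.
# Note this pattern checks every Received header, not only trusted ones.
header   __LCL_SUSP_TO_02  Received =~ /\bfor <user-[a-z0-9]+\@example\.com>/i

meta     LCL_POISON_01  __LCL_SOME_RULE && (__LCL_SUSP_TO_01 || __LCL_SUSP_TO_02)
describe LCL_POISON_01  Suspect content sent to a targeted address form
# Informative score while testing; raise it once the To rules are proven.
score    LCL_POISON_01  0.001
```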


2. I would like to match against some suspicious URLs that contain long 
sequences of random characters, but only have the rule match if I find 
multiple URLs that follow the same pattern.


Bill answered that adequately.

One comment on his answer:

  describe __KAM_COUNT_URIS

Subrules never appear in the hits output so a description on them is only 
for internal documentation purposes; a regular #comment would work just as 
well for that.


As for long sequences of random characters - that's FP-prone. It's 
difficult to detect *random* in a simple RE. A long string of characters 
from a given set, easy. Characteristics about that string? complicated. A 
rule like that might potentially hit on legitimate (for values of 
"legitimate") tracking analysis URIs or caching URIs, unless there is some 
kind of uncommon pattern to it that you can discern and look for in the 
RE.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  "Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
  does quite what I want. I wish Christopher Robin was here."
   -- Peter da Silva in a.s.r
---
 5 days until Christmas


Re: A few noob questions

2020-12-19 Thread Alan
Thanks Bill. I know very little about Perl, so while I saw the reference 
to Mail::SpamAssassin::Conf without the "perldoc" in front of it, I had 
no clue what to do with that information.


On 2020-12-20 00:18, Bill Cole wrote:

On 19 Dec 2020, at 23:39, Alan wrote:

Please forgive me if these are easy/common questions. I have done 
some searching and haven't found any clear answers.


I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some 
indications that it's 0.1, others seem to say it is 0.01. Can I go to 
0.001? Lower?


Any number that Perl understands will work but very small scores are 
pointless.  So if you really want to score a rule at 12.34e-56 you can.


The reason for asking is that I want to use SpamAssassin to flag some 
things that are suspicious but only when other conditions are met for 
specific users. I'd like to have SA insert the rule text, eg. 
LOCAL_SOME_RULE so that I can have an exim filter check for a 
specific form of to address plus this rule match before removing the 
message. But at the same time I don't want messages that match this 
rule to generate false positives for other users.


Generally 0.01 or -0.01 is adequately small for such purposes.

2. I would like to match against some suspicious URLs that contain 
long sequences of random characters, but only have the rule match if 
I find multiple URLs that follow the same pattern. Normally I would 
use /(some-regex){5}/ but it seems that the rawbody command only 
looks at smaller chunks of the message (in this case the spammer is 
sending messages that are in the 11KB range and I have adjusted exim 
to pass enough in $message_body to capture enough URLs to fire a rule).


Is it possible to configure SA to look at bigger chunks? 8 KB or even 
16 KB would work. If not, is there a way to write a rule that counts 
the total number of matches of a regex against the raw body?


A rule can be allowed to match multiple times, as described in the 
documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example 
provided there:


  uri  __KAM_COUNT_URIS /^./
  tflags   __KAM_COUNT_URIS multiple maxhits=16
  describe __KAM_COUNT_URIS A multiple match used to count URIs in a message


  meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
  meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
  meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
  meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
  meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
  meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
  meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
  meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)






Re: A few noob questions

2020-12-19 Thread Bill Cole

On 19 Dec 2020, at 23:39, Alan wrote:

Please forgive me if these are easy/common questions. I have done some 
searching and haven't found any clear answers.


I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some 
indications that it's 0.1, others seem to say it is 0.01. Can I go to 
0.001? Lower?


Any number that Perl understands will work but very small scores are 
pointless.  So if you really want to score a rule at 12.34e-56 you can.


The reason for asking is that I want to use SpamAssassin to flag some 
things that are suspicious but only when other conditions are met for 
specific users. I'd like to have SA insert the rule text, eg. 
LOCAL_SOME_RULE so that I can have an exim filter check for a specific 
form of to address plus this rule match before removing the message. 
But at the same time I don't want messages that match this rule 
to generate false positives for other users.


Generally 0.01 or -0.01 is adequately small for such purposes.

2. I would like to match against some suspicious URLs that contain 
long sequences of random characters, but only have the rule match if I 
find multiple URLs that follow the same pattern. Normally I would use 
/(some-regex){5}/ but it seems that the rawbody command only looks at 
smaller chunks of the message (in this case the spammer is sending 
messages that are in the 11KB range and I have adjusted exim to pass 
enough in $message_body to capture enough URLs to fire a rule).


Is it possible to configure SA to look at bigger chunks? 8 KB or even 
16 KB would work. If not, is there a way to write a rule that counts 
the total number of matches of a regex against the raw body?


A rule can be allowed to match multiple times, as described in the 
documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example 
provided there:


  uri  __KAM_COUNT_URIS /^./
  tflags   __KAM_COUNT_URIS multiple maxhits=16
  describe __KAM_COUNT_URIS A multiple match used to count URIs in a message


  meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
  meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
  meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
  meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
  meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
  meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
  meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
  meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)
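
The same `multiple` flag is documented for rawbody rules too, which speaks to the question about counting matches against the raw body. A minimal sketch, with a placeholder pattern and hypothetical rule names:

```
# Count each match of the pattern in the raw body, stopping at 8.
rawbody  __LCL_RAW_PATTERN    /some-regex/
tflags   __LCL_RAW_PATTERN    multiple maxhits=8

# True once the pattern has been seen five or more times.
meta     __LCL_RAW_PATTERN_5  (__LCL_RAW_PATTERN >= 5)
```

Because each hit is counted separately, the five matches should no longer have to fall inside a single chunk the way /(some-regex){5}/ requires.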




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


A few noob questions

2020-12-19 Thread Alan
Please forgive me if these are easy/common questions. I have done some 
searching and haven't found any clear answers.


I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some 
indications that it's 0.1, others seem to say it is 0.01. Can I go to 
0.001? Lower?


The reason for asking is that I want to use SpamAssassin to flag some 
things that are suspicious but only when other conditions are met for 
specific users. I'd like to have SA insert the rule text, eg. 
LOCAL_SOME_RULE so that I can have an exim filter check for a specific 
form of to address plus this rule match before removing the message. But 
at the same time I don't want messages that match this rule to generate 
false positives for other users.


2. I would like to match against some suspicious URLs that contain long 
sequences of random characters, but only have the rule match if I find 
multiple URLs that follow the same pattern. Normally I would use 
/(some-regex){5}/ but it seems that the rawbody command only looks at 
smaller chunks of the message (in this case the spammer is sending 
messages that are in the 11KB range and I have adjusted exim to pass 
enough in $message_body to capture enough URLs to fire a rule).


Is it possible to configure SA to look at bigger chunks? 8 KB or even 16 
KB would work. If not, is there a way to write a rule that counts the 
total number of matches of a regex against the raw body?