Re: A few noob questions
On 20 Dec 2020, at 0:38, Alan wrote:

> Thanks Bill. I know very little about Perl, so while I saw the reference to Mail::SpamAssassin::Conf without the "perldoc" in front of it, I had no clue what to do with that information.

Sorry about that. The same info is at https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html

> On 2020-12-20 00:18, Bill Cole wrote:
>> On 19 Dec 2020, at 23:39, Alan wrote:
>>> Please forgive me if these are easy/common questions. I have done some searching and haven't found any clear answers. I'm running SpamAssassin 3.4.4 in a cPanel environment.
>>>
>>> 1. What is the smallest increment for a rule score? I see some indications that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?
>>
>> Any number that Perl understands will work but very small scores are pointless. So if you really want to score a rule at 12.34e-56 you can.
>>
>>> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message. But at the same time I don't want messages that match this rule generate false positives for other users.
>>
>> Generally 0.01 or -0.01 is adequately small for such purposes.
>>
>>> 2. I would like to match against some suspicious URLs that contain long sequences of random characters, but only have the rule match if I find multiple URLs that follow the same pattern. Normally I would use /(some-regex){5}/ but it seems that the rawbody command only looks at smaller chunks of the message (in this case the spammer is sending messages that are in the 11KB range and I have adjusted exim to pass enough in $message_body to capture enough URLs to fire a rule). Is it possible to configure SA to look at bigger chunks? 8 KB or even 16 KB would work. If not, is there a way to write a rule that counts the total number of matches of a regex against the raw body?
>>
>> A rule can be allowed to match multiple times, as described in the documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example provided there:
>>
>>     uri __KAM_COUNT_URIS /^./
>>     tflags __KAM_COUNT_URIS multiple maxhits=16
>>     describe __KAM_COUNT_URIS A multiple match used to count URIs in a message
>>     meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
>>     meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
>>     meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
>>     meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
>>     meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
>>     meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
>>     meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
>>     meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: A few noob questions
On 2020-12-20 21:11, John Hardin wrote:

> On Sun, 20 Dec 2020, Alan wrote:
>
> n.b.: you're not subscribed to the list from netbeans.5zc...@ambitonline.com but I pushed it through moderation. If you're going to post regularly from that address you should register it as an alternate.

Oh nuts. I always set up a forwarder per list with random suffix, just so that if it ever leaks out I can change the suffix and beat the harvesters. I picked the wrong identity to send from. Guess my Netbeans address now needs an update. Self-inflicted wounds. :(

> I do a lot of rule dev so I have a dedicated test environment. I can't say whether --cf would work, I've never tried it. Seems plausible. You'll also want "--debug area=all,rules,rules-all,message,uri" to see the hits in the log output.

Perfect. Thanks!
Re: A few noob questions
On Sun, 20 Dec 2020, Alan wrote:

n.b.: you're not subscribed to the list from netbeans.5zc...@ambitonline.com but I pushed it through moderation. If you're going to post regularly from that address you should register it as an alternate.

From the mailing list help: You can start a subscription for an alternate address, for example "john@host.domain", just add a hyphen and your address (with '=' instead of '@') after the command word:

> Many thanks for your help.
>
> On 2020-12-20 15:26, John Hardin wrote:
>> On Sat, 19 Dec 2020, Alan wrote:
>>> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message.
>>
>> You should be able to do that purely in SA; it's a tad more difficult if you want to match the envelope to address rather than the To: header. If you want to reliably match the envelope to address you'd need to have it recorded in a Received header (either the one that your MTA generates or the one that some trusted MTA prior to your MTA generates).
>
> Agreed, ideally this is something I can stick into a KB article and have afflicted users implement on their own. I'd like to keep system-wide modifications to a minimum. A user's exim filters also move when we transfer an account to another server, so as long as there's a common rule set, not having to adjust SA configuration is a benefit.

Ah, ok. That makes sense.

> Basically what I have now is this:
>
>     uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
>     tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
>     uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
>     tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
>     meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5

No, it doesn't need to be that complex. This is all you need:

    meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 > 4 && __LCL_SUSPECT_LINK2 > 4

Treat the rule names as variables having their value = # hits. Mostly you're doing logical comparisons (R1 && R2 && !R3) but math is totally acceptable as well, e.g. (R1 + R2 + R3 > 1) for an "any two out of three" meta rule.

...so, if you want to count multiple hits across several rules, perhaps:

    meta LCL_MANY_SUSPECT_LINKS (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) > 4

Also note that with "maxhits=5" the number of times the rule will hit will be at most 5, so "> 5" will never match.

> One more noob question. Can I test a rule without messing with the production environment by using spamassassin -t -cf='include myrule.cf' path or should I build a test environment?

I do a lot of rule dev so I have a dedicated test environment. I can't say whether --cf would work, I've never tried it. Seems plausible. You'll also want "--debug area=all,rules,rules-all,message,uri" to see the hits in the log output.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
---
5 days until Christmas
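Putting the advice above together, here is a minimal sketch of one way the full rule set might look. It assumes the goal is "at least one link of each pattern, and more than five suspect links in total"; that reading of the intent is an assumption, and the two regexes are the placeholders used in the thread, not real signatures.

```
# Unscored counting subrules; each value is the number of hits (capped at 5).
uri      __LCL_SUSPECT_LINK1     /target_pattern_1/i
tflags   __LCL_SUSPECT_LINK1     multiple maxhits=5
uri      __LCL_SUSPECT_LINK2     /target_pattern_2/i
tflags   __LCL_SUSPECT_LINK2     multiple maxhits=5

# Both patterns present, and the combined hit count exceeds 5.
# The sum can reach 10, so "> 5" is reachable even with maxhits=5 per rule.
meta     LCL_MANY_SUSPECT_LINKS  __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2 > 5)
score    LCL_MANY_SUSPECT_LINKS  0.001
describe LCL_MANY_SUSPECT_LINKS  More than 5 links match a suspected spam pattern
```

The per-rule cap only limits each counter; comparing the sum rather than each counter avoids the "> 5 never matches" problem noted above.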
Re: A few noob questions
Many thanks for your help.

On 2020-12-20 15:26, John Hardin wrote:

> On Sat, 19 Dec 2020, Alan wrote:
>> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message.
>
> You should be able to do that purely in SA; it's a tad more difficult if you want to match the envelope to address rather than the To: header. If you want to reliably match the envelope to address you'd need to have it recorded in a Received header (either the one that your MTA generates or the one that some trusted MTA prior to your MTA generates).

Agreed, ideally this is something I can stick into a KB article and have afflicted users implement on their own. I'd like to keep system-wide modifications to a minimum. A user's exim filters also move when we transfer an account to another server, so as long as there's a common rule set, not having to adjust SA configuration is a benefit.

Basically what I have now is this:

    uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
    tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
    uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
    tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
    meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5
    score LCL_MANY_SUSPECT_LINKS 0.001
    describe LCL_MANY_SUSPECT_LINKS More than 5 links match a suspected spam pattern

> As for long sequences of random characters - that's FP-prone. It's difficult to detect *random* in a simple RE. A long string of characters from a given set, easy. Characteristics about that string? complicated. A rule like that might potentially hit on legitimate (for values of "legitimate") tracking analysis URIs or caching URIs, unless there is some kind of uncommon pattern to it that you can discern and look for in the RE.

No kidding. I've seen this specific pattern in many a spam message over the years so I suspect it's particularly FP vulnerable. If there was a regex rule for "matches English word" I could nail them with ease. OTOH my regex skills are pretty decent. Finding the two common patterns and checking that at least one of each is there will hopefully eliminate messages that consistently only use one form, eliminating a range of FPs.

If I can use the "many suspect links" match along with a few other indicators, including that this particular [expletive] makes the message look like it comes from a mailing list, I think I can kill their spew. I'm seeing upwards of 20 messages per day per user from this source, but they're rotating through junk data center IP addresses and disposable mail server identities daily. This is war.

One more noob question. Can I test a rule without messing with the production environment by using spamassassin -t -cf='include myrule.cf' path or should I build a test environment?
Re: A few noob questions
On Sat, 19 Dec 2020, Alan wrote:

> 1. What is the smallest increment for a rule score? I see some indications that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?

As Bill said, anything works. Zero does disable the rule; a score of 0.001 is generally termed "informative" - you want to include it in the hits output so that you know that the rule hits, but you don't want it (by itself) to affect the score. See, for example, LOTSA_MONEY.

> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message.

You should be able to do that purely in SA; it's a tad more difficult if you want to match the envelope to address rather than the To: header. If you want to reliably match the envelope to address you'd need to have it recorded in a Received header (either the one that your MTA generates or the one that some trusted MTA prior to your MTA generates).

You'd make LOCAL_SOME_RULE an unscored subrule by prepending two underscores: __LCL_SOME_RULE, and then you'd develop some subrule(s) to hit on the specific form of to address(es) you're interested in. Then these can be combined in a scored meta rule:

    meta LCL_POISON_01 __LCL_SOME_RULE && (__LCL_SUSP_TO_01 || __LCL_SUSP_TO_02)
    score LCL_POISON_01 10.000

> But at the same time I don't want messages that match this rule generate false positives for other users.

If you've done the __LCL_SUSP_TO_* rule(s) properly that shouldn't happen. You can set the score to informative while testing it.

> 2. I would like to match against some suspicious URLs that contain long sequences of random characters, but only have the rule match if I find multiple URLs that follow the same pattern.

Bill answered that adequately. One comment on his answer:

    describe __KAM_COUNT_URIS

Subrules never appear in the hits output so a description on them is only for internal documentation purposes; a regular #comment would work just as well for that.

As for long sequences of random characters - that's FP-prone. It's difficult to detect *random* in a simple RE. A long string of characters from a given set, easy. Characteristics about that string? complicated. A rule like that might potentially hit on legitimate (for values of "legitimate") tracking analysis URIs or caching URIs, unless there is some kind of uncommon pattern to it that you can discern and look for in the RE.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
---
5 days until Christmas
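A minimal sketch of the subrule-plus-meta approach described in this message, assuming the recipient form of interest shows up in the To: header (matching the envelope recipient would need a Received-header test instead). Both regexes and the example.com address are hypothetical placeholders; only the rule names come from the thread.

```
# Unscored subrule (leading "__"): the suspicious-content test.
# The pattern is a placeholder, not a real spam signature.
uri      __LCL_SOME_RULE   /target_pattern/i

# Unscored subrule: a specific form of To: address (hypothetical address).
header   __LCL_SUSP_TO_01  To =~ /listname\.[a-z0-9]{4,}\@example\.com/i

# Scored meta rule: fires only when both subrules hit, so other
# recipients are unaffected by __LCL_SOME_RULE on its own.
meta     LCL_POISON_01     __LCL_SOME_RULE && __LCL_SUSP_TO_01
describe LCL_POISON_01     Suspect link sent to a tagged local address

# Informative score while testing; raise it once the rule is trusted.
score    LCL_POISON_01     0.001
```

Because the subrules are unscored and hidden from the hits output, only LCL_POISON_01 would be visible to a downstream filter.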
Re: A few noob questions
Thanks Bill. I know very little about Perl, so while I saw the reference to Mail::SpamAssassin::Conf without the "perldoc" in front of it, I had no clue what to do with that information.

On 2020-12-20 00:18, Bill Cole wrote:

> On 19 Dec 2020, at 23:39, Alan wrote:
>> Please forgive me if these are easy/common questions. I have done some searching and haven't found any clear answers. I'm running SpamAssassin 3.4.4 in a cPanel environment.
>>
>> 1. What is the smallest increment for a rule score? I see some indications that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?
>
> Any number that Perl understands will work but very small scores are pointless. So if you really want to score a rule at 12.34e-56 you can.
>
>> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message. But at the same time I don't want messages that match this rule generate false positives for other users.
>
> Generally 0.01 or -0.01 is adequately small for such purposes.
>
>> 2. I would like to match against some suspicious URLs that contain long sequences of random characters, but only have the rule match if I find multiple URLs that follow the same pattern. Normally I would use /(some-regex){5}/ but it seems that the rawbody command only looks at smaller chunks of the message (in this case the spammer is sending messages that are in the 11KB range and I have adjusted exim to pass enough in $message_body to capture enough URLs to fire a rule). Is it possible to configure SA to look at bigger chunks? 8 KB or even 16 KB would work. If not, is there a way to write a rule that counts the total number of matches of a regex against the raw body?
>
> A rule can be allowed to match multiple times, as described in the documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example provided there:
>
>     uri __KAM_COUNT_URIS /^./
>     tflags __KAM_COUNT_URIS multiple maxhits=16
>     describe __KAM_COUNT_URIS A multiple match used to count URIs in a message
>     meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
>     meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
>     meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
>     meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
>     meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
>     meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
>     meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
>     meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)
Re: A few noob questions
On 19 Dec 2020, at 23:39, Alan wrote:

> Please forgive me if these are easy/common questions. I have done some searching and haven't found any clear answers. I'm running SpamAssassin 3.4.4 in a cPanel environment.
>
> 1. What is the smallest increment for a rule score? I see some indications that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?

Any number that Perl understands will work but very small scores are pointless. So if you really want to score a rule at 12.34e-56 you can.

> The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message. But at the same time I don't want messages that match this rule generate false positives for other users.

Generally 0.01 or -0.01 is adequately small for such purposes.

> 2. I would like to match against some suspicious URLs that contain long sequences of random characters, but only have the rule match if I find multiple URLs that follow the same pattern. Normally I would use /(some-regex){5}/ but it seems that the rawbody command only looks at smaller chunks of the message (in this case the spammer is sending messages that are in the 11KB range and I have adjusted exim to pass enough in $message_body to capture enough URLs to fire a rule). Is it possible to configure SA to look at bigger chunks? 8 KB or even 16 KB would work. If not, is there a way to write a rule that counts the total number of matches of a regex against the raw body?

A rule can be allowed to match multiple times, as described in the documentation (perldoc Mail::SpamAssassin::Conf.) Here's the example provided there:

    uri __KAM_COUNT_URIS /^./
    tflags __KAM_COUNT_URIS multiple maxhits=16
    describe __KAM_COUNT_URIS A multiple match used to count URIs in a message
    meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
    meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
    meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
    meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
    meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
    meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
    meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
    meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)

--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
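The same counting technique can be narrowed from "any URI" to URIs that fit a pattern, which sidesteps the rawbody chunk-size question. A rough sketch follows; the rule names and the 30-character threshold are hypothetical, and the regex is only a stand-in for whatever pattern the spam actually uses.

```
# Count URIs containing a long run of letters and digits after a slash.
# The 30-character threshold is an arbitrary placeholder.
uri    __LCL_COUNT_LONG_URIS  /\/[a-z0-9]{30,}/i
tflags __LCL_COUNT_LONG_URIS  multiple maxhits=6

# True once five or more matches have been counted.
meta   __LCL_HAS_5_LONG_URIS  (__LCL_COUNT_LONG_URIS >= 5)
```

The value counts regex matches over the URI list that SpamAssassin extracts, which may not equal the number of links visible in the message, so any threshold would need checking against real samples.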
A few noob questions
Please forgive me if these are easy/common questions. I have done some searching and haven't found any clear answers. I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some indications that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?

The reason for asking is that I want to use SpamAssassin to flag some things that are suspicious but only when other conditions are met for specific users. I'd like to have SA insert the rule text, eg. LOCAL_SOME_RULE so that I can have an exim filter check for a specific form of to address plus this rule match before removing the message. But at the same time I don't want messages that match this rule generate false positives for other users.

2. I would like to match against some suspicious URLs that contain long sequences of random characters, but only have the rule match if I find multiple URLs that follow the same pattern. Normally I would use /(some-regex){5}/ but it seems that the rawbody command only looks at smaller chunks of the message (in this case the spammer is sending messages that are in the 11KB range and I have adjusted exim to pass enough in $message_body to capture enough URLs to fire a rule). Is it possible to configure SA to look at bigger chunks? 8 KB or even 16 KB would work. If not, is there a way to write a rule that counts the total number of matches of a regex against the raw body?