Re: (Re-)emergence of UTF based obfuscation in phishing/spam

2023-08-30 Thread Ricky Boone
Typo, I meant to say I was on SA 3.4.6.

On Wed, Aug 30, 2023, 3:22 PM Ricky Boone wrote:

> Something I noticed on a set of emails that were reported to me.
>
> I have custom rules to look out for certain names in From:name.  The
> messages should have been caught by them, however upon inspection the
> name was UTF-8 encoded, and included a character that doesn't seem to
> render, but interferes with the regex I used.  Specifically, the bad
> actor included a RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f)
> effectively as a null-space character.  The body of the message was
> also flooded with LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO
> WIDTH NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters randomly
> placed within the body and within words to interfere with other rules.
> When debugging the message, it doesn't appear that the characters are
> normalized, so from SA's perspective it seems like all of these
> characters have to be accounted for with any rules.
>
> To add, I'm currently on SA 3.6.x.  It looks like 4.0 improves UTF-8
> handling, but I'm not sure if it would address the behavior I see
> (though happy to be wrong... albeit not able to update immediately).
>
> I'm trying to see if ReplaceTags might be useful, and found an older
> discussion in this list on the matter related to the trouble with
> UTF-8.  I checked to see if there were any existing tags that would
> account for null-space/zero-width space-like characters, but didn't
> see any.  I have no issues working on creating a tag, but wanted to
> gauge the community to see what their thoughts were while I started
> down that path.
>


Re: Scoring Explanation Please

2023-08-30 Thread Bill Cole
On 2023-08-30 at 15:14:15 UTC-0400 (Wed, 30 Aug 2023 19:14:15 +0000 (UTC))
Denny Jones via users
is rumored to have said:


> Hello,
> I have looked high and low and can't find an explanation for
> multi-level scoring:
>
> score SCC_CANSPAM_2    3.799    0.001    3.799    0.00
> What does this mean?
> In my simplistic way of doing things I would write this as:
> score SCC_CANSPAM_2 3.799


Try running this:

perldoc Mail::SpamAssassin::Conf


That provides you with a man-like interface for the configuration of 
SpamAssassin, extracted from the Mail::SpamAssassin::Conf perl module. 
Not very far into that document you will find:


    If four valid scores are listed, then the score that is used depends
    on how SpamAssassin is being used. The first score is used when both
    Bayes and network tests are disabled (score set 0). The second score
    is used when Bayes is disabled, but network tests are enabled (score
    set 1). The third score is used when Bayes is enabled and network
    tests are disabled (score set 2). The fourth score is used when
    Bayes is enabled and network tests are enabled (score set 3).

Very often, you will find that the automated rescoring system emits
what looks like a perverse set of scores, with the two network-enabled
scores at or near zero. That is an artifact of how rescoring is done,
combined with the fact that network tests are often a distillation of
other people's recent spam detections.  Essentially, a very 'small'
rule duplicates detection that is already being done effectively by a
network source.
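
For anyone who finds the quoted paragraph easier to follow as code, here
is a rough Python sketch of how the score-set number falls out of the two
settings. This is only an illustration of the documented mapping, not
SpamAssassin's actual implementation; the function names score_set and
active_score are made up for the example.

```python
def score_set(bayes_enabled: bool, network_enabled: bool) -> int:
    """Return the score set number (0-3) described in the Conf documentation.

    set 0: Bayes off, network off
    set 1: Bayes off, network on
    set 2: Bayes on,  network off
    set 3: Bayes on,  network on
    """
    return (2 if bayes_enabled else 0) + (1 if network_enabled else 0)


def active_score(scores, bayes_enabled: bool, network_enabled: bool) -> float:
    """Pick the score that applies from a four-value score list."""
    return scores[score_set(bayes_enabled, network_enabled)]


# The four values from the SCC_CANSPAM_2 line in the question.
SCC_CANSPAM_2 = [3.799, 0.001, 3.799, 0.00]

for bayes in (False, True):
    for network in (False, True):
        print(bayes, network, active_score(SCC_CANSPAM_2, bayes, network))
```

With that rule, the two network-enabled configurations (sets 1 and 3) land
on the near-zero values, which is the pattern described above.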






--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Scoring Explanation Please

2023-08-30 Thread David B Funk

Denny,

If you read the fine manual for the SpamAssassin configuration file, in the
section for 'score SYMBOLIC_TEST_NAME n.nn [ n.nn n.nn n.nn ]', you'll see:

   If only one valid score is listed, then that score is always used for a test.

   If four valid scores are listed, then the score that is used depends on how 
SpamAssassin is being used. The first score is used when both Bayes and network 
tests are disabled (score set 0). The second score is used when Bayes is 
disabled, but network tests are enabled (score set 1). The third score is used 
when Bayes is enabled and network tests are disabled (score set 2). The fourth 
score is used when Bayes is enabled and network tests are enabled (score set 3).


So when there are four score values it will use the one relevant to your SA's 
operating condition.


E.g., if the rule is sensitive to the presence of network-type tests, such as
DNSRBLs, the score can be adjusted accordingly.
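
To make the one-score versus four-score forms concrete, here is a small
Python sketch that reads a 'score' line and expands it to four values. It
is a simplification for illustration only: parse_score_line is a
hypothetical helper, not part of SpamAssassin, and it assumes a line
carries either exactly one or exactly four scores.

```python
def parse_score_line(line: str):
    """Parse 'score NAME n.nn [n.nn n.nn n.nn]' into (name, four scores).

    Simplifying assumption: only the one-score and four-score forms
    quoted from the manual are accepted; anything else raises ValueError.
    """
    fields = line.split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name = fields[1]
    values = [float(v) for v in fields[2:]]
    if len(values) == 1:
        # A single score is always used, so it applies to every score set.
        values = values * 4
    elif len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores, got {len(values)}")
    return name, values


print(parse_score_line("score SCC_CANSPAM_2    3.799    0.001    3.799    0.00"))
print(parse_score_line("score SCC_CANSPAM_2 3.799"))
```

Writing the rule with the single value 3.799 would therefore apply 3.799 in
all four score sets, which differs from the four-value line whenever
network tests are enabled.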



On Wed, 30 Aug 2023, Denny Jones via users wrote:


Hello,

I have looked high and low and can't find an explanation for multi-level 
scoring:

score SCC_CANSPAM_2    3.799    0.001    3.799    0.00

What does this mean?

In my simplistic way of doing things I would write this as:

score SCC_CANSPAM_2 3.799

Thanks for helping clear the mud in my mind!

Denny






--
Dave Funk                               University of Iowa
                                        College of Engineering
319/335-5751   FAX: 319/384-0549        1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin         Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{

(Re-)emergence of UTF based obfuscation in phishing/spam

2023-08-30 Thread Ricky Boone
Something I noticed on a set of emails that were reported to me.

I have custom rules to look out for certain names in From:name.  The
messages should have been caught by them, however upon inspection the
name was UTF-8 encoded, and included a character that doesn't seem to
render, but interferes with the regex I used.  Specifically, the bad
actor included a RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f)
effectively as a null-space character.  The body of the message was
also flooded with LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO
WIDTH NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters randomly
placed within the body and within words to interfere with other rules.
When debugging the message, it doesn't appear that the characters are
normalized, so from SA's perspective it seems like all of these
characters have to be accounted for with any rules.

To add, I'm currently on SA 3.6.x.  It looks like 4.0 improves UTF-8
handling, but I'm not sure if it would address the behavior I see
(though happy to be wrong... albeit not able to update immediately).

I'm trying to see if ReplaceTags might be useful, and found an older
discussion in this list on the matter related to the trouble with
UTF-8.  I checked to see if there were any existing tags that would
account for null-space/zero-width space-like characters, but didn't
see any.  I have no issues working on creating a tag, but wanted to
gauge the community to see what their thoughts were while I started
down that path.
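
To illustrate the problem outside of SA, here is a rough Python sketch of
the two approaches I have in mind: stripping the invisible characters
before matching, or building a pattern that tolerates them between
letters (roughly what a ReplaceTags-style tag would have to do). The
display name "Example Billing" and the helper names are made up for the
example, and this only covers the three code points mentioned above.

```python
import re

# The three code points seen in the reported messages:
# LEFT-TO-RIGHT MARK, RIGHT-TO-LEFT MARK, ZERO WIDTH NO-BREAK SPACE.
INVISIBLE = "\u200e\u200f\ufeff"


def strip_invisible(text: str) -> str:
    """Remove the invisible code points so ordinary patterns match again."""
    return text.translate({ord(ch): None for ch in INVISIBLE})


def tolerant_pattern(word: str):
    """Build a bytes regex that allows the UTF-8 encoded forms between letters."""
    gap = rb"(?:\xe2\x80[\x8e\x8f]|\xef\xbb\xbf)*"
    parts = [re.escape(ch.encode("utf-8")) for ch in word]
    return re.compile(gap.join(parts), re.IGNORECASE)


# Hypothetical From:name with a RIGHT-TO-LEFT MARK hidden inside a word.
name = "Exa\u200fmple Billing"

print(re.search("Example", name))                        # plain regex misses it
print(strip_invisible(name))                             # normalized text
print(tolerant_pattern("Example").search(name.encode("utf-8")))
```

The first approach is the simpler one if the text can be normalized before
the rules run; the second is closer to what each rule would have to do if
the raw bytes are all that is available.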


Scoring Explanation Please

2023-08-30 Thread Denny Jones via users
Hello,
I have looked high and low and can't find an explanation for multi-level 
scoring:
score SCC_CANSPAM_2    3.799    0.001    3.799    0.00
What does this mean?
In my simplistic way of doing things I would write this as:
score SCC_CANSPAM_2 3.799

Thanks for helping clear the mud in my mind!
Denny