Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Aner Perez

On 7/1/20 3:52 PM, John Hardin wrote:

On Wed, 1 Jul 2020, Aner Perez wrote:

I opened a bug (7832) about this but was told to report on the SA users mailing list 
instead.


The attached email is an example which triggers the HK_SCAM rule.  Looks like 
__HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise" 
when they are found one after the other (even on different lines).


In the real world this was triggered by a business email that had the following in the 
signature:


FirstName LastName
Altice Business
Enterprise Account Executive


What was the *overall* score of that message? Was this rule enough to push the message 
over the spam threshold (5 points)? Or was the message still scored as ham?


In our case it was marked as spam, but only because we have the spam threshold set very low 
(2.4).  The message scored 3.357 once BAYES_50 was added in.




It looks to me like the logic in __HK_SCAM_S7 is a little off...

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i


seems like it should be:

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i




That makes more sense but the rule still seems like it would be easily triggered by 
standard business talk (e.g. business proposal).  I guess that's the nature of business 
emails... they're naturally spammy.
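
For anyone who wants to check the two patterns quickly, here is a rough sketch in Python rather than Perl. It assumes that body rules see the text with line breaks collapsed to single spaces, which is what the report above suggests; the normalize() helper is only a stand-in for that step, not SpamAssassin code. For these particular patterns Python's re module should behave the same as Perl's.

```python
import re

# The two patterns quoted in this thread, each on a single line.
CURRENT_S7 = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture) (?:enterprise|propos(?:al|ition)))",
    re.IGNORECASE,
)
PROPOSED_S7 = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture|enterprise) propos(?:al|ition))",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Hypothetical stand-in for body rendering: collapse all whitespace runs."""
    return " ".join(text.split())


signature = "FirstName LastName\nAltice Business\nEnterprise Account Executive"
rendered = normalize(signature)

# The current pattern hits "Business Enterprise" across the line break.
current_hit = CURRENT_S7.search(rendered)

# The proposed pattern no longer hits the signature...
proposed_hit = PROPOSED_S7.search(rendered)

# ...but ordinary wording such as "business proposal" still matches it.
proposal_hit = PROPOSED_S7.search("Please review the attached business proposal.")
```

With the signature above, only the current pattern should match; the proposed one still fires on plain "business proposal" wording, as noted.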



...but I'll let Henrik comment.


Potentially, making it a rawbody rule might avoid this FP without affecting its 
performance against the targeted spams...
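
A small Python sketch of why rawbody might help, assuming (per the SpamAssassin configuration documentation) that rawbody text keeps its line breaks while body text does not. This only simulates the two views of the text with plain strings; it does not run SpamAssassin, and the sample spam phrase is made up for illustration.

```python
import re

# The current __HK_SCAM_S7 pattern as quoted earlier in this thread.
S7 = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture) (?:enterprise|propos(?:al|ition)))",
    re.IGNORECASE,
)

# Signature as it appears in the raw message, line breaks intact.
raw_view = "Altice Business\nEnterprise Account Executive"

# Same text as a body rule would presumably see it (assumption: whitespace collapsed).
body_view = " ".join(raw_view.split())

# The literal space in the pattern does not match a newline, so the raw view is clean.
raw_hit = S7.search(raw_view)
body_hit = S7.search(body_view)

# Hypothetical spam phrase written on one line: matched in either view.
spam_line = "I have a lucrative business proposal for you"
spam_hit = S7.search(spam_line)
```

The flip side is that a spam phrase wrapped across two lines would also be missed in the raw view, so this would need checking against real samples.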



For future reference: sending a sample email to the list as a bare attachment is 
problematic, as it may be altered en-route and thus invalidate any meaningful analysis. 
It's better to attach it as a zip/gzip, or to upload it to someplace like Pastebin and 
just post the URL to it here. (In this case, your description should probably be enough to 
figure it out without the sample so you shouldn't need to do that unless someone 
explicitly asks you to do so.)




Thanks, I'll keep that in mind.

- Aner


Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Aner Perez

I opened a bug (7832) about this but was told to report on the SA users mailing 
list instead.

The attached email is an example which triggers the HK_SCAM rule.  Looks like __HK_SCAM_S7 
is the culprit here since it matches the words "business" and "enterprise" when they are 
found one after the other (even on different lines).


In the real world this was triggered by a business email that had the following in the 
signature:


FirstName LastName
Altice Business
Enterprise Account Executive

- Aner
--- Begin Message ---

Let's list some

Business
Enterprise

Sounds simple
--- End Message ---