Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Henrik K
On Wed, Jul 01, 2020 at 01:29:51PM -0700, John Hardin wrote:
>
> Agreed, that's why I want Henrik to comment. I don't have the corpus he used
> to develop that rule.

Those are really old rules; I don't have it either. ;-)

__HK_SCAM_S7 seems to have regressed FP-wise, so I'm just going to drop it.



Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Martin Gregorie
On Wed, 2020-07-01 at 16:20 -0400, Aner Perez wrote:
> > It looks like to me like the logic in __HK_SCAM_S7 is a little
> > off...
> > 
> > /(?:(?:investment|proposed|lucrative)
> > (?:business|venture)|(?:business|venture) 
> > (?:enterprise|propos(?:al|ition)))/i
> > 
> > seems like it should be:
> > 
> > /(?:(?:investment|proposed|lucrative)
> > (?:business|venture)|(?:business|venture|enterprise) 
> > propos(?:al|ition))/i
> > 
> 
IME using a meta-rule that ANDs two rules of that type works well. 

The key is to put words or phrases that often occur in spam in each of
the sub-rules, for instance having selling jargon ("lowest prices",
"unbeatable value") in one rule and product names ("flip flops",
"vodka", "power packs") in the other. As a benefit, if the lists are
well-chosen from words and phrases from spam you've received, it will
also hit on sales spam using combinations you've not previously seen
while being surprisingly good at not giving FPs on business or personal
letters.

The only disadvantage is that the subrules get a bit unwieldy and hard
to edit once their definitions get much longer than 80 characters. That
aside, they're easy to understand and maintain.

Martin
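
A minimal Python sketch of the meta-rule idea described above, not SpamAssassin configuration: two independent sub-rules, and a hit only when both match. The word lists reuse the examples from the message; the names SELLING_JARGON, PRODUCT_NAMES and meta_hit are hypothetical.

```python
import re

# Hypothetical sub-rule word lists; a real deployment would build these
# from phrases seen in its own spam, as the message above suggests.
SELLING_JARGON = re.compile(r"\b(?:lowest prices|unbeatable value)\b", re.IGNORECASE)
PRODUCT_NAMES = re.compile(r"\b(?:flip flops|vodka|power packs)\b", re.IGNORECASE)


def meta_hit(text: str) -> bool:
    """Return True only when BOTH sub-rules match (a logical AND)."""
    return bool(SELLING_JARGON.search(text)) and bool(PRODUCT_NAMES.search(text))


# Both lists hit, so the combined rule fires.
print(meta_hit("Unbeatable value on vodka this week only"))
# Only one list hits in each of these, so the combined rule stays quiet.
print(meta_hit("We can offer the lowest prices on shipping"))
print(meta_hit("Please bring flip flops to the pool party"))
```

Because each list alone never fires the combined rule, ordinary mail that happens to mention a product or a price is left alone, while unseen pairings of jargon and product still hit.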





Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread John Hardin

On Wed, 1 Jul 2020, Aner Perez wrote:

> On 7/1/20 3:52 PM, John Hardin wrote:
>
>> On Wed, 1 Jul 2020, Aner Perez wrote:
>>
>>> I opened a bug (7832) about this but was told to report on the SA users
>>> mailing list instead.
>>>
>>> The attached email is an example which triggers the HK_SCAM rule.  Looks
>>> like __HK_SCAM_S7 is the culprit here since it matches the words
>>> "business" and "enterprise" when they are found one after the other (even
>>> on different lines).
>>>
>>> In the real world this was triggered by a business email that had the
>>> following in the signature:
>>>
>>> FirstName LastName
>>> Altice Business
>>> Enterprise Account Executive
>>
>> What was the *overall* score of that message? Was this rule enough to push
>> the message over the spam threshold (5 points)? Or was the message still
>> scored as ham?
>
> In our case it was marked as spam but only because we have the spam
> threshold set very low (2.4). The message scored a 3.357 when the
> BAYES_50 was added in.

Yeah, that's why doing that blindly is a bad idea. Masscheck sets the base
rule scores so that spams score 5 points. If you reduce the spam
threshold, you increase FPs. You need to compensate for that if you do it.
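
The arithmetic behind this exchange can be sketched in a few lines of Python. The score (3.357) and the two thresholds (5 and 2.4) come from the thread; the function name is_spam is hypothetical, and the comparison assumes a message is tagged once its score reaches the threshold.

```python
def is_spam(score: float, threshold: float = 5.0) -> bool:
    """Classify a message by comparing its total score to the threshold."""
    return score >= threshold


message_score = 3.357  # total reported in the thread, BAYES_50 included

# With the default threshold of 5 the message is still ham.
print(is_spam(message_score))
# With the site's lowered threshold of 2.4 the same score is tagged as spam.
print(is_spam(message_score, threshold=2.4))
```

The rule scores themselves did not change between the two calls; only the cut-off moved, which is why lowering it without adjusting anything else raises the false-positive rate.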



>> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>>
>> seems like it should be:
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i
>
> That makes more sense but the rule still seems like it would be easily
> triggered by standard business talk (e.g. business proposal).  I guess that's
> the nature of business emails... they're naturally spammy.

Agreed, that's why I want Henrik to comment. I don't have the corpus he
used to develop that rule.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Of the twenty-two civilizations that have appeared in history,
  nineteen of them collapsed when they reached the moral state the
  United States is in now.  -- Arnold Toynbee
---
 3 days until the 244th anniversary of the Declaration of Independence

Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Aner Perez

On 7/1/20 3:52 PM, John Hardin wrote:

> On Wed, 1 Jul 2020, Aner Perez wrote:
>
>> I opened a bug (7832) about this but was told to report on the SA users mailing list
>> instead.
>>
>> The attached email is an example which triggers the HK_SCAM rule.  Looks like
>> __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise"
>> when they are found one after the other (even on different lines).
>>
>> In the real world this was triggered by a business email that had the following in the
>> signature:
>>
>> FirstName LastName
>> Altice Business
>> Enterprise Account Executive
>
> What was the *overall* score of that message? Was this rule enough to push the message
> over the spam threshold (5 points)? Or was the message still scored as ham?

In our case it was marked as spam but only because we have the spam threshold set very low
(2.4).  The message scored a 3.357 when the BAYES_50 was added in.

> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>
> seems like it should be:
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

That makes more sense but the rule still seems like it would be easily triggered by
standard business talk (e.g. business proposal).  I guess that's the nature of business
emails... they're naturally spammy.

> ...but I'll let Henrik comment.
>
> Potentially, making it a rawbody rule might avoid this FP without affecting its
> performance against the targeted spams...
>
> For future reference: sending a sample email to the list as a bare attachment is
> problematic, as it may be altered en-route and thus invalidate any meaningful analysis.
> It's better to attach it as a zip/gzip, or to upload it to someplace like Pastebin and
> just post the URL to it here. (In this case, your description should probably be enough to
> figure it out without the sample so you shouldn't need to do that unless someone
> explicitly asks you to do so.)

Thanks, I'll keep that in mind.

- Aner


Re: Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread John Hardin

On Wed, 1 Jul 2020, Aner Perez wrote:

> I opened a bug (7832) about this but was told to report on the SA users
> mailing list instead.
>
> The attached email is an example which triggers the HK_SCAM rule.  Looks like
> __HK_SCAM_S7 is the culprit here since it matches the words "business" and
> "enterprise" when they are found one after the other (even on different
> lines).
>
> In the real world this was triggered by a business email that had the
> following in the signature:
>
> FirstName LastName
> Altice Business
> Enterprise Account Executive


What was the *overall* score of that message? Was this rule enough to push 
the message over the spam threshold (5 points)? Or was the message still 
scored as ham?


It looks to me like the logic in __HK_SCAM_S7 is a little off...

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i

seems like it should be:

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

...but I'll let Henrik comment.
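
The difference between the two patterns can be checked outside SpamAssassin with a short Python sketch. The regexes are copied from the message above; the sample strings are illustrative, and the first one assumes the signature's line break has already been folded into a single space before matching.

```python
import re

# Pattern as currently shipped in __HK_SCAM_S7 (quoted in the thread).
ORIGINAL = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture) (?:enterprise|propos(?:al|ition)))",
    re.IGNORECASE,
)
# Pattern proposed in the thread: "enterprise" moves into the first group
# of the second alternative, so "business enterprise" alone no longer hits.
PROPOSED = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture|enterprise) propos(?:al|ition))",
    re.IGNORECASE,
)

samples = [
    "Altice Business Enterprise Account Executive",  # the reported signature
    "I have a lucrative business for you",
    "Please review the attached business proposal",
]
for text in samples:
    # Columns: original hit, proposed hit, sample text.
    print(bool(ORIGINAL.search(text)), bool(PROPOSED.search(text)), text)
```

Only the signature changes outcome between the two patterns. "business proposal" still matches both, which is the residual false-positive risk raised later in the thread.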


Potentially, making it a rawbody rule might avoid this FP without 
affecting its performance against the targeted spams...
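
Why a rawbody rule might sidestep this FP can be illustrated in Python. The sketch assumes that body rules see text with line breaks folded into spaces while rawbody rules still see the original line breaks; fold_whitespace is a hypothetical stand-in for that rendering step, and S7_TAIL is only the second alternative of the pattern quoted above.

```python
import re

# Second alternative of the __HK_SCAM_S7 pattern, as quoted in the thread.
S7_TAIL = re.compile(
    r"(?:business|venture) (?:enterprise|propos(?:al|ition))", re.IGNORECASE
)

raw_signature = "FirstName LastName\nAltice Business\nEnterprise Account Executive\n"


def fold_whitespace(text: str) -> str:
    """Approximate body-rule rendering: collapse all whitespace runs to one space."""
    return " ".join(text.split())


rendered = fold_whitespace(raw_signature)

# Folded text: "Business Enterprise" is now separated by a plain space.
print(bool(S7_TAIL.search(rendered)))
# Raw text: the words are separated by a newline, which the literal space
# in the pattern does not match.
print(bool(S7_TAIL.search(raw_signature)))
```

The trade-off is that spam which wraps the targeted phrase across a line would also escape a rawbody version, so this only helps if the targeted spams keep the phrase on one line.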



For future reference: sending a sample email to the list as a bare 
attachment is problematic, as it may be altered en-route and thus 
invalidate any meaningful analysis. It's better to attach it as a 
zip/gzip, or to upload it to someplace like Pastebin and just post the URL 
to it here. (In this case, your description should probably be enough to 
figure it out without the sample so you shouldn't need to do that unless 
someone explicitly asks you to do so.)




--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The philosophy of gun control: Teenagers are roaring through
  town at 90MPH, where the speed limit is 25. Your solution is to
  lower the speed limit to 20.   -- Sam Cohen
---
 3 days until the 244th anniversary of the Declaration of Independence


Rule HK_SCAM is triggered by standard business email

2020-07-01 Thread Aner Perez

I opened a bug (7832) about this but was told to report on the SA users mailing 
list instead.

The attached email is an example which triggers the HK_SCAM rule.  Looks like __HK_SCAM_S7 
is the culprit here since it matches the words "business" and "enterprise" when they are 
found one after the other (even on different lines).


In the real world this was triggered by a business email that had the following in the 
signature:


FirstName LastName
Altice Business
Enterprise Account Executive

- Aner
--- Begin Message ---

Let's list some

Business
Enterprise

Sounds simple
--- End Message ---


Re: Frequency of SUSP_NTLD updates

2020-07-01 Thread John Hardin

On Wed, 1 Jul 2020, @lbutlr wrote:

> On 30 Jun 2020, at 09:31, RW  wrote:
>
>> On Tue, 30 Jun 2020 11:30:17 +
>> Roald Stolte wrote:
>>
>>> These mails were all using TLDs such as .site and .online and were
>>> getting marked because of it.
>
> Are others seeing a decrease in spam from .site and .online? All I see
> from these TLDs is 100% spam. They are not at the volume that .top was
> when this free-for-all on TLDs started, but they are not generating any
> legitimate mail on my servers.

That matches my experience.

>> You could just drop the score for FROM_SUSPICIOUS_NTLD &
>> FROM_SUSPICIOUS_NTLD_FP.
>
> This is probably the best way, but I'd be wary of dropping it too much.

Especially as the rule covers *other* rarely-legit TLDs as well, and that
would impact their scoring.


I'd suggest instead a rule with an offsetting negative score (not 
necessarily an actual whitelist/accept entry as that's *too* generous) for 
the TLDs (or if possible the specific domains in those TLDs) that are 
causing problems.
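
The offsetting-score idea can be sketched in Python as plain arithmetic. Everything here is hypothetical: the TLD set, the trusted domain, and both score values are made up for illustration and are not the scores SpamAssassin ships.

```python
# All names and numbers below are hypothetical illustrations.
SUSPICIOUS_TLDS = {"site", "online", "top"}
TRUSTED_DOMAINS = {"newsletter.example.site"}  # a known-legitimate sender
TLD_PENALTY = 2.0      # stand-in for the suspicious-TLD rule score
TRUSTED_OFFSET = -1.5  # partial offset, deliberately smaller than the penalty


def tld_score(sender_domain: str) -> float:
    """Sum the TLD penalty and, for listed domains, the negative offset."""
    domain = sender_domain.lower().rstrip(".")
    tld = domain.rsplit(".", 1)[-1]
    score = 0.0
    if tld in SUSPICIOUS_TLDS:
        score += TLD_PENALTY
    if domain in TRUSTED_DOMAINS:
        score += TRUSTED_OFFSET
    return score


print(tld_score("mail.freebitcoins.site"))   # full penalty
print(tld_score("newsletter.example.site"))  # penalty mostly cancelled
print(tld_score("example.org"))              # unaffected
```

Keeping the offset smaller than the penalty means a listed domain is not given a free pass: other rules can still push a bad message from it over the threshold.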


I realize this isn't really a welcome solution per the original note but 
until the legitimate use of those TLDs grows the rules punishing them do 
have value.



--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Microsoft is not a standards body.
---
 3 days until the 244th anniversary of the Declaration of Independence


Re: Frequency of SUSP_NTLD updates

2020-07-01 Thread @lbutlr
On 30 Jun 2020, at 09:31, RW  wrote:
> On Tue, 30 Jun 2020 11:30:17 +
> Roald Stolte wrote:
> 
> 
>> These mails were all using TLDs such as .site and .online and were
>> getting marked because of it.

Are others seeing a decrease in spam from .site and .online? All I see from 
these TLDs is 100% spam. They are not at the volume that .top was when this 
free-for-all on TLDs started, but they are not generating any legitimate mail 
on my servers. I've loosened some restrictions on .fm, .tv, and .info, since there 
are legitimate senders there, but even those are still mostly spam.

I see connections from domains like server.creativecabin.online, 
mail.mobile-advertising.site, mail.freebitcoins.site, and 
fame.servetxt.online, and most of it is coming in to spam-trap email addresses.

> You could just drop the score for FROM_SUSPICIOUS_NTLD &
> FROM_SUSPICIOUS_NTLD_FP.

This is probably the best way, but I'd be wary of dropping it too much.



-- 
Good old Dame Fortune. You can _depend_ on her.



Re: Detection rate of msbl.org

2020-07-01 Thread RW
On Wed, 1 Jul 2020 10:49:03 +0200
Marc Roos wrote:


> Jul  1 01:08:45 spam1 sendmail[19193]: 05UN8fHL019193: Milter: 
> from=, reject=550 5.7.1 Rejected 
> feedb...@service.alibaba.com SPAM (ebl.msbl.org) 

I don't know what this is, but I guess it's not a purely SA based milter
as it gives a single reason for rejection.

Most of the hits on EBL that I get with SA are from addresses parsed out
of the body - often from HTML. If your milter can't do that you won't get
good results.
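
A rough Python sketch of what "addresses parsed out of the body" involves, using only the standard library. It is not how SA or the EBL lookup actually works; the AddressCollector class, the body_addresses helper, the simplified EMAIL_RE pattern and the sample HTML are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Deliberately simple address pattern; real-world parsing is more involved.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class AddressCollector(HTMLParser):
    """Collect email addresses from mailto: links and from visible text."""

    def __init__(self):
        super().__init__()
        self.addresses = set()

    def handle_starttag(self, tag, attrs):
        # Addresses hidden in href attributes never show up in rendered text.
        for name, value in attrs:
            if name == "href" and value and value.lower().startswith("mailto:"):
                self.addresses.update(a.lower() for a in EMAIL_RE.findall(value))

    def handle_data(self, data):
        self.addresses.update(a.lower() for a in EMAIL_RE.findall(data))


def body_addresses(html_body: str) -> list:
    """Return the sorted, lower-cased addresses found in an HTML body."""
    collector = AddressCollector()
    collector.feed(html_body)
    collector.close()
    return sorted(collector.addresses)


html_body = (
    '<p>Reply to <a href="mailto:Claims@example.com">our agent</a> '
    "or write to payout@example.net today.</p>"
)
print(body_addresses(html_body))
```

A filter that only inspects the envelope or header addresses would see neither of the two addresses in this sample, which is the limitation described above.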

EBL is most effective against a subset of difficult spam where other
types of list don't work. It should really be judged on how it affects
what would otherwise get past content filtering, not on what it
prevents reaching content filtering.



RE: Detection rate of msbl.org

2020-07-01 Thread Marc Roos


Not much yet; I got this one [1]. But I run this check as one of 
the last, and most connections are already failing with 'Possibly forged 
hostname'.

[1]
Jul  1 01:08:45 spam1 sendmail[19193]: 05UN8fHL019193: Milter: 
from=, reject=550 5.7.1 Rejected 
feedb...@service.alibaba.com SPAM (ebl.msbl.org) 





-----Original Message-----
From: James Brown [mailto:jlbr...@bordo.com.au] 
Sent: Monday 22 June 2020 16:07
To: users@spamassassin.apache.org
Subject: Detection rate of msbl.org

I’m thinking about using the EBL from msbl.org with SA.

Can anyone tell me what detection rate they are getting with it? Is it 
worth using, or would the spam be trapped by other methods (RBL, etc) 
anyway?

Pretty hard to find much information about how useful it is.

Thanks,

James.