Re: Rule HK_SCAM is triggered by standard business email
On Wed, Jul 01, 2020 at 01:29:51PM -0700, John Hardin wrote:
>
> Agreed, that's why I want Henrik to comment. I don't have the corpus he used
> to develop that rule.

They're really old rules, I don't have it either. ;-)

__HK_SCAM_S7 seems to have regressed FP-wise, just gonna drop it.
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 2020-07-01 at 16:20 -0400, Aner Perez wrote:
> > It looks to me like the logic in __HK_SCAM_S7 is a little off...
> >
> > /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
> >
> > seems like it should be:
> >
> > /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

IME using a meta-rule that ANDs two rules of that type works well. The key is to put words or phrases that often occur in spam in each of the sub-rules, for instance having selling jargon ("lowest prices", "unbeatable value") in one rule and product names ("flip flops", "vodka", "power packs") in the other.

As a benefit, if the lists are well-chosen from words and phrases from spam you've received, it will also hit on sales spam using combinations you've not previously seen while being surprisingly good at not giving FPs on business or personal letters.

The only disadvantage is that the subrules get a bit unwieldy and hard to edit once their definitions get much longer than 80 characters. That aside, they're easy to understand and maintain.

Martin
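The ANDed sub-rule idea above can be sketched in Python. This is only an illustration of the logic, not SpamAssassin rule syntax: the names `SELLING_JARGON`, `PRODUCT_NAMES`, and `sales_spam_meta` are hypothetical, and the word lists contain only the examples quoted in the message.

```python
import re

# Hypothetical sub-rule 1: selling jargon (examples taken from the message).
SELLING_JARGON = re.compile(r"\b(?:lowest prices|unbeatable value)\b", re.IGNORECASE)

# Hypothetical sub-rule 2: product names (examples taken from the message).
PRODUCT_NAMES = re.compile(r"\b(?:flip flops|vodka|power packs)\b", re.IGNORECASE)


def sales_spam_meta(body: str) -> bool:
    """Return True only when BOTH sub-rules hit, mimicking an ANDed meta-rule."""
    return bool(SELLING_JARGON.search(body)) and bool(PRODUCT_NAMES.search(body))


if __name__ == "__main__":
    # Both lists hit: the combined rule fires.
    print(sales_spam_meta("Lowest prices on vodka, today only"))
    # Only the jargon list hits: the combined rule stays quiet.
    print(sales_spam_meta("Our quote offers unbeatable value to you"))
```

Because each list is matched independently, a message pairing any jargon phrase with any product name is caught, including combinations never seen before, while a message that hits only one list is left alone.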
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 1 Jul 2020, Aner Perez wrote:

> On 7/1/20 3:52 PM, John Hardin wrote:
>> On Wed, 1 Jul 2020, Aner Perez wrote:
>>> I opened a bug (7832) about this but was told to report on the SA users mailing list instead.
>>>
>>> The attached email is an example which triggers the HK_SCAM rule. Looks like __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise" when they are found one after the other (even on different lines).
>>>
>>> In the real world this was triggered by a business email that had the following in the signature:
>>>
>>> FirstName LastName
>>> Altice Business
>>> Enterprise Account Executive
>>
>> What was the *overall* score of that message? Was this rule enough to push the message over the spam threshold (5 points)? Or was the message still scored as ham?
>
> In our case it was marked as spam but only because we have the spam threshold set very low (2.4). The message scored a 3.357 when the BAYES_50 was added in.

Yeah, that's why doing that blindly is a bad idea. Masscheck sets the base rule scores so that spams score 5 points. If you reduce the spam threshold, you increase FPs. You need to compensate for that if you do it.

>> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>>
>> seems like it should be:
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i
>
> That makes more sense but the rule still seems like it would be easily triggered by standard business talk (e.g. business proposal). I guess that's the nature of business emails... they're naturally spammy.

Agreed, that's why I want Henrik to comment. I don't have the corpus he used to develop that rule.
--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Of the twenty-two civilizations that have appeared in history, nineteen of them collapsed when they reached the moral state the United States is in now.
   -- Arnold Toynbee
---
 3 days until the 244th anniversary of the Declaration of Independence
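The threshold point can be checked with the numbers quoted in this thread. A minimal Python sketch, assuming a message is tagged as spam when its total score reaches the threshold; the function name `is_spam` and the constant names are illustrative only.

```python
DEFAULT_THRESHOLD = 5.0  # threshold the stock rule scores are tuned for (per the message)
LOWERED_THRESHOLD = 2.4  # threshold reported for the affected site
MESSAGE_SCORE = 3.357    # total score reported for the false-positive message


def is_spam(score: float, threshold: float) -> bool:
    """Tag a message as spam when its total score reaches the threshold."""
    return score >= threshold


if __name__ == "__main__":
    for threshold in (DEFAULT_THRESHOLD, LOWERED_THRESHOLD):
        print(f"threshold {threshold}: spam={is_spam(MESSAGE_SCORE, threshold)}")
```

With the default threshold the message stays ham; only the lowered threshold turns the same score into a false positive, which is why lowering it needs compensating score adjustments.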
Re: Rule HK_SCAM is triggered by standard business email
On 7/1/20 3:52 PM, John Hardin wrote:
> On Wed, 1 Jul 2020, Aner Perez wrote:
>> I opened a bug (7832) about this but was told to report on the SA users mailing list instead.
>>
>> The attached email is an example which triggers the HK_SCAM rule. Looks like __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise" when they are found one after the other (even on different lines).
>>
>> In the real world this was triggered by a business email that had the following in the signature:
>>
>> FirstName LastName
>> Altice Business
>> Enterprise Account Executive
>
> What was the *overall* score of that message? Was this rule enough to push the message over the spam threshold (5 points)? Or was the message still scored as ham?

In our case it was marked as spam but only because we have the spam threshold set very low (2.4). The message scored a 3.357 when the BAYES_50 was added in.

> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>
> seems like it should be:
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

That makes more sense but the rule still seems like it would be easily triggered by standard business talk (e.g. business proposal). I guess that's the nature of business emails... they're naturally spammy.

> ...but I'll let Henrik comment.
>
> Potentially, making it a rawbody rule might avoid this FP without affecting its performance against the targeted spams...
>
> For future reference: sending a sample email to the list as a bare attachment is problematic, as it may be altered en-route and thus invalidate any meaningful analysis. It's better to attach it as a zip/gzip, or to upload it to someplace like Pastebin and just post the URL to it here.
>
> (In this case, your description should probably be enough to figure it out without the sample so you shouldn't need to do that unless someone explicitly asks you to do so.)

Thanks, I'll keep that in mind.

- Aner
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 1 Jul 2020, Aner Perez wrote:

> I opened a bug (7832) about this but was told to report on the SA users mailing list instead.
>
> The attached email is an example which triggers the HK_SCAM rule. Looks like __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise" when they are found one after the other (even on different lines).
>
> In the real world this was triggered by a business email that had the following in the signature:
>
> FirstName LastName
> Altice Business
> Enterprise Account Executive

What was the *overall* score of that message? Was this rule enough to push the message over the spam threshold (5 points)? Or was the message still scored as ham?

It looks to me like the logic in __HK_SCAM_S7 is a little off...

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i

seems like it should be:

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

...but I'll let Henrik comment.

Potentially, making it a rawbody rule might avoid this FP without affecting its performance against the targeted spams...

For future reference: sending a sample email to the list as a bare attachment is problematic, as it may be altered en-route and thus invalidate any meaningful analysis. It's better to attach it as a zip/gzip, or to upload it to someplace like Pastebin and just post the URL to it here.

(In this case, your description should probably be enough to figure it out without the sample so you shouldn't need to do that unless someone explicitly asks you to do so.)

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 The philosophy of gun control: Teenagers are roaring through town at 90MPH, where the speed limit is 25. Your solution is to lower the speed limit to 20.
   -- Sam Cohen
---
 3 days until the 244th anniversary of the Declaration of Independence
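The difference between the two patterns discussed in this thread can be tried out in Python. This is a sketch under assumptions: the two regular expressions are copied from the messages, but the `normalize` helper (collapsing line breaks into single spaces so that words on adjacent lines end up next to each other) and the line breaks in the sample signature are illustrative guesses, and the names `CURRENT_S7` and `PROPOSED_S7` are hypothetical.

```python
import re

# Pattern currently used by __HK_SCAM_S7, as quoted in the thread.
CURRENT_S7 = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture) (?:enterprise|propos(?:al|ition)))",
    re.IGNORECASE,
)

# Replacement suggested in the thread.
PROPOSED_S7 = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture|enterprise) propos(?:al|ition))",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Collapse all whitespace runs, including newlines, into single spaces (assumed step)."""
    return " ".join(text.split())


# Sample signature; the exact line breaks are an assumption.
SIGNATURE = "FirstName LastName\nAltice Business\nEnterprise Account Executive"

if __name__ == "__main__":
    body = normalize(SIGNATURE)
    print("current pattern hits: ", bool(CURRENT_S7.search(body)))
    print("proposed pattern hits:", bool(PROPOSED_S7.search(body)))
    print("proposed still hits 'business proposal':",
          bool(PROPOSED_S7.search("a business proposal")))
```

The current pattern fires on "Business Enterprise" in the signature, the proposed one does not, and both still match "business proposal", which is the remaining concern about ordinary business wording.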
Rule HK_SCAM is triggered by standard business email
I opened a bug (7832) about this but was told to report on the SA users mailing list instead.

The attached email is an example which triggers the HK_SCAM rule. Looks like __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise" when they are found one after the other (even on different lines).

In the real world this was triggered by a business email that had the following in the signature:

FirstName LastName
Altice Business
Enterprise Account Executive

- Aner

--- Begin Message ---
Let's list some Business Enterprise Sounds simple
--- End Message ---
Re: Frequency of SUSP_NTLD updates
On Wed, 1 Jul 2020, @lbutlr wrote:

> On 30 Jun 2020, at 09:31, RW wrote:
>> On Tue, 30 Jun 2020 11:30:17 + Roald Stolte wrote:
>>> These mails were all using TLDs such as .site and .online and were getting marked because of it.
>
> Are others seeing a decrease in spam from .site and .online? All I see from these TLDs is 100% spam. They are not at the volume that .top was when this free-for-all on TLDs started, but they are not generating any legitimate mail on my servers.

That matches my experience.

>> You could just drop the score for FROM_SUSPICIOUS_NTLD & FROM_SUSPICIOUS_NTLD_FP.
>
> This is probably the best way, but I'd be wary of dropping it too much.

Especially as the rule covers *other* rarely-legit TLDs as well, and that would impact their scoring.

I'd suggest instead a rule with an offsetting negative score (not necessarily an actual whitelist/accept entry as that's *too* generous) for the TLDs (or if possible the specific domains in those TLDs) that are causing problems.

I realize this isn't really a welcome solution per the original note but until the legitimate use of those TLDs grows the rules punishing them do have value.

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Microsoft is not a standards body.
---
 3 days until the 244th anniversary of the Declaration of Independence
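The offsetting-score suggestion can be illustrated with a small Python sketch of the arithmetic. Everything numeric here is a hypothetical placeholder: `NTLD_PENALTY` is not the real score of FROM_SUSPICIOUS_NTLD, `OFFSET_SCORE` is an arbitrary example, and `partner.example.site` is an invented domain. This is not SpamAssassin configuration, only a model of the idea.

```python
# All scores and domain names below are hypothetical placeholders.
NTLD_PENALTY = 2.0    # stand-in for what a suspicious-TLD rule might add
OFFSET_SCORE = -1.5   # partial offset; deliberately smaller than the penalty
SUSPICIOUS_TLDS = (".site", ".online")
KNOWN_GOOD_DOMAINS = {"partner.example.site"}  # invented example domain


def ntld_adjustment(sender_domain: str) -> float:
    """Return the net score contribution for a sender domain in this model."""
    domain = sender_domain.lower()
    if not domain.endswith(SUSPICIOUS_TLDS):
        return 0.0
    score = NTLD_PENALTY
    if domain in KNOWN_GOOD_DOMAINS:
        # Offset only the specific domains causing problems, not the whole TLD.
        score += OFFSET_SCORE
    return score


if __name__ == "__main__":
    for name in ("mail.example.com", "partner.example.site", "unknown.online"):
        print(name, ntld_adjustment(name))
```

Keeping the offset smaller than the penalty means a listed domain still carries a little of the TLD's weight, which is the difference from an outright whitelist entry, and other domains in the same TLD keep the full penalty.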
Re: Frequency of SUSP_NTLD updates
On 30 Jun 2020, at 09:31, RW wrote:
> On Tue, 30 Jun 2020 11:30:17 +
> Roald Stolte wrote:
>
>> These mails were all using TLDs such as .site and .online and were
>> getting marked because of it.

Are others seeing a decrease in spam from .site and .online? All I see from these TLDs is 100% spam. They are not at the volume that .top was when this free-for-all on TLDs started, but they are not generating any legitimate mail on my servers.

I've loosened some restrictions on .fm, .tv, and .info, since there are legitimate senders there, but even those are still mostly spam.

I see connections from domains like server.creativecabin.online, mail.mobile-advertising.site, mail.freebitcoins.site, and fame.servetxt.online, and most of it is coming in to spam-trap email addresses.

> You could just drop the score for FROM_SUSPICIOUS_NTLD &
> FROM_SUSPICIOUS_NTLD_FP.

This is probably the best way, but I'd be wary of dropping it too much.

--
Good old Dame Fortune. You can _depend_ on her.
Re: Detection rate of msbl.org
On Wed, 1 Jul 2020 10:49:03 +0200 Marc Roos wrote:

> Jul 1 01:08:45 spam1 sendmail[19193]: 05UN8fHL019193: Milter:
> from=, reject=550 5.7.1 Rejected
> feedb...@service.alibaba.com SPAM (ebl.msbl.org)

I don't know what this is, but I guess it's not a purely SA-based milter as it gives a single reason for rejection. Most of the hits on EBL that I get with SA are from addresses parsed out of the body - often from HTML. If your milter can't do that you won't get good results.

EBL is most effective against a subset of difficult spam where other types of list don't work. It should really be judged on how it affects what would otherwise get past content filtering, not on what it prevents from reaching content filtering.
RE: Detection rate of msbl.org
Not much yet, I got this one[1]. But I run this check as one of the last. Most connections are already failing with 'Possibly forged hostname'.

[1]
Jul 1 01:08:45 spam1 sendmail[19193]: 05UN8fHL019193: Milter: from=, reject=550 5.7.1 Rejected feedb...@service.alibaba.com SPAM (ebl.msbl.org)

-----Original Message-----
From: James Brown [mailto:jlbr...@bordo.com.au]
Sent: Monday, 22 June 2020 16:07
To: users@spamassassin.apache.org
Subject: Detection rate of msbl.org

I’m thinking about using the EBL from msbl.org with SA.

Can anyone tell me what detection rate they are getting with it?

Is it worth using, or would the spam be trapped by other methods (RBL, etc) anyway?

Pretty hard to find much information about how useful it is.

Thanks,

James.