Re: Order of handling whitelist/blacklist

2024-03-28 Thread Philip Prindeville via users



> On Mar 28, 2024, at 12:18 PM, Matus UHLAR - fantomas  
> wrote:
> 
>>> On 27.03.24 20:56, Philip Prindeville via users wrote:
>>>> I have something that looks like:
>>>> 
>>>> whitelist_from_rcvd v...@yandex.ru vger.kernel.org
>>>> 
>>>> blacklist_from *@yandex.ru
>>>> 
>>>> And I only ever seem to see the 2nd rule being hit, but not the first.
>>>> 
>>>> What is the order of evaluation?  Mail::SpamAssassin::Conf doesn't say 
>>>> that I could find.
>>>> 
>>>> You'd think the first would happen first, since it's more specific.
>>>> 
>>>> Or, maybe that both would happen.
> 
>>> On Mar 28, 2024, at 2:39 AM, Matus UHLAR - fantomas  
>>> wrote:
>>> They should both happen.
>>> Note that the second argument must match a Received: header added by a trusted
>>> server, so that argument depends on a proper TrustPath setup.
>>> 
>>> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TrustPath
> 
> On 28.03.24 11:55, Philip Prindeville via users wrote:
>> My config also has:
>> 
>> trusted_networks 192.168.6.0/24
>> trusted_networks 192.168.8.0/24
>> trusted_networks 127.0.0.1/32
>> 
>> So I don't think that's the problem.
>> 
>> What are some steps to troubleshoot how the white/black-listing is happening?
> 
> can you show us the headers? Here or somewhere on pastebin?
> 


No need, but thanks.

Got my head out of my butt.  I had somehow missed that vger.kernel.org, a
"multihomed" (or "anycast", depending on how you look at it) host, had ceased to
exist as an outbound relay for LKML and had been replaced by
(am|ny|sv|sy).mirrors.kernel.org back around Dec 19 last year.

When I switched to:

whitelist_from_rcvd v...@yandex.ru mirrors.kernel.org

things started working again.
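The second argument is compared with the relay's rDNS name using the suffix check from WLBLEval.pm that is quoted further down in this archive, `/(?:^|\.)\Q${domain}\E$/i`. Here is a minimal Python transliteration. The relay names are hypothetical, built from the (am|ny|sv|sy) pattern in the post rather than taken from real headers. It shows why `vger.kernel.org` stopped matching once mail came from the mirrors, and why `mirrors.kernel.org` covers all four:

```python
import re

def rdns_matches(rdns: str, domain: str) -> bool:
    # Python analogue of the WLBLEval.pm test
    #   $rdns =~ /(?:^|\.)\Q${domain}\E$/i
    # re.escape() plays the role of \Q...\E. The domain must be the whole
    # name or follow a dot, and it must end the name.
    pattern = r"(?:^|\.)" + re.escape(domain) + r"$"
    return re.search(pattern, rdns, re.IGNORECASE) is not None

# Hypothetical relay names following the (am|ny|sv|sy).mirrors.kernel.org pattern.
for prefix in ("am", "ny", "sv", "sy"):
    relay = f"{prefix}.mirrors.kernel.org"
    print(relay,
          rdns_matches(relay, "vger.kernel.org"),      # old argument
          rdns_matches(relay, "mirrors.kernel.org"))   # new argument
```

Every relay prints False for the old argument and True for the new one. The dot anchor also prevents a name like `notmirrors.kernel.org` from matching.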

-Philip



Re: Order of handling whitelist/blacklist

2024-03-28 Thread Philip Prindeville via users



> On Mar 28, 2024, at 2:39 AM, Matus UHLAR - fantomas  wrote:
> 
> On 27.03.24 20:56, Philip Prindeville via users wrote:
>> I have something that looks like:
>> 
>> whitelist_from_rcvd v...@yandex.ru vger.kernel.org
>> 
>> blacklist_from *@yandex.ru
>> 
>> And I only ever seem to see the 2nd rule being hit, but not the first.
>> 
>> What is the order of evaluation?  Mail::SpamAssassin::Conf doesn't say that 
>> I could find.
>> 
>> You'd think the first would happen first, since it's more specific.
>> 
>> Or, maybe that both would happen.
> 
> They should both happen.
> Note that the second argument must match a Received: header added by a trusted
> server, so that argument depends on a proper TrustPath setup.
> 
> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/TrustPath

My config also has:

trusted_networks 192.168.6.0/24
trusted_networks 192.168.8.0/24
trusted_networks 127.0.0.1/32

So I don't think that's the problem.
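Where the trust boundary falls determines which relay's rDNS gets compared. Here is a much-simplified Python sketch. It assumes the relay IPs have already been extracted from the Received: headers (newest first), and it ignores `internal_networks` and SpamAssassin's other refinements. The first relay outside `trusted_networks` marks the handover from the internet, and its rDNS is what the second argument of whitelist_from_rcvd has to match. The IPs in the example chain are hypothetical documentation addresses.

```python
import ipaddress

# trusted_networks entries from the post.
TRUSTED = [ipaddress.ip_network(n)
           for n in ("192.168.6.0/24", "192.168.8.0/24", "127.0.0.1/32")]

def first_untrusted(relay_ips):
    """Given relay IPs from Received: headers, newest first, return the first
    one outside every trusted network (the internet-to-trusted handover), or
    None if all are trusted. Simplified: ignores internal_networks etc."""
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in TRUSTED):
            return ip
    return None

# Hypothetical chain: local hops, then the external relay, then its upstream.
print(first_untrusted(["127.0.0.1", "192.168.6.25", "203.0.113.7", "198.51.100.9"]))
```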

What are some steps to troubleshoot how the white/black-listing is happening?

Thanks



Order of handling whitelist/blacklist

2024-03-27 Thread Philip Prindeville via users
Hi.

I have something that looks like:

whitelist_from_rcvd v...@yandex.ru vger.kernel.org

blacklist_from  *@yandex.ru

And I only ever seem to see the 2nd rule being hit, but not the first.

What is the order of evaluation?  Mail::SpamAssassin::Conf doesn't say that I 
could find.

You'd think the first would happen first, since it's more specific.

Or, maybe that both would happen.

ATT RBL f---wits

2023-11-27 Thread Philip Prindeville
We're being blacklisted by att.net with the following message:

   (reason: 550 5.7.1 Connections not accepted from servers without a valid 
sender domain.flph840 Fix reverse DNS for 24.116.100.90)

I don't know what the hell is up with these pinheads:

philipp@ubuntu22:~$ dig -tmx redfish-solutions.com. @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -tmx redfish-solutions.com. @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 58379
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;redfish-solutions.com. IN MX

;; ANSWER SECTION:
redfish-solutions.com. 21600 IN MX 10 mail.redfish-solutions.com.

;; Query time: 48 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:29 MST 2023
;; MSG SIZE  rcvd: 71

philipp@ubuntu22:~$ dig -ta mail.redfish-solutions.com. @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -ta mail.redfish-solutions.com. @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19570
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;mail.redfish-solutions.com. IN A

;; ANSWER SECTION:
mail.redfish-solutions.com. 21600 IN A 24.116.100.90

;; Query time: 72 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:39 MST 2023
;; MSG SIZE  rcvd: 71

philipp@ubuntu22:~$ dig -x 24.116.100.90 @8.8.8.8

; <<>> DiG 9.18.12-0ubuntu0.22.04.3-Ubuntu <<>> -x 24.116.100.90 @8.8.8.8
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 2371
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;90.100.116.24.in-addr.arpa. IN PTR

;; ANSWER SECTION:
90.100.116.24.in-addr.arpa. 21600 IN PTR mail.redfish-solutions.com.

;; Query time: 68 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sun Nov 19 15:08:55 MST 2023
;; MSG SIZE  rcvd: 95

philipp@ubuntu22:~$
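The three lookups above amount to a forward-confirmed reverse DNS (FCrDNS) check: the PTR for the IP names a host whose A record points back at the same IP. Here is a minimal Python sketch. It assumes the lookups are passed in as functions so it can run offline. The tables below simply restate the dig answers. For live lookups, `socket.gethostbyaddr` and `socket.gethostbyname_ex` could fill those two roles.

```python
from typing import Callable, Iterable

def fcrdns_ok(ip: str,
              reverse: Callable[[str], Iterable[str]],
              forward: Callable[[str], Iterable[str]]) -> bool:
    """True if some PTR name for `ip` has an A record pointing back at `ip`."""
    return any(ip in forward(name) for name in reverse(ip))

# Offline stand-ins built from the dig answers above.
PTR = {"24.116.100.90": ["mail.redfish-solutions.com"]}
A = {"mail.redfish-solutions.com": ["24.116.100.90"]}

def reverse(ip):
    return PTR.get(ip, [])

def forward(name):
    return A.get(name, [])

print(fcrdns_ok("24.116.100.90", reverse, forward))  # True
```

This only confirms that the forward and reverse records agree. It cannot tell you why a particular receiver still rejects the host.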

So that's not the problem.  You're supposed to be able to get the blacklisting 
fixed if you email abuse_...@abuse-att.net  but 
I've emailed them from 3 different addresses and have yet to get a response 
much less a resolution.

Has anyone else had to deal with this bollocks and gotten it resolved?

Thanks



Re: DKIM absence

2023-05-02 Thread Philip Prindeville



> On May 2, 2023, at 9:37 AM, Thomas Johnson  wrote:
> 
> 
>> On May 2, 2023, at 8:27 AM, Philip Prindeville 
>>  wrote:
>> 
>> Is there a way to add scoring that says, "If the sending domain has DKIM 
>> records, but there's no DKIM signature on this message, then attach a high 
>> score to it?"
>> 
>> We seem to attach negative scores when DKIM is present and valid, but what 
>> about the opposite direction?
>> 
>> If it's absent, but it shouldn't be?
>> 
> 
> 
> If there’s no dkim signature, you can’t check for dkim records in dns. The 
> selector for a dkim signature is arbitrary - there’s no one dns lookup you 
> can do to see all possible dkim records for a domain. 
> 
> You can use ADSP - it’s old and I don’t know how many domains have ADSP 
> records these days, but it lets a domain specify that all mail must be dkim 
> signed to be considered valid.  
> 
> We tell our customers to add an ADSP record, and we use it when checking 
> their incoming mail to help identify forgeries. I don’t know that it helps 
> much with mail from non-customers, though.  I’ll have to check and see how 
> often our rules hit for that. 
> 


Right, because you need to grovel out the selector from the DKIM-Signature 
line.  Groan.
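Groveling out the selector amounts to reading the `s=` and `d=` tags. Per RFC 6376, the public key then lives in a TXT record at `<selector>._domainkey.<domain>`. Here is a minimal Python sketch, using a hypothetical header value:

```python
import re

def dkim_key_query_name(sig_value: str) -> str:
    """Return '<s>._domainkey.<d>' for a DKIM-Signature header value."""
    tags = {}
    for part in sig_value.split(";"):          # tag-list: name=value; name=value
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip()] = re.sub(r"\s+", "", value)  # drop folding whitespace
    return f"{tags['s']}._domainkey.{tags['d']}"

# Hypothetical header value (folded across two lines).
sig = ("v=1; a=rsa-sha256; c=relaxed/relaxed; d=example.com; s=sel2023;\r\n"
       "\th=from:to:subject; bh=AbC=; b=DeF==")
print(dkim_key_query_name(sig))  # sel2023._domainkey.example.com
```

This is exactly Thomas's point: without a signature there is no `s=` tag, so there is no name to query.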

That you can't mark a domain as requiring DKIM at the top-level seems to be a 
design flaw in the protocol.
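ADSP (RFC 5617, since reclassified as Historic) was the attempt to provide that top-level mark. Thomas's check comes down to fetching the TXT record at `_adsp._domainkey.<domain>` and reading its `dkim=` tag. Here is a minimal Python sketch of the parsing step only. The DNS lookup is left out, and values it does not recognize are treated as `unknown`.

```python
def adsp_policy(txt: str) -> str:
    """Return the dkim= practice from an ADSP TXT record value:
    'unknown', 'all' or 'discardable'; anything else counts as 'unknown'."""
    for part in txt.split(";"):
        name, _, value = part.partition("=")
        if name.strip() == "dkim":
            value = value.strip()
            return value if value in ("unknown", "all", "discardable") else "unknown"
    return "unknown"

def unsigned_is_suspicious(txt: str) -> bool:
    # 'all' and 'discardable' both assert that every message is signed.
    return adsp_policy(txt) in ("all", "discardable")

print(adsp_policy("dkim=all"), unsigned_is_suspicious("dkim=unknown"))
```

A domain publishing `all` or `discardable` is claiming that all of its mail is signed. An unsigned message from such a domain is therefore the "absent, but it shouldn't be" case.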




DKIM absence

2023-05-02 Thread Philip Prindeville
Is there a way to add scoring that says, "If the sending domain has DKIM 
records, but there's no DKIM signature on this message, then attach a high 
score to it?"

We seem to attach negative scores when DKIM is present and valid, but what 
about the opposite direction?

If it's absent, but it shouldn't be?



Re: Did the whitelist_from_rcvd semantics change?

2023-05-01 Thread Philip Prindeville



> On May 1, 2023, at 3:48 AM, Reindl Harald  wrote:
> 
> 
> 
> Am 30.04.23 um 20:54 schrieb Philip Prindeville:
>>> On Apr 28, 2023, at 12:17 PM, Philip Prindeville 
>>>  wrote:
>>> 
>>> 
>>> 
>>>> On Apr 28, 2023, at 10:24 AM, Reindl Harald  wrote:
>>>> 
>>>> 
>>>> 
>>>> Am 28.04.23 um 18:11 schrieb Philip Prindeville:
>>>>>> On Apr 25, 2023, at 6:28 AM, Bill Cole 
>>>>>>  wrote:
>>>>>> 
>>>>>> On 2023-04-24 at 16:32:55 UTC-0400 (Mon, 24 Apr 2023 14:32:55 -0600)
>>>>>> Philip Prindeville 
>>>>>> is rumored to have said:
>>>>>> 
>>>>>>> I thought the matching included subdomains, and seem to remember that 
>>>>>>> working.
>>>>>> 
>>>>>> It never has. At least not in the past 17 years.
>>>>>> 
>>>>> Then how do pools of servers like *.protection.outbound.outlook.com get 
>>>>> handled?
>>>> 
>>>> as * is always handled at globbing
>>>> 
>>>> *.example.com
>>>> *@example.com
>>> 
>>> 
>>> Maybe I'm missing something, but the code brackets ${domain} with \Q and \E 
>>> so globbing wouldn't work.
>>> 
>>>   if ($rdns =~ /(?:^|\.)\Q${domain}\E$/i) { $match=1; last }
>>> 
>> But it *is* anchored on the left hand side by either beginning of line *or* 
>> dot
> 
> and what do you think "*" will do with the anchoring?
> 
> ^*


And that will continue to glob inside \Q ... \E ?

-Philip




Re: Did the whitelist_from_rcvd semantics change?

2023-04-30 Thread Philip Prindeville



> On Apr 28, 2023, at 12:17 PM, Philip Prindeville 
>  wrote:
> 
> 
> 
>> On Apr 28, 2023, at 10:24 AM, Reindl Harald  wrote:
>> 
>> 
>> 
>> Am 28.04.23 um 18:11 schrieb Philip Prindeville:
>>>> On Apr 25, 2023, at 6:28 AM, Bill Cole 
>>>>  wrote:
>>>> 
>>>> On 2023-04-24 at 16:32:55 UTC-0400 (Mon, 24 Apr 2023 14:32:55 -0600)
>>>> Philip Prindeville 
>>>> is rumored to have said:
>>>> 
>>>>> I thought the matching included subdomains, and seem to remember that 
>>>>> working.
>>>> 
>>>> It never has. At least not in the past 17 years.
>>>> 
>>> Then how do pools of servers like *.protection.outbound.outlook.com get 
>>> handled?
>> 
>> as * is always handled at globbing
>> 
>> *.example.com
>> *@example.com
> 
> 
> Maybe I'm missing something, but the code brackets ${domain} with \Q and \E 
> so globbing wouldn't work.
> 
>   if ($rdns =~ /(?:^|\.)\Q${domain}\E$/i) { $match=1; last }
> 


But it *is* anchored on the left hand side by either beginning of line *or* dot.

-Philip




Re: Did the whitelist_from_rcvd semantics change?

2023-04-28 Thread Philip Prindeville



> On Apr 28, 2023, at 10:24 AM, Reindl Harald  wrote:
> 
> 
> 
> Am 28.04.23 um 18:11 schrieb Philip Prindeville:
>>> On Apr 25, 2023, at 6:28 AM, Bill Cole 
>>>  wrote:
>>> 
>>> On 2023-04-24 at 16:32:55 UTC-0400 (Mon, 24 Apr 2023 14:32:55 -0600)
>>> Philip Prindeville 
>>> is rumored to have said:
>>> 
>>>> I thought the matching included subdomains, and seem to remember that 
>>>> working.
>>> 
>>> It never has. At least not in the past 17 years.
>>> 
>> Then how do pools of servers like *.protection.outbound.outlook.com get 
>> handled?
> 
> as * is always handled at globbing
> 
> *.example.com
> *@example.com


Maybe I'm missing something, but the code brackets ${domain} with \Q and \E so 
globbing wouldn't work.

   if ($rdns =~ /(?:^|\.)\Q${domain}\E$/i) { $match=1; last }
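Taken at face value, this line treats the relay argument literally. `\Q...\E` disables regex metacharacters, so a `*` inside it is just an asterisk. Python's `re.escape` does the same job, so a quick sketch can show the effect. The pool hostname below is hypothetical. The starred form would only match a name containing a literal `*`, while the `(?:^|\.)` anchor already lets the plain suffix cover the whole pool:

```python
import re

def relay_regex(domain):
    # (?:^|\.) + escaped domain + $, case-insensitive, as in the quoted Perl;
    # re.escape() stands in for \Q...\E.
    return re.compile(r"(?:^|\.)" + re.escape(domain) + r"$", re.IGNORECASE)

host = "mail-xyz.protection.outbound.outlook.com"  # hypothetical pool member

starred = relay_regex("*.protection.outbound.outlook.com")
plain = relay_regex("protection.outbound.outlook.com")

print(bool(starred.search(host)))  # False: '*' only matches a literal asterisk
print(bool(plain.search(host)))    # True: suffix match after a dot
```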




Re: Did the whitelist_from_rcvd semantics change?

2023-04-28 Thread Philip Prindeville



> On Apr 25, 2023, at 6:28 AM, Bill Cole 
>  wrote:
> 
> On 2023-04-24 at 16:32:55 UTC-0400 (Mon, 24 Apr 2023 14:32:55 -0600)
> Philip Prindeville 
> is rumored to have said:
> 
>> I thought the matching included subdomains, and seem to remember that 
>> working.
> 
> It never has. At least not in the past 17 years.
> 


Then how do pools of servers like *.protection.outbound.outlook.com get handled?


-Philip



Re: Did the whitelist_from_rcvd semantics change?

2023-04-24 Thread Philip Prindeville
Oh, and this is on Fedora, so I'm running 3.4.6...


> On Apr 24, 2023, at 2:32 PM, Philip Prindeville 
>  wrote:
> 
> Hi,
> 
> I have the following line:
> 
> whitelist_from_rcvd *@ceipalmm.com mailgun.net
> 
> And tried it on a message that had:
> 
> Return-Path: 
> 
> But it didn't get whitelisted.  If I change the pattern above to 
> "*@mg2.ceipalmm.com" it works.  I thought the matching included subdomains, 
> and seem to remember that working.
> 
> But just ran a simple test and that's not the case.
> 
> Is this a bug?  Looking at Mail/SpamAssassin/Plugin/WLBLEval.pm I see:
> 
>if ($rdns =~ /(?:^|\.)\Q${domain}\E$/i) { $match=1; last }
> 
> So I *thought* that was what was happening, but testing says otherwise.
> 
> Insights?
> 
> Thanks,
> 
> -Philip
> 



Did the whitelist_from_rcvd semantics change?

2023-04-24 Thread Philip Prindeville
Hi,

I have the following line:

whitelist_from_rcvd *@ceipalmm.com mailgun.net

And tried it on a message that had:

Return-Path: 

But it didn't get whitelisted.  If I change the pattern above to 
"*@mg2.ceipalmm.com" it works.  I thought the matching included subdomains, and 
seem to remember that working.

But just ran a simple test and that's not the case.

Is this a bug?  Looking at Mail/SpamAssassin/Plugin/WLBLEval.pm I see:

if ($rdns =~ /(?:^|\.)\Q${domain}\E$/i) { $match=1; last }

So I *thought* that was what was happening, but testing says otherwise.
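The `\Q...\E` test above applies only to the relay argument. The address argument is a glob. Here is a minimal Python sketch. It assumes the glob becomes an anchored, case-insensitive regex in which `*` matches any run of characters and `?` matches exactly one; this is an approximation, not SpamAssassin's actual code. The sender address is hypothetical, because the archive stripped the real Return-Path. The sketch shows why `*@ceipalmm.com` misses a subdomain sender:

```python
import re

def glob_to_regex(glob):
    """Approximate a whitelist address glob as an anchored, case-insensitive
    regex: '*' matches any run of characters, '?' exactly one."""
    rx = re.escape(glob).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + rx + "$", re.IGNORECASE)

addr = "bounce@mg2.ceipalmm.com"  # hypothetical subdomain sender
for glob in ("*@ceipalmm.com", "*@mg2.ceipalmm.com", "*@*.ceipalmm.com"):
    print(glob, bool(glob_to_regex(glob).match(addr)))
```

Only the first pattern fails to match. `*@*.ceipalmm.com` covers subdomains at the glob level, but it no longer matches the bare `@ceipalmm.com` domain.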

Insights?

Thanks,

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-23 Thread Philip Prindeville



> On May 11, 2022, at 1:53 AM, Henrik K  wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any 
>>>> of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


Ended up using .*$ instead of [^:]+, but that worked too.

Is it possible to count how many headers didn't match, and set a threshold such
as 3 or more unknown headers?
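Outside SpamAssassin, the counting half is easy to prototype. Here is a minimal Python sketch. It assumes an abbreviated, hypothetical allow-list, and it reuses the negative-lookahead idea with a capture group so that each unknown header name is reported. The sample is the header block from the original message, trimmed and with placeholder addresses.

```python
import re

# Hypothetical, abbreviated allow-list; a real rule would list many more names.
KNOWN = ("Return-Path|Received|Date|From|To|Cc|Subject|Message-ID|"
         "MIME-Version|Content-Type|List-Unsubscribe")

# Negative lookahead as in Henrik's rule, plus a capture of the header name.
# [^\s:] skips folded continuation lines; [^:\n]* keeps the match on one line.
UNKNOWN = re.compile(rf"^(?!(?:{KNOWN}|X-[^:\n]*):)([^\s:][^:\n]*):",
                     re.MULTILINE | re.IGNORECASE)

def unknown_headers(block):
    return UNKNOWN.findall(block)

def too_many_unknown(block, threshold=3):
    return len(unknown_headers(block)) >= threshold

sample = """Return-Path: <bounce@example.com>
To: someone@example.com
Minicomputers-Exhume: sides
Subject: Nabil, 1 searches this week
Malthus-Films: 88976dea
Parasitic-Homogeneity: db5da28ba3e69a
MIME-Version: 1.0
Capitalizations-Grievously: oilers
Content-type: multipart/mixed
"""
print(unknown_headers(sample), too_many_unknown(sample))
```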

Thanks,

-Philip



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 9:24 AM, John Hardin  wrote:
> 
> On Tue, 10 May 2022, Philip Prindeville wrote:
> 
>> Anyone have a rule to detect the following nonsense headers seen in this 
>> message I got?
>> 
>> Return-Path: 
>> Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
>>  by mail (envelope-sender ) (MIMEDefang) with ESMTP id 23C2ch8H717309
>>  for ; Mon, 11 Apr 2022 20:38:50 -0600
>> To: "xy...@redfish-solutions.com" 
>> From: "Nabil, Home Depot" 
>> Message-ID: <35ee7c.8b8cf6.a...@uakron.edu>
>> Date: Mon, 11 Apr 2022 22:38:48 + (UTC)
>> Minicomputers-Exhume: sides
>> Subject: Nabil, 1 searches this week
>> Malthus-Films: 88976dea
>> List-Unsubscribe: 
>> <https://uakron.edu/?e=d567f7ae55e4=lun=39e56a34=email_notification_single_search_appearance_01=7=unsub=unsub=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
>> Parasitic-Homogeneity: db5da28ba3e69a
>> MIME-Version: 1.0
>> Capitalizations-Grievously: oilers
>> Content-type: multipart/mixed; boundary="--=_1649731129-716331-86"
>> 
>> Obviously, the following bogus header names are present:
>> 
>> Minicomputers-Exhume
>> Malthus-Films
>> Parasitic-Homogeneity
>> Capitalizations-Grievously
> 
> Take a look at __RAND_HEADER and RAND_HEADER_MANY
> 
> 

For my test messages, __RAND_HEADER_MANY isn't firing.

Also, Return-Path: is listed in RFC-2822, and many delivering (terminal) MTAs
add it, including Sendmail.

-Philip




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 1:53 AM, Henrik K  wrote:
> 
> On Wed, May 11, 2022 at 10:49:32AM +0300, Henrik K wrote:
>> On Wed, May 11, 2022 at 10:44:05AM +0300, Henrik K wrote:
>>> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>>>> See my original message.
>>>> 
>>>> I can't think of a single way to match each header, and then test for any 
>>>> of them not matching the pattern...
>>> 
>>> Simply use regex negative lookahead.
>>> 
>>> ALL =~ /^(?!Foo|Bar):/m
>>> 
>>> It will hit any line _not_ starting with Foo: or Bar:
>> 
>> Oops I think it was buggy.. more like:
>> 
>> ALL =~ /^(?!(?:Foo|Bar):)/m
> 
> And for debug logging to log the missing header (to easily inspect what was
> matched) you need some additional string matching, lookahead itself doesn't
> save any string
> 
> ALL =~ /^(?!(?:Foo|Bar):)[^:]+/m
> 


How do you look at what a rule is matching?  I've never figured that out...

-Philip




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-13 Thread Philip Prindeville



> On May 11, 2022, at 1:44 AM, Henrik K  wrote:
> 
> On Tue, May 10, 2022 at 06:19:38PM -0600, Philip Prindeville wrote:
>> See my original message.
>> 
>> I can't think of a single way to match each header, and then test for any of 
>> them not matching the pattern...
> 
> Simply use regex negative lookahead.
> 
> ALL =~ /^(?!Foo|Bar):/m
> 
> It will hit any line _not_ starting with Foo: or Bar:
> 


Ah, that did it.

Of course, if I get false positives, I'll have to manually search for the header
names I forgot to include...
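The first form of the lookahead quoted above is the one Henrik corrected. `^(?!Foo|Bar):` still requires a literal colon immediately at the start of the line, so it can only fire on lines that begin with `:`. A quick Python check of both forms follows. It uses a hypothetical header block, with From and Subject standing in for Foo|Bar.

```python
import re

headers = "From: a@example.com\nSubject: hi\nMalthus-Films: 88976dea"

buggy = re.compile(r"^(?!From|Subject):", re.M)       # lookahead, then a literal ':'
fixed = re.compile(r"^(?!(?:From|Subject):)", re.M)   # zero-width: lookahead only

print(buggy.search(headers))                 # None: no line starts with ':'
m = fixed.search(headers)
print(headers[m.start():].split(":")[0])     # name of the first non-matching header
```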




Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Philip Prindeville



> On May 10, 2022, at 5:57 PM, Martin Gregorie  wrote:
> 
> On Tue, 2022-05-10 at 17:29 -0600, Philip Prindeville wrote:
>> 
>> You're correct that they're different in every message received.
>> 
> So write a rule that fires on any header name that *doesn't* match
> anything in the list of legit headers as defined in the relevant RFCs.


See my original message.

I can't think of a single way to match each header, and then test for any of 
them not matching the pattern...


> 
> Of course you may need to extend that list to include some extras, such
> as headers injected by SA itself, as well as DMARC, DKIM, SPF etc.


That's the easy part.


> 
> Martin
> 
> 



Re: Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Philip Prindeville



> On May 10, 2022, at 4:58 PM, Kevin A. McGrail  wrote:
> 
> On 5/10/2022 6:10 PM, Philip Prindeville wrote:
>> Anyone have a rule to detect the following nonsense headers seen in this 
>> message I got?
> 
> Interesting. Those look more like something that Bayesian learning would be 
> best to handle.
> 
> But, have you built a corpora of spam and ham?  Do a list of headers that 
> appear in ham and spam corpora and xor out the spam ones.  Then write a rule 
> if any of those exist.  They look like they might change a lot and they are 
> randomized to avoid these type of issues so I see your dilemma and a plugin 
> might be needed.
> 
> Regards,
> KAM


You're correct that they're different in every message received.




Rule to detect non-standard headers that aren't X- prefixed

2022-05-10 Thread Philip Prindeville
Anyone have a rule to detect the following nonsense headers seen in this 
message I got?

Return-Path: 
Received: from cp24.deluxehosting.com (cp24.deluxehosting.com [207.55.244.13])
    by mail (envelope-sender ) (MIMEDefang) with ESMTP id 23C2ch8H717309
    for ; Mon, 11 Apr 2022 20:38:50 -0600
To: "xy...@redfish-solutions.com" 
From: "Nabil, Home Depot" 
Message-ID: <35ee7c.8b8cf6.a...@uakron.edu>
Date: Mon, 11 Apr 2022 22:38:48 + (UTC)
Minicomputers-Exhume: sides
Subject: Nabil, 1 searches this week
Malthus-Films: 88976dea
List-Unsubscribe:
    <https://uakron.edu/?e=d567f7ae55e4=lun=39e56a34=email_notification_single_search_appearance_01=7=unsub=unsub=cd5be889cc8fde15c6d1ebf62c92cc37375723f3fea3ce35af8da>
Parasitic-Homogeneity: db5da28ba3e69a
MIME-Version: 1.0
Capitalizations-Grievously: oilers
Content-type: multipart/mixed; boundary="--=_1649731129-716331-86"

Obviously, the following bogus header names are present:

Minicomputers-Exhume
Malthus-Films
Parasitic-Homogeneity
Capitalizations-Grievously

The list of legitimate headers is quite small, per RFC-2822 Section 3.6 and 
3.6.7 (odd that 3.6.8 doesn't call out the X-* requirement).

I'd like to fingerprint messages based on non-standard header names.

Has anyone undertaken this already?  I tried playing with:

header __L_NON_STD_HEADERS  ALL !~ /^(Return-Path|Received|Resent-Date|Resent-From|Resent-Sender|Resent-To|Resent-Cc|Resent-Bcc|Resent-Message-ID|Date|From|Sender|Reply-To|To|Cc|Bcc|Message-ID|In-Reply-To|References|Subject|Comments|Keywords|Content-Type|Content-Transfer-Encoding|MIME-Version|DKIM-Signature|X-([A-Z][a-z]+(-[A-Z][a-z]*)*))\:/m

But that will only match if *none* of the headers are standard ones, so that 
won't work... I really need to examine the headers one-by-one.
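The difference is easy to see outside SpamAssassin. Here is a minimal Python sketch with a hypothetical, tiny allow-list. A `!~` applied to the whole block is true only when no line matches. What is wanted instead is a per-line test that flags any individual line that fails to match.

```python
import re

ALLOWED = re.compile(r"^(Date|From|Subject|X-[A-Za-z-]+):", re.M)

block = "From: a@example.com\nMalthus-Films: 88976dea\nSubject: hi\n"

# Equivalent of `ALL !~ /.../m`: fires only if *no* line matches.
whole_block_rule = ALLOWED.search(block) is None

# What was wanted: fire if *any* header line fails to match.
# (Folded continuation lines would need unfolding first.)
per_line_rule = any(not ALLOWED.match(line) for line in block.splitlines())

print(whole_block_rule, per_line_rule)   # False True
```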

Thanks,

-Philip




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-12-26 Thread Philip Prindeville



> On Nov 16, 2021, at 8:03 PM, Henrik K  wrote:
> 
> On Tue, Nov 16, 2021 at 01:08:16PM -0700, Philip Prindeville wrote:
>> 
>> Or http.sh points to an NS that's offline...
> 
> Your resolver should time out _way_ sooner than some minutes.
> 
>> Can the async lookup be back-ported?
> 
> No, and there will be no new 3.4 releases.
> 


Yeah, I still need to figure that out...

When I run "dig -t any http.sh" it times out after a few seconds.  But 
SpamAssassin is doing something very different.  Not sure why.

In any case, the workaround seems to be:

uri_block_exclude __L_BLOCK_ISP ... http.sh shlom.in


Excluding these last two domains, so they never get resolved, makes the timeouts
go away.  Note that the pathology is the same in both cases:

philipp@macbook3 ~ % dig @8.8.8.8 -tns shlom.in.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns shlom.in.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38665
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shlom.in.  IN  NS

;; ANSWER SECTION:
shlom.in.   300 IN  NS  ns1gmz.name.com.
shlom.in.   300 IN  NS  ns2jrt.name.com.
shlom.in.   300 IN  NS  ns3qtx.name.com.
shlom.in.   300 IN  NS  ns4blx.name.com.

;; Query time: 84 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:44 MST 2021
;; MSG SIZE  rcvd: 129

philipp@macbook3 ~ % 
philipp@macbook3 ~ % dig @8.8.8.8 -tns http.sh.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns http.sh.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10013
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;http.sh.   IN  NS

;; ANSWER SECTION:
http.sh.    60      IN  CNAME   park.io.
park.io.    14797   IN  NS  ns-1348.awsdns-40.org.
park.io.    14797   IN  NS  ns-1624.awsdns-11.co.uk.
park.io.    14797   IN  NS  ns-441.awsdns-55.com.
park.io.    14797   IN  NS  ns-672.awsdns-20.net.

;; Query time: 245 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:03 MST 2021
;; MSG SIZE  rcvd: 197

philipp@macbook3 ~ % 


Seems a little broken that the NS records aren't accompanied by 'A' glue 
records, but that's not catastrophic... normally a 2nd query would be done.

Should the resolver code in SpamAssassin be more robust when it comes to such 
failures?
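Whatever the DNS root cause turns out to be, Henrik's point is that a lookup should give up long before minutes have passed. Here is a minimal Python sketch of that general pattern; it is not SpamAssassin's resolver code. It runs a blocking lookup in a worker thread and stops waiting after a fixed deadline. `slow_lookup` is a hypothetical stand-in for a query stuck on an unresponsive server.

```python
import concurrent.futures
import time

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def lookup_with_deadline(lookup, name, timeout):
    """Return lookup(name), or None if it takes longer than `timeout` seconds.
    The worker thread is abandoned, not killed: it keeps running until the
    blocked call returns, but the caller stops waiting."""
    future = _pool.submit(lookup, name)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None

def slow_lookup(name):
    # Hypothetical stand-in for a query stuck on an unresponsive server.
    time.sleep(1)
    return ["192.0.2.1"]

print(lookup_with_deadline(slow_lookup, "http.sh", 0.2))  # None
```

Because the blocked call is only abandoned, it still ties up a pool thread until it returns. That is why the pool has a fixed size.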


philipp@macbook3 ~ % dig -ta ns-1348.awsdns-40.org.

; <<>> DiG 9.10.6 <<>> -ta ns-1348.awsdns-40.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37011
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ns-1348.awsdns-40.org. IN  A

;; ANSWER SECTION:
ns-1348.awsdns-40.org.  78740   IN  A   205.251.197.68

;; Query time: 51 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Dec 26 15:27:16 MST 2021
;; MSG SIZE  rcvd: 66

philipp@macbook3 ~ % dig @205.251.197.68 -ta http.sh

; <<>> DiG 9.10.6 <<>> @205.251.197.68 -ta http.sh
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 28411
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;http.sh.   IN  A

;; Query time: 67 msec
;; SERVER: 205.251.197.68#53(205.251.197.68)
;; WHEN: Sun Dec 26 15:27:32 MST 2021
;; MSG SIZE  rcvd: 25

philipp@macbook3 ~ % 


I'm not exactly sure what's falling down or why.

Is there anyone with more BIND-fu than me that's willing to venture a guess?

-Philip



Re: MIME_BASE64_TEXT only on us-ascii

2021-12-11 Thread Philip Prindeville



> On Nov 30, 2021, at 1:10 PM, Matija Nalis  wrote:
> 
> On Tue, Nov 30, 2021 at 12:03:15PM -0700, Philip Prindeville wrote:
>>> On Nov 17, 2021, at 9:50 AM, Bill Cole 
>>>  wrote:
>>> SpamAssassin rules are not laws in any sense. They do not prescribe or 
>>> proscribe any action. They do not reflect any sort of moral or ethical 
>>> judgment. They do not express or define technical correctness.
>> 
>> Isn't that exactly what we're discussing here?  "Technical correctness"?
> 
> Hm, no? An app encoding pure ASCII in Base64 is not breaking any RFC,
> so it is behaving "technically correctly".


Again, Postel's Rule.

Excessive and unnecessary encoding isn't behaving correctly.


> 
>> Good internetworking implementations follow (to the extent they don't 
>> conflict with good security practices) Postel's Law, "be conservative in 
>> what you send, be liberal [but not naive] in what you accept".
> 
> Well, antispam efforts (as is security for important stuff) are
> mostly exactly the OPPOSITE of good internetworking implementations
> of the old Postel's law.


Yeah, they date from a more innocent time.  Unfortunately Jon passed away 
before he could adjust it for a more modern world.  (He was one of my mentors 
and I miss him, along with Bob Braden.)


> And for good reasons - in the internetworking implementations of
> old, the vast majority of peers (if not all) you interacted with
> were GOOD guys trying to do good things.
> 
> In today's e-mail (and security), the majority of the actors are
> enemies trying to penetrate your defensive lines. 


That might be overstated.


> Also, see https://en.wikipedia.org/wiki/Robustness_principle#Criticism


I'm aware. Jon and I had a few arguments about this.

Including about how it weakened the effectiveness of Bake-Offs and 
stringency/conformance testing.



>> Rereading:
>>> Base64 encoding is only necessary if there are non-ASCII characters used. 
>>> UTF-8 is a superset of ASCII & it is normal for MUAs to not encode more 
>>> than needed.
>> 
>> Exactly.  Encoding is only used when and where necessary.
> 
> ...by legitimate users. Spammers on the other hand will sometimes 
> encode even when it is NOT needed, probably in an attempt to avoid less
> advanced antispam tools (or due to sheer laziness when writing spam
> tool). 
> 
> The fact that such encoding is technically allowed does NOT change the
> fact that the technique is vastly more used by spammers than by
> innocent parties.


I don't think anyone is arguing otherwise.

-Philip


> 
>> Properly encoded HTML uses HTML-Entity naming, which is also ASCII-friendly, 
>> i.e.  instead of Latin1  etc. or raw 8bit characters.
> 
> There are several "proper" (ie. allowed by different RFCs) ways to
> encode that information in mail. Statistical analyses seem to say that
> some of the ways are used much more by spammers than by legitimate
> users. Hence, the score for those methods.
> 



Re: MIME_BASE64_TEXT only on us-ascii

2021-11-30 Thread Philip Prindeville


> On Nov 17, 2021, at 9:50 AM, Bill Cole 
>  wrote:
> 
> SpamAssassin rules are not laws in any sense. They do not prescribe or 
> proscribe any action. They do not reflect any sort of moral or ethical 
> judgment. They do not express or define technical correctness.


Isn't that exactly what we're discussing here?  "Technical correctness"?

Good internetworking implementations follow (to the extent they don't conflict 
with good security practices) Postel's Law, "be conservative in what you send, 
be liberal [but not naive] in what you accept".

The point earlier in the thread was that using more encoding than is strictly 
necessary is not being "conservative in what you send", since it puts extra 
burden on the receiver to have a robust and complete implementation, and 
creates more opportunity to have an interoperability failure.

Rereading:


> Base64 encoding is only necessary if there are non-ASCII characters used. 
> UTF-8 is a superset of ASCII & it is normal for MUAs to not encode more than 
> needed.


Exactly.  Encoding is only used when and where necessary.
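Whether a text part needed Base64 at all can be checked mechanically. Here is a minimal Python sketch. It assumes you already have the part's raw Base64 body, and it illustrates the idea under discussion in this thread rather than MIME_BASE64_TEXT's actual implementation. It decodes the body and checks whether the result is pure 7-bit ASCII with every line within the 998-character limit. If so, plain 7bit would have done.

```python
import base64

def base64_was_unnecessary(b64_body: str) -> bool:
    """True if a Base64 text body decodes to 7-bit ASCII whose lines all fit
    within 998 characters, i.e. it could have gone out as plain 7bit.
    (Simplified: ignores the NUL and bare-CR restrictions on 7bit data.)"""
    raw = base64.b64decode(b64_body)
    if not raw.isascii():
        return False
    return all(len(line.rstrip(b"\r")) <= 998 for line in raw.split(b"\n"))

plain = base64.b64encode(b"Hello, world\r\n").decode()
accented = base64.b64encode("caf\u00e9\r\n".encode("utf-8")).decode()
print(base64_was_unnecessary(plain), base64_was_unnecessary(accented))  # True False
```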

Properly encoded HTML uses HTML-Entity naming, which is also ASCII-friendly, 
i.e.  instead of Latin1  etc. or raw 8bit characters.
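As a sketch of what "ASCII-friendly" HTML looks like in practice: the Python function below (name made up for illustration) rewrites every non-ASCII character as a named HTML entity when the standard library's `html.entities.codepoint2name` table has one, and falls back to a numeric character reference otherwise.

```python
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Replace non-ASCII characters with named HTML entities where available,
    otherwise with numeric character references, so the result is pure ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                              # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")       # e.g. &eacute;
        else:
            out.append(f"&#{cp};")                      # e.g. &#20013;
    return "".join(out)
```

For example, `to_ascii_html("café")` gives `caf&eacute;`, which survives any 7-bit transport without a Content-Transfer-Encoding.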

-Philip



SPF_NONE scoring

2021-11-30 Thread Philip Prindeville
Hi,

I'm looking at the 0.001 scoring for SPF_NONE and scratching my head.  This was 
discussed a bit in early 2015, but maybe it needs revisiting with new 
perspective.

Surely no one who cares about maintaining their reputation by protecting 
themselves against spoofing would fail to provide SPF records...  So how is 
this score arrived at?

And of Ham, how much of it has a valid SPF?

And of Spam, how much of it lacks a valid SPF?

Has anyone run some numbers?
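For anyone running those numbers, the figures usually quoted per rule are the percentage of spam it hits, the percentage of ham it hits, and an S/O ratio. A minimal Python sketch, assuming S/O is spam% / (spam% + ham%) (my understanding of how SpamAssassin's hit-frequencies output reports it, not verified here); the counts in the example are made up:

```python
def rule_stats(spam_hits, spam_total, ham_hits, ham_total):
    """Return (spam %, ham %, S/O) for one rule over a ham/spam corpus."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    # S/O near 1.0 means the rule hits almost only spam; near 0.5 it can't tell.
    so = spam_pct / (spam_pct + ham_pct) if spam_pct + ham_pct else 0.0
    return spam_pct, ham_pct, so
```

With hypothetical counts of 30 SPF_NONE hits in 100 spams and 10 in 100 hams, `rule_stats(30, 100, 10, 100)` gives `(30.0, 10.0, 0.75)`.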

Thanks,

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville



> On Nov 16, 2021, at 3:30 AM, Martin Gregorie  wrote:
> 
> On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
>> 
>> 
>>> On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
>>> 
>>> 
>>> Philip Prindeville  writes:
>>> 
>>>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>>>> 
>>>> Why can't I even find the rule?
>>> 
> try "locate txrep"
> 
> On my Fedora system 'locate' says that TxRep is a plugin, enabled by
> installing:  /usr/share/spamassassin/60_txrep.cf
> 
> and, using "locate" again, that the plugin's manpage is  
> /usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz
> 
> So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
> and 'locate' says it is installed as
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm
> 
> Of course, other Linux distros may put it somewhere else, so use
> 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> again. 
> 
> Not trying to teach you to suck eggs, but, incredible as it may sound,
> there are still some people who don't know about the 'locate' command.
> 
> Martin
> 



I know how to find the code modules.  I wasn't sure where the database was 
kept.  And I invoke SA from within Mimedefang, not via procmail on individual 
delivery, so only global rules get run.  Nothing is per-user.

-Philip




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville



> On Nov 15, 2021, at 11:12 PM, Henrik K  wrote:
> 
> On Mon, Nov 15, 2021 at 04:25:55PM -0700, Philip Prindeville wrote:
>> 
>> 
>>> On Nov 12, 2021, at 10:35 PM, Henrik K  wrote:
>>> 
>>> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>>>> 
>>>> What would be helpful here would be logging of when a rule *starts*
>>>> evaluation. Normally that would be painful, but for tracking a runaway it
>>>> would be useful. Perhaps I can code up something to capture that and log it
>>>> on a timeout...
>>> 
>>> It already exists
>>> 
>>> spamassassin -D all,rules-all < msg
>>> 
>> 
>> 
>> Ah, that was useful.
>> 
>> Seeing:
>> 
>> ...
>> Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 
>> 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 
>> 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 
>> 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 
>> 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 
>> 194.213.125.57-194.213.125.57
>> Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs 
>> ...
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . 
>> dns:A:100.136.134.134.psbl.surriel.com
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . 
>> dns:A:100.136.134.134.dnsbl.sorbs.net
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
>> ...
>> 
>> 
>> Why would resolving http.sh take this long?  And can we bring down the 
>> timeout?
>> 
>> Hard to imagine DNS requests taking more than a couple of seconds.
>> 
>> -Philip
> 
> SA 3.4 has old version of URILocalBL without async lookups, probably that's
> why.  Not sure why it would take minutes though, unless your system DNS is
> configured strangely and/or is not caching.
> 



Or http.sh points to an NS that's offline...

Can the async lookup be back-ported?




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville
Replies... some duplication of conversation on "mimedefang".



> On Nov 15, 2021, at 10:34 PM, Bill Cole 
>  wrote:
> 
> On 2021-11-15 at 18:08:20 UTC-0500 (Mon, 15 Nov 2021 16:08:20 -0700)
> Philip Prindeville 
> is rumored to have said:
> 
>>> On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:
>>> 
>>> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>>> 
>>>> I got the message, saved it to a flat file, and ran "spamassassin -t -D 
>>>> rules < netdev.eml" and saw:
>>>> 
>>>> ...
>>>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule 
>>>> __ANY_TEXT_ATTACH_DOC ==> got hit (1)
>>>> ...
>>>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH 
>>>> ==> got hit (1)
>>>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
>>>> Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
>>>> further tests
>>>> ...
>>>> 
>>>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>>> 
>>> "ran ... got hit" is past tense. And it needs to complete the rule to know 
>>> whether it got a hit.
>>> 
>>> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>>> 
>>> The next rule, whatever that was, is the one that timed out after 4m20s.
>> 
>> 
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>> 
>> Why can't I even find the rule?
> 
> That "rule" is actually a subroutine that is assembled and named at runtime 
> in M:SA:Check from set 1 (probably the only set) of the "body eval" (type 11) 
> rules running at priority 0.


Yeah, if we can get the source file and line # that's almost as good as a 
function name (since there isn't one, if it's an anonymous sub block).
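To illustrate why grepping the installed modules for "type11_pri0_set1" finds nothing: a name like that only exists once code has been generated and compiled at run time, and the generated code has no real source file. The Python sketch below mimics the idea by building one function per (type, priority, set) bucket and compiling it with `exec()`. It is a hypothetical analogue, not SpamAssassin's actual Check plugin code; the rule names and patterns are illustrative.

```python
import re


def build_eval_set(rule_type, priority, set_no, rules):
    """Generate and compile one function that runs every rule in a bucket.

    rules: list of (rule_name, regex_pattern) pairs.
    """
    name = f"_eval_tests_type{rule_type}_pri{priority}_set{set_no}"
    src = [f"def {name}(text):", "    hits = []"]
    for rule_name, pattern in rules:
        # Each rule becomes a few lines of generated source, not its own function.
        src.append(f"    if re.search({pattern!r}, text):")
        src.append(f"        hits.append({rule_name!r})")
    src.append("    return hits")
    # The "filename" is synthetic, so there is no file on disk to grep for.
    code = compile("\n".join(src), f"<generated {name}>", "exec")
    namespace = {"re": re}
    exec(code, namespace)
    return namespace[name]
```

A traceback from inside such a function points at `<generated ...>` rather than a path, which is why a file and line number for the offending rule are hard to come by.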


> 
> Which suggests a tough bit of troubleshooting.
> 
>>>> Could there be rules that *aren't* matching but are taking a while?
>>> 
>>> It's timing out on a rule that's running away. The timeout triggers before 
>>> "hit/no hit" is known.
>>> 
>>> What would be helpful here would be logging of when a rule *starts* 
>>> evaluation. Normally that would be painful, but for tracking a runaway it 
>>> would be useful. Perhaps I can code up something to capture that and log it 
>>> on a timeout...
>> 
>> 
>> Whenever a rule gets started, you could save the name and start time, and 
>> then burp that during timeout handling, right?
> 
> I like that idea. I have no idea how feasible it is.


Me neither.  I use Perl less and less every day, and my fu is fading almost as 
fast as new features are coming in.

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
> 
> 
> Philip Prindeville  writes:
> 
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>> 
>> Why can't I even find the rule?
> 
> That looks very familiar.  I was having timeouts, and saw that in the
> logs, on certain messages.  I ended up nuking and rebuilding my TXREP
> database and then things were  ok.
> 
> That doesn't explain why we can't find the rule, which is a good
> question.
> 


Where is the TXREP database?

Also, is it possible that the name is generated through some sort of mangling, 
the way that C function names can be generated from macro expansions, etc?




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 12, 2021, at 10:35 PM, Henrik K  wrote:
> 
> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>> 
>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a runaway it
>> would be useful. Perhaps I can code up something to capture that and log it
>> on a timeout...
> 
> It already exists
> 
> spamassassin -D all,rules-all < msg
> 


Ah, that was useful.

Seeing:

...
Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 
54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 
98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 
173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 
192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 
194.213.125.57-194.213.125.57
Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs 
...
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . 
dns:A:100.136.134.134.psbl.surriel.com
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . 
dns:A:100.136.134.134.dnsbl.sorbs.net
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
...


Why would resolving http.sh take this long?  And can we bring down the timeout?

Hard to imagine DNS requests taking more than a couple of seconds.
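As a sketch of capping how long a synchronous lookup can hold up the caller: the Python function below (name and defaults are my own, not anything in SpamAssassin) runs the resolver in a worker thread and gives up waiting after a deadline. It does not make the underlying lookup finish sooner; it only stops the caller from blocking on it.

```python
import concurrent.futures
import socket

# Shared worker pool. A lookup that overruns keeps its worker thread busy until
# the resolver itself returns, but the caller is no longer blocked on it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def resolve_with_deadline(host, timeout=5.0, resolver=socket.gethostbyname_ex):
    """Return the resolver's answer, or None if it errors or exceeds timeout seconds."""
    future = _POOL.submit(resolver, host)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None
    except OSError:
        # e.g. socket.gaierror for NXDOMAIN or SERVFAIL
        return None
```

The `resolver` parameter exists so the function can be exercised without network access; in real use it defaults to `socket.gethostbyname_ex`.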

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:
> 
> On Fri, 12 Nov 2021, Philip Prindeville wrote:
> 
>> I got the message, saved it to a flat file, and ran "spamassassin -t -D 
>> rules < netdev.eml" and saw:
>> 
>> ...
>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
>> ==> got hit (1)
>> ...
>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH 
>> ==> got hit (1)
>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
>> Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
>> further tests
>> ...
>> 
>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
> 
> "ran ... got hit" is past tense. And it needs to complete the rule to know 
> whether it got a hit.
> 
> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
> 
> The next rule, whatever that was, is the one that timed out after 4m20s.


Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.

Why can't I even find the rule?


> 
>> Could there be rules that *aren't* matching but are taking a while?
> 
> It's timing out on a rule that's running away. The timeout triggers before 
> "hit/no hit" is known.
> 
> What would be helpful here would be logging of when a rule *starts* 
> evaluation. Normally that would be painful, but for tracking a runaway it 
> would be useful. Perhaps I can code up something to capture that and log it 
> on a timeout...


Whenever a rule gets started, you could save the name and start time, and then 
burp that during timeout handling, right?
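A minimal Python sketch of that idea, with hypothetical names throughout: record the rule name and start time before each rule runs, and have a watchdog timer report whichever rule is still in progress when the limit expires. Unlike the "skipping further tests" behaviour in the log above, this sketch only names the culprit; it does not interrupt it.

```python
import threading
import time


class RuleTracker:
    """Remembers which rule is currently running and when it started."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None  # (rule_name, start_time) or None

    def start(self, name):
        with self._lock:
            self._current = (name, time.monotonic())

    def finish(self):
        with self._lock:
            self._current = None

    def snapshot(self):
        with self._lock:
            return self._current


def run_rules(rules, text, time_limit):
    """Run (name, predicate) rules in order; if time_limit expires, log the
    rule that was in progress. Returns (hits, timeout_report)."""
    tracker = RuleTracker()
    report = []

    def on_timeout():
        snap = tracker.snapshot()
        if snap is not None:
            name, started = snap
            elapsed = time.monotonic() - started
            report.append(f"time limit hit while running {name} ({elapsed:.2f}s in)")

    watchdog = threading.Timer(time_limit, on_timeout)
    watchdog.start()
    hits = []
    try:
        for name, predicate in rules:
            tracker.start(name)
            if predicate(text):
                hits.append(name)
            tracker.finish()
    finally:
        watchdog.cancel()
    return hits, report
```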


> 
> If you want to send me that message zipped up I can try it here with those 
> changes and see if it's a base rule running away.
> 


Sent out-of-band.

Doh.  Forgot to zip it.




Re: spam from gmail.com

2021-11-12 Thread Philip Prindeville



> On Nov 9, 2021, at 6:49 AM, Jared Hall  wrote:
> 
> On 11/8/2021 11:36 PM, Peter wrote:
>> It seems that people aren't taking google as seriously any more.
> First came Freemail.  Then came SpamAssassin.  I DO think that people take 
> Google seriously.  There are just so many ways to deal with this problem - 
> none of which is better than any other.
> 
> Google touts their AI capabilities with Spam.  Too bad they don't scan their 
> outbound email.  Instead, they seem to have adopted a cowardly philosophy 
> that an old C Telephone tech conveyed to me decades ago: "Problem's leaving 
> here fine!"
> 
> Google should practice what they preach:  SANITIZE USER INPUT. Instead, their 
> careless attitude presents a security threat to us all.
> 
> -- Jared Hall
> 


What... you mean "do no evil" is just lip-service?  I'm so... so... 
disillusioned!

-Philip



Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread Philip Prindeville
Hi,

I got an email from net...@vger.kernel.org that was a lengthy (422K) regression 
test report from a patch someone had submitted.

I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < 
netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
==> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> 
got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s? Looking at the rule, I 
don't understand why it's taking so long...  unless that's not the smoking gun. 
 Could there be rules that *aren't* matching but are taking a while?

72_active.cf:  mimeheader  __ANY_TEXT_ATTACH Content-Type =~ /text\/\w+/i

And how do I dig into why I'm getting that last message?

I can't even find type11_pri0_set1 as a string in 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/

Also, why are there multiple runs of:

Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"


Should this be capped to a maximum number of matches the way __HIGHBITS is?
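In rule terms, "capped" means counting matches of a pattern but stopping once a maximum is reached, which is what SpamAssassin's `tflags RULE multiple maxhits=N` is for. A Python sketch of the counting side only (function name hypothetical, cap value arbitrary):

```python
import re
from itertools import islice


def count_hits(pattern, text, maxhits=None):
    """Count regex matches, stopping early once maxhits is reached."""
    matches = re.finditer(pattern, text)  # lazy: matches are found on demand
    if maxhits is not None:
        matches = islice(matches, maxhits)  # stop scanning after maxhits matches
    return sum(1 for _ in matches)
```

Because `finditer` is lazy, the cap also bounds the scanning work, not just the reported number: `count_hits("e", "e" * 100, maxhits=10)` returns 10 after finding the tenth match.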

And I'm not sure I want messages that haven't been fully scanned being 
delivered.  Should I crank TIME_LIMIT_EXCEEDED to 20.0?

Thanks,

-Philip



Re: Seeing "razor2 had unknown error during get_server_info"

2021-08-14 Thread Philip Prindeville
Asked and answered:

http://forum.centos-webpanel.com/index.php?topic=5505.0

Need to open outgoing port 2703 (TCP) for the mail server.
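A quick way to confirm that the firewall change took effect is to try a plain TCP connection to port 2703 from the mail server. A minimal Python sketch; the host argument is whichever Razor server you are trying to reach (no server name is assumed here), and "refused", "filtered", and "unresolvable" all simply come back as False:

```python
import socket


def tcp_port_reachable(host, port=2703, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror are all OSErrors
        return False
```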


> On Aug 14, 2021, at 12:37 PM, Philip Prindeville 
>  wrote:
> 
> Hi all,
> 
> A few days ago, I started seeing this in my /var/log/maillog:
> 
> Aug 14 12:15:07 mail mimedefang-multiplexor[141367]: 17EIF11E226383: Worker 4 
> stderr: razor2: razor2 check failed: Connection refused razor2: razor2 had 
> unknown error during get_server_info at 
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/Razor2.pm line 188. at 
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/Razor2.pm line 331.
> 
> Around 2021-08-08.  Not sure what changed at that time.  I googled a bit and 
> found this from 2 years ago:
> 
> https://bugs.launchpad.net/ubuntu/+source/spamassassin/+bug/1819977
> 
> But I'm running Fedora 33 (updated).  Note Fedora still ships with 2.85 even 
> though 2.86 has been out more than 2 years and there's a request to update it:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1584474
> 
> I added to the end of my /etc/mail/spamassassin/local.cf
> 
> razor_config /etc/razor/razor-agent.conf
> 
> Which contains one line:
> 
> logfile none
> 
> Anyone else seeing a similar issue or know a fix?
> 
> Thanks,
> 
> -Philip
> 



Seeing "razor2 had unknown error during get_server_info"

2021-08-14 Thread Philip Prindeville
Hi all,

A few days ago, I started seeing this in my /var/log/maillog:

Aug 14 12:15:07 mail mimedefang-multiplexor[141367]: 17EIF11E226383: Worker 4 
stderr: razor2: razor2 check failed: Connection refused razor2: razor2 had 
unknown error during get_server_info at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/Razor2.pm line 188. at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/Razor2.pm line 331.

Around 2021-08-08.  Not sure what changed at that time.  I googled a bit and 
found this from 2 years ago:

https://bugs.launchpad.net/ubuntu/+source/spamassassin/+bug/1819977

But I'm running Fedora 33 (updated).  Note Fedora still ships with 2.85 even 
though 2.86 has been out more than 2 years and there's a request to update it:

https://bugzilla.redhat.com/show_bug.cgi?id=1584474

I added to the end of my /etc/mail/spamassassin/local.cf

razor_config /etc/razor/razor-agent.conf

Which contains one line:

logfile none

Anyone else seeing a similar issue or know a fix?

Thanks,

-Philip



Re: Apache SpamAssassin and Spammers 1st Amendment Rights

2020-11-26 Thread Philip Prindeville
Actually, the notion is much older than that… 12th or 13th century I believe.

Students of universities (like Oxford or Sorbonne or Geneve) would get 
together, interview professors, and pay them directly.

There was no “administration”.  The professors marketed their knowledge and 
insight directly to their would-be students.

Hence the “marketplace of ideas”.



> On Nov 21, 2020, at 2:17 PM, Steven Manross  wrote:
> 
> Long time lurker…  sometimes poster:
>  
> The marketplace of ideas is a century-old concept that goes back to the days 
> of landmark U.S.S.C. First Amendment cases, and it is the “marketplace”’s 
> duty to weed out bad ideas (much like SpamAssassin is doing).
>  
> But again, SpamAssassin isn’t infringing on anyone’s 1st amendment rights.  
> It’s protecting people from speech they have already decided they have no 
> time to listen to (on purpose – without government involvement).
>  
> Okay…  back to lurking.



Re: Apache SpamAssassin and Spammers 1st Amendment Rights

2020-11-24 Thread Philip Prindeville
Free Speech doesn’t require anyone to pay for your soap box or megaphone.

But Spam is exactly that: having other people subsidize your speech through the 
theft of services.



> On Nov 19, 2020, at 2:25 PM, Kevin A. McGrail  wrote:
> 
> Afternoon Everyone,
> 
> So over the years, I have gotten a lot of complaints from spammers about how 
> I'm breaking their 1st amendment rights by blocking their spam as free 
> speech.  I've had to explain that I'm not the government and hence there are 
> no 1st amendment rights involved.
> 
> However, my friend, Steve Effros, just wrote a far more eloquent article 
> about it and I thought others on this list might appreciate it:
> 
> https://www.cablefax.com/regulation/first-things-first 
> 
> 
> Regards,
> 
> KAM
> 
> -- 
> Kevin A. McGrail
> kmcgr...@apache.org
> 
> Member, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
> 



Re: dbip-country-lite database

2020-11-19 Thread Philip Prindeville



> On Nov 15, 2020, at 11:48 AM, Dominic Raferd  wrote:
> 
> 
> 
> On Sun, 15 Nov 2020, 18:27 Philip Prindeville, 
>  wrote:
> Is anyone else using this database?
> 
> I’ve been using it with xt_geoip and Mimedefang and Plugin::URILocalBL to 
> block countries since Maxmind retired support for GeoIP on RHEL.
> 
> But I keep running into cases where parts of the database are very obviously 
> wrong.  It’s showing about 50% of 183.128.0.0-183.170.255.255 as being in the 
> US.  But APNIC says 183.128.0.0/11 is CHINANET.
> 
> Can you not use GeoIP2?


The licensing has changed.

Also, a lot of other packages (like xtables-addons) have pivoted away from the 
MaxMind database.

-Philip



dbip-country-lite database

2020-11-15 Thread Philip Prindeville
Is anyone else using this database?

I’ve been using it with xt_geoip and Mimedefang and Plugin::URILocalBL to block 
countries since Maxmind retired support for GeoIP on RHEL.

But I keep running into cases where parts of the database are very obviously 
wrong.  It’s showing about 50% of 183.128.0.0-183.170.255.255 as being in the 
US.  But APNIC says 183.128.0.0/11 is CHINANET.
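One quick sanity check when a geo database disagrees with the registry is to split the database's start/end range into CIDR blocks and see which of them actually fall inside the RIR allocation. A short Python sketch using the standard `ipaddress` module (function names are my own), applied to the range and the 183.128.0.0/11 allocation mentioned above:

```python
import ipaddress


def range_to_cidrs(first, last):
    """Split an inclusive start/end address range into CIDR blocks."""
    return [str(net) for net in ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))]


def overlaps_allocation(first, last, allocation):
    """Return the CIDR blocks of the range that overlap the given allocation."""
    alloc = ipaddress.ip_network(allocation)
    return [c for c in range_to_cidrs(first, last)
            if ipaddress.ip_network(c).overlaps(alloc)]
```

Note that 183.128.0.0/11 ends at 183.159.255.255, so only the first of the four blocks (`183.128.0.0/11`, `183.160.0.0/13`, `183.168.0.0/15`, `183.170.0.0/16`) falls inside that allocation; the rest of the range would need to be checked against APNIC separately.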



Re: ANNOUNCEMENT: The NEW invaluement "Service Provider DNSBLs" - 1st one for Sendgrid-spams!

2020-08-21 Thread Philip Prindeville



> On Aug 21, 2020, at 1:28 PM, Rob McEwen  wrote:
> 
> ANNOUNCEMENT: The NEW invaluement "Service Provider DNSBLs" - 1st one for 
> Sendgrid-spams!
> 
> ...a collection of a new TYPE of DNSBL, with the FIRST of these having a 
> focus on Sendgrid-sent spams. AND - there is a FREE version of this - that 
> can be used NOW! (well... might need a SpamAssassin rule or two! Your help 
> appreciated!):
> 
> INFO AND INSTRUCTIONS HERE:
> 
> https://www.invaluement.com/serviceproviderdnsbl/
> 
> This provides a way to surgically block Sendgrid's WORST spammers, yet 
> without the massive collateral damage that would happen if blocking Sendgrid 
> domains and IP addresses. But we're NOT stopping at the phishes and viruses - 
> and we're not finished! There will be some well-deserved economic pain, that 
> puts the recipients' best interests at heart. Therefore, flagrant "cold 
> email" spamming to recipients who don't even know the sender - is also being 
> targeted - first with the absolute worst - and then progressing to other 
> offenders as we make adjustments in the coming weeks.
> 


I fail to see the point: that we do the work that sendgrid should be doing, but 
on a duplicative scale?

Why don’t they police themselves?

We’re effectively calling out spam that’s escaped after the fact.  What’s the 
point of that?

They should be scanning email as it leaves their infrastructure and using rules 
and Bayesian filters to know if something is amiss and they need to have human 
intervention.

Nothing is stopping them from doing the right thing.

Why should we enable their bad behavior?



Re: SendGrid (Was: Re: Freshdesk (again))

2020-08-17 Thread Philip Prindeville
I just add an extra 5.0 points for coming from Sendgrid now so it goes straight 
to the Junk folder.

Users can pull it out of there if they really want it.

Sendgrid is becoming to ASPs what OVH and Softlayer are to ISPs.
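For anyone wanting to do something similar, here is the scoring logic as a Python sketch rather than SpamAssassin rule syntax: a flat penalty when a Received header mentions SendGrid, plus a meta-style bump like the one Niels describes when the body also names a commonly phished brand. The `sendgrid.net` hostname pattern, the brand list, and the point values are all assumptions chosen for illustration.

```python
import re

# Assumed pattern: SendGrid relays identifying themselves under sendgrid.net.
SENDGRID_RELAY = re.compile(r"\bsendgrid\.net\b", re.IGNORECASE)
# Illustrative brand list for the meta-style bump.
PHISHED_BRANDS = re.compile(r"\b(?:amazon|citrix|microsoft|netflix)\b", re.IGNORECASE)


def sendgrid_score(received_headers, body, base=5.0, brand_bump=10.0):
    """Extra points for mail relayed via SendGrid, more if it also names a brand."""
    if not any(SENDGRID_RELAY.search(h) for h in received_headers):
        return 0.0
    score = base
    if PHISHED_BRANDS.search(body):
        score += brand_bump  # both conditions true, like a meta rule
    return score
```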


> On Jun 27, 2020, at 3:56 AM, Niels Kobschätzki  wrote:
> 
> Sendgrid is such an origin for spam- and phishing-mails with certain terms 
> that I added extra meta-rules. From sendgrid and somewhere in the body is the 
> term “Amazon”? Here are your 10 points. 
> 
> Best,
> 
> Niels
> 
>> On 27. Jun 2020, at 11:32, Marc Roos  wrote:
>> 
>> 
>> 
>> I am going to make for companies like maildrop and sendgrid a hard block 
>> with reference to a page where someone can ask to be whitelisted with 
>> only an email address. In this procedure clearly stating the reason of 
>> the net block of these companies. If lots of sendgrid users are 
>> confronted with this, they will move to a better service. 
>> I can remember this fresh desk mail. I did not know where it came from. 
>> But now I know, I will complain a few million times.
>> 
>> 
>> 
>> 
>> -Original Message-
>> To: users@spamassassin.apache.org
>> Subject: SendGrid (Was: Re: Freshdesk (again))
>> 
>> Hello,
>> 
>>> On Fri, Jun 26, 2020 at 07:32:09PM -0600, Grant Taylor wrote:
>>> I've got to say, between NANOG, SDLU, and SpamAssassin, I see a LOT of 
>> 
>>> complaints about Sendgrid.
>> 
>> Also mailop. Have personally received phishing mails through SendGrid in 
>> the last 2 weeks in the name of citrix.com, microsoft.com and 
>> netflix.com. The Citrix one was to a hostmaster@ address. It's hard to 
>> comprehend how SendGrid could be doing a worse job of this, for so many 
>> months now.
>> 
>> Yet their list of legit clients is large, so they remain unblockable for 
>> me. I just wish those clients knew how little SendGrid would do to 
>> prevent their other customers sending out phishing emails in their name.
>> 
>> Cheers,
>> Andy
>> 
>> 
> 



Re: Freshdesk (again)

2020-08-17 Thread Philip Prindeville



> On Jul 7, 2020, at 3:16 AM, Raymond Dijkxhoorn  
> wrote:
> 
> Hai!
> 
>>>> it might help to add your complaint via ab...@sendgrid.com.
> 
>>> I very much doubt it. Sendgrid's business is sending mail and they do not 
>>> care if that mail is spam or not. If enough servers block them they will go 
>>> away.
>> 
>> They do, however, apparently care about phishing - they did disable the 
>> sendgrid redirect that some phisher has been spamming at me for the last 
>> three weeks.
> 
> They definitely do. I report to them and they do take them down pretty 
> quickly.
> 
> Inside SURBL we do list the abused CT links. Unfortunately SA doesn't make use 
> of the wildcarded list that SURBL delivers for a long time now.
> 
> So if you want to use it add:
> 
> util_rb_3tld ct.sendgrid.net
> 
> Inside your local.cf
> 
> And while you are at it also add:
> 
> util_rb_2tld page.link
> 
> Bye, Raymond


Hmmm… not my experience.

I’ve been calling out phishing from the same (IP) address for 10 days without 
any apparent (observable) action from Sendgrid.

At this point I’m wondering if they have compromised relays.
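On the util_rb_3tld suggestion quoted above: as I understand it, the point is to change which part of a hostname gets looked up in the URI blocklist. The Python sketch below is a deliberately simplified, hypothetical model of that trimming (not SpamAssassin's real registry-boundary code). By default it keeps the last two labels; a listed multi-label "TLD" keeps one label to its left, so each abused ct.sendgrid.net subdomain is looked up on its own instead of all collapsing to sendgrid.net.

```python
def lookup_domain(host, tld2=frozenset(), tld3=frozenset()):
    """Pick the domain that would be queried against a URI blocklist.

    tld2 / tld3 stand in for extra two- and three-label "TLDs"
    (a hypothetical model of util_rb_2tld / util_rb_3tld).
    """
    labels = host.lower().rstrip(".").split(".")
    for size, suffixes in ((3, tld3), (2, tld2)):
        if len(labels) > size and ".".join(labels[-size:]) in suffixes:
            # keep one label to the left of the multi-label suffix
            return ".".join(labels[-(size + 1):])
    # default: ordinary single-label TLD, keep the last two labels
    return ".".join(labels[-2:])
```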

-Philip



Adding approximate matching (see also: another extortion email check)

2020-05-05 Thread Philip Prindeville
Hi,

I’ve recently gotten emails (a lot of them, as it happened) with the following 
subject line:

Subject: H¡gh level of r¡sk. Your account has been hacked. Change yøur passwørd.

and I’ve seen other similar emails in the past using simple mechanical 
substitutions (Greek alpha for ‘a’, Cyrillic a for ‘a’, Cyrillic A for ‘A’, 
Cyrillic VE for ‘B’, Cyrillic IE for ‘E’, Cyrillic EN for ‘H’, etc).

The String::Approx module (see https://metacpan.org/pod/String::Approx) allows 
for weighting insertions/deletions/substitutions, and what we’re seeing here is 
a heavy use of substitutions.

I’m thinking about a module where you could enter the ASCII string of:

High level of risk. Your account has been hacked. Change your password.

and all permutations of it via substitution would be matched as long as some 
threshold isn’t exceeded (say 10 or 15% substitutions, which seems like a 
reasonable ceiling).

There is also spam I’ve seen where words have been deliberately misspelled as 
a way of avoiding exact matches, with doubled letters being dropped and similar 
letters being substituted (‘n’ for ‘m’, ‘z’ for ‘s’, ‘k’ for ‘c’, etc.), so simply 
replacing non-ASCII letters with their ASCII “approximates” wouldn’t be 
sufficient because of the shuffling in the ASCII space as well.
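A rough Python sketch of the idea: fold a handful of look-alike characters back to ASCII, then accept the candidate if its Levenshtein edit distance from the reference phrase is within a fraction of the phrase length (15% here). The homoglyph table is a tiny illustrative subset built from the examples above, the edit distance is hand-rolled rather than String::Approx, and every edit (insertion, deletion, substitution) is weighted equally.

```python
# Tiny, illustrative homoglyph table (far from complete).
HOMOGLYPHS = {
    "\u00a1": "i",  # inverted exclamation mark
    "\u00f8": "o",  # o with stroke
    "\u03b1": "a",  # Greek small alpha
    "\u0430": "a",  # Cyrillic small a
    "\u0410": "A",  # Cyrillic capital A
    "\u0412": "B",  # Cyrillic capital VE
    "\u0415": "E",  # Cyrillic capital IE
    "\u041d": "H",  # Cyrillic capital EN
}


def edit_distance(a, b):
    """Classic Levenshtein distance (unit cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def approx_match(candidate, reference, max_ratio=0.15):
    """True if candidate, after homoglyph folding, is within max_ratio edits."""
    folded = "".join(HOMOGLYPHS.get(ch, ch) for ch in candidate)
    distance = edit_distance(folded.lower(), reference.lower())
    return distance <= max_ratio * len(reference)
```

The folding step handles the non-ASCII look-alikes, and the distance threshold absorbs the ASCII-level misspellings, so a subject like "High levl of risk. Your acount has been hakced..." still matches.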

Has anyone else considered approximate string matching?

Thanks,

-Philip





Re: Two types of new spam

2020-01-11 Thread Philip Prindeville



> On Jan 4, 2020, at 11:57 AM, Bill Cole 
>  wrote:
> 
> On 3 Jan 2020, at 17:45, Philip Prindeville wrote:
> [...]
> 
>> One other question that occurs to me: why would we even need <meta http-equiv=“Content-Type” …> if we already have a Content-Type: header?
> 
> There should be no need.
> 
> With that said, it could be *helpful* if a MUA were to save out the text/html 
> part as a standalone file without including any definitive indication of the 
> file being HTML.


Well, it turns out that a lot of MUAs (including Apple’s Mail.app) generate 
this.


> 
>> Isn’t that the sign of a broken MUA doing the composition?
> 
> Not broken (except for the fact of generating HTML for email at all, a 
> disease analogous to HSV-1.) It is valid HTML and can be useful in rare 
> circumstances.
> 
>> Is that on its own Spamsign (with all respect to Frank Herbert)?
> 
> Do you consider all mail from Facebook to be spam?


Is that a trick question?

-Philip



Re: Two types of new spam

2020-01-03 Thread Philip Prindeville



> On Jan 3, 2020, at 3:45 PM, Philip Prindeville 
>  wrote:
> 
> 
> 
>> On Jan 2, 2020, at 4:08 PM, Philip Prindeville 
>>  wrote:
>> 
>> I’m getting the following Spam.
>> 
>> http://www.redfish-solutions.com/misc/bluechew.eml
>> 
>> And this is notable for having:
>> 
>> 
>> 
>> GUID1
>> GUID2
>> GUID3
>> GUID4
>> …
>> 
> 
> One other question that occurs to me: why would we even need <meta http-equiv=“Content-Type” …> if we already have a Content-Type: header?
> 
> Isn’t that the sign of a broken MUA doing the composition?  Is that on its 
> own Spamsign (with all respect to Frank Herbert)?
> 
> -Philip
> 


With that in mind, I’m trying out:

rawbody __L_UNNEEDED_META_CT/^\n\n([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{3}-[0-9a-f]{3}-[0-9a-f]{12}\n){10,1000}<\/style>\n/s

and couldn’t get that to match… not sure why.  A way to dump what the pattern 
actually matched would be handy.

But this does match:

rawbody __L_STYLE_W_GUIDS 
m/

Re: Two types of new spam

2020-01-03 Thread Philip Prindeville



> On Jan 2, 2020, at 4:08 PM, Philip Prindeville 
>  wrote:
> 
> I’m getting the following Spam.
> 
> http://www.redfish-solutions.com/misc/bluechew.eml
> 
> And this is notable for having:
> 
> 
> 
> GUID1
> GUID2
> GUID3
> GUID4
> …
> 

One other question that occurs to me: why would we even need <meta http-equiv=“Content-Type” …> if we already have a Content-Type: header?

Isn’t that the sign of a broken MUA doing the composition?  Is that on its own 
Spamsign (with all respect to Frank Herbert)?

-Philip



Re: Two types of new spam

2020-01-03 Thread Philip Prindeville



> On Jan 3, 2020, at 11:34 AM, RW  wrote:
> 
> On Fri, 3 Jan 2020 10:09:21 -0800 (PST)
> John Hardin wrote:
> 
>> On Fri, 3 Jan 2020, Pedro David Marco wrote:
>> 
>>> header __L_RECEIVED_SPF exists:Received-SPF
>>> tflags __L_RECEIVED_SPF multiple maxhits=20
>>> 
>>> meta L_RECEIVED_SPF (__L_RECEIVED_SPF >= 10)
>>> describe L_RECEIVED_SPF Crazy numbers of Received-SPF headers
>>> score L_RECEIVED_SPF 20.0
>>> 
>>> but it never seems to match.  
>> 
>> "exists" is a boolean, it's reasonable that it only returns one hit 
>> regardless of the number of instances present.
>> 
>> Try this instead, to actually match the header(s):
>> 
>>   header __L_RECEIVED_SPF   Received-SPF =~ /^./
> 
> That should be: 
> 
> header __L_RECEIVED_SPF   Received-SPF =~ /^./m


Seems to work either way!

Thanks, everyone.

-Philip



Two types of new spam

2020-01-02 Thread Philip Prindeville
I’m getting the following Spam.

http://www.redfish-solutions.com/misc/bluechew.eml

And this is notable for having:



GUID1
GUID2
GUID3
GUID4
…


so it should be easy enough to detect.

A GUID looks like:

[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
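A Python sketch of detecting that pattern: ten or more GUID-only lines immediately after a `<style>` tag. It assumes the standard 8-4-4-4-12 GUID layout and a threshold of 10 lines, both my choices rather than anything measured against a corpus; the function name is hypothetical.

```python
import re

# Assumes the standard 8-4-4-4-12 hex layout.
GUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# <style ...> followed by at least 10 lines that each hold only a GUID.
GUID_STYLE_RUN = re.compile(
    r"<style[^>]*>\s*(?:" + GUID + r"[ \t]*\r?\n){10,}",
    re.IGNORECASE,
)


def has_guid_style_block(html):
    """True if a <style> block opens with a run of 10+ bare GUID lines."""
    return GUID_STYLE_RUN.search(html) is not None
```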

The 2nd type of Spam I’m seeing looks like:

http://www.redfish-solutions.com/misc/received-spf.eml

which contains:

Received: from mta.amapspa.it ([127.0.0.1])
by localhost (mta.amapspa.it [127.0.0.1]) (amavisd-new, port 10026)
with ESMTP id U5M-E2lVwWem; Sat,  2 Nov 2019 00:19:36 +0100 (CET)
Received-SPF: none (amapspa.it: No applicable sender policy available) 
receiver=mta.amapspa.it; identity=mailfrom; 
envelope-from="dario.scarpu...@amapspa.it"; helo="[91.134.159.128]"; 
client-ip=91.134.159.128
Received-SPF: none (amapspa.it: No applicable sender policy available) 
receiver=mta.amapspa.it; identity=mailfrom; 
envelope-from="dario.scarpu...@amapspa.it"; helo="[91.134.159.128]"; 
client-ip=91.134.159.128
Received-SPF: none (amapspa.it: No applicable sender policy available) 
receiver=mta.amapspa.it; identity=mailfrom; 
envelope-from="dario.scarpu...@amapspa.it"; helo="[91.134.159.128]"; 
client-ip=91.134.159.128
…

with that line being repeated some 40 times, each line being identical.

I tried a rule like:

header __L_RECEIVED_SPF exists:Received-SPF
tflags __L_RECEIVED_SPF multiple maxhits=20

meta L_RECEIVED_SPF (__L_RECEIVED_SPF >= 10)
describe L_RECEIVED_SPF Crazy numbers of Received-SPF headers
score L_RECEIVED_SPF 20.0


but it never seems to match.  I’ve not tried to debug this, but it seems that 
duplicated headers might not be saved as a list into the headers?  (Is there an 
easy way to see what exists:Received-SPF is evaluating as?)

If that’s the case, it would seem to be a shortcoming.

Can anyone confirm that’s indeed what’s happening?
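For comparison, here is how a parser that keeps repeated headers as a list sees such a message: Python's standard `email` module returns every instance from `get_all()`, so identical Received-SPF lines are all counted, while de-duplicating by value collapses them to one. The function name and the trimmed-down test message are illustrative stand-ins, not the actual spam.

```python
from email import message_from_string


def header_instance_counts(raw, name):
    """Return (number of instances, number of distinct values) for a header."""
    msg = message_from_string(raw)
    values = msg.get_all(name, [])  # every occurrence, in message order
    return len(values), len(set(values))
```

A message with 40 identical `Received-SPF:` lines yields `(40, 1)`: the first number is what a "count every header" rule would see, the second what a count of unique values would see.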

Thanks,

-Philip



White listing this mailing list.

2019-12-18 Thread Philip
How do I whitelist this mailing list? For some reason, all the messages 
are now going to spam.





Re: HeaderEval::check_header_count_range() not working correctly?

2019-11-03 Thread Philip Prindeville
Sigh… “downside”.


> On Nov 3, 2019, at 2:32 PM, Philip Prindeville 
>  wrote:
> 
> What would be the downsize of having:
> 
>  my @hdrs = grep($uniq{$_}++, $pms->{msg}->get_header ($hdr));
> 
> instead and counting ALL instances of $hdr, not just the unique RHS’s?
> 
> 
> 
>> On Nov 3, 2019, at 1:51 PM, Philip Prindeville 
>>  wrote:
>> 
>> Hi.
>> 
>> I’m looking at:
>> 
>> # Return true if the count of $hdr headers are within the given range
>> sub check_header_count_range {
>> my ($self, $pms, $hdr, $min, $max) = @_;
>> my %uniq = ();
>> my @hdrs = grep(!$uniq{$_}++, $pms->{msg}->get_header ($hdr));
>> return (scalar @hdrs >= $min && scalar @hdrs <= $max);
>> }
>> 
>> in HeaderEval.pm and I’m not getting it.  It inserts (once and only once) 
>> each occurrence of a unique RHS for the header X.  So if I had:
>> 
>> X-yzzy: A
>> X-yzzy: A
>> X-yzzy: B
>> X-yzzy: C
>> 
>> Then $uniq{A} would be 2, $uniq{B} would be 1, and $uniq{C} would be 1, so 
>> the number of elements in @hdrs would be 3, once each for ‘A’, ‘B’, and ‘C’.
>> 
>> But if I have:
>> 
>> X-yzzy: A
>> X-yzzy: A
>> X-yzzy: A
>> X-yzzy: A
>> 
>> then $uniq{A} is 4, but the number of elements in @hdrs would be 1 (because 
>> of the ‘!’ which only passes the first).
>> 
>> This seems counter-intuitive.  What if I want to count the absolute number 
>> of headers of type ‘X-yzzy:’ regardless of their RHS?
>> 
>> I’ve been seeing a lot of Spam recently with duplicative Received-SPF: 
>> lines, but since they are all identical, it’s not nudging the number of 
>> @hdrs past one.
>> 
>> Thanks,
>> 
>> -Philip
>> 
> 



Re: HeaderEval::check_header_count_range() not working correctly?

2019-11-03 Thread Philip Prindeville
What would be the downsize of having:

  my @hdrs = grep($uniq{$_}++, $pms->{msg}->get_header ($hdr));

instead and counting ALL instances of $hdr, not just the unique RHS’s?



> On Nov 3, 2019, at 1:51 PM, Philip Prindeville 
>  wrote:
> 
> Hi.
> 
> I’m looking at:
> 
> # Return true if the count of $hdr headers are within the given range
> sub check_header_count_range {
> my ($self, $pms, $hdr, $min, $max) = @_;
> my %uniq = ();
> my @hdrs = grep(!$uniq{$_}++, $pms->{msg}->get_header ($hdr));
> return (scalar @hdrs >= $min && scalar @hdrs <= $max);
> }
> 
> in HeaderEval.pm and I’m not getting it.  It inserts (once and only once) 
> each occurrence of a unique RHS for the header X.  So if I had:
> 
> X-yzzy: A
> X-yzzy: A
> X-yzzy: B
> X-yzzy: C
> 
> Then $uniq{A} would be 2, $uniq{B} would be 1, and $uniq{C} would be 1, so 
> the number of elements in @hdrs would be 3, once each for ‘A’, ‘B’, and ‘C’.
> 
> But if I have:
> 
> X-yzzy: A
> X-yzzy: A
> X-yzzy: A
> X-yzzy: A
> 
> then $uniq{A} is 4, but the number of elements in @hdrs would be 1 (because 
> of the ‘!’ which only passes the first).
> 
> This seems counter-intuitive.  What if I want to count the absolute number of 
> headers of type ‘X-yzzy:’ regardless of their RHS?
> 
> I’ve been seeing a lot of Spam recently with duplicative Received-SPF: lines, 
> but since they are all identical, it’s not nudging the number of @hdrs past 
> one.
> 
> Thanks,
> 
> -Philip
> 



HeaderEval::check_header_count_range() not working correctly?

2019-11-03 Thread Philip Prindeville
Hi.

I’m looking at:

# Return true if the count of $hdr headers are within the given range
sub check_header_count_range {
 my ($self, $pms, $hdr, $min, $max) = @_;
 my %uniq = ();
 my @hdrs = grep(!$uniq{$_}++, $pms->{msg}->get_header ($hdr));
 return (scalar @hdrs >= $min && scalar @hdrs <= $max);
}

in HeaderEval.pm and I’m not getting it.  It inserts (once and only once) each 
occurrence of a unique RHS for the header X.  So if I had:

X-yzzy: A
X-yzzy: A
X-yzzy: B
X-yzzy: C

Then $uniq{A} would be 2, $uniq{B} would be 1, and $uniq{C} would be 1, so the 
number of elements in @hdrs would be 3, once each for ‘A’, ‘B’, and ‘C’.

But if I have:

X-yzzy: A
X-yzzy: A
X-yzzy: A
X-yzzy: A

then $uniq{A} is 4, but the number of elements in @hdrs would be 1 (because of 
the ‘!’ which only passes the first).

This seems counter-intuitive.  What if I want to count the absolute number of 
headers of type ‘X-yzzy:’ regardless of their RHS?

I’ve been seeing a lot of Spam recently with duplicative Received-SPF: lines, 
but since they are all identical, it’s not nudging the number of @hdrs past one.
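The following small Python emulation makes the counting in the Perl `grep` idiom above concrete. `perl_uniq_grep` is a hypothetical helper. With `keep_first=True` it mimics `grep(!$uniq{$_}++, ...)`, and with `keep_first=False` it mimics the `grep($uniq{$_}++, ...)` variant floated in the follow-up. Because `$uniq{$_}++` is a post-increment, it yields 0 (false) the first time a value is seen and a true count afterwards. Dropping the `!` therefore keeps only the repeats, not every header. Counting all occurrences is simply the length of the full list.

```python
def perl_uniq_grep(values, keep_first=True):
    """Emulate Perl's grep(!$uniq{$_}++, @values) (keep_first=True)
    or grep($uniq{$_}++, @values) (keep_first=False)."""
    seen = {}
    kept = []
    for value in values:
        previous = seen.get(value, 0)  # post-increment yields the old count
        seen[value] = previous + 1
        is_first = previous == 0
        if is_first == keep_first:
            kept.append(value)
    return kept


def count_in_range(values, lo, hi):
    """Range check on the raw number of headers, duplicates included."""
    return lo <= len(values) <= hi


if __name__ == "__main__":
    mixed = ["A", "A", "B", "C"]
    same = ["A", "A", "A", "A"]
    print(len(perl_uniq_grep(mixed)))                   # 3
    print(len(perl_uniq_grep(same)))                    # 1
    print(len(perl_uniq_grep(same, keep_first=False)))  # 3
    print(count_in_range(same, 4, 4))                   # True
```

In the Perl itself, the equivalent of `count_in_range` would presumably take `get_header($hdr)` in list context and skip the `%uniq` filter entirely.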

Thanks,

-Philip



Rule for detecting two email addresses in From: field.

2019-10-03 Thread Philip

Morning List,

Lately I'm getting a bunch of emails that are showing up with two email 
addresses in the From: field.


From: "Persons Name " 

When you look in your mail client (Outlook, Thunderbird) it's showing 
only "Persons Name "


Is there a way I can mark From: that has 2 email addresses in it as 
spam? Pro's Cons?
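One way to frame the check is to parse the From: value and test whether the display-name part itself contains something shaped like an address. Below is a minimal Python sketch using the standard library's `email.utils.parseaddr`. The helper name, the deliberately loose address regex, and the sample addresses are assumptions. A SpamAssassin setup would express the same idea as a header rule.

```python
import re
from email.utils import parseaddr

# Deliberately loose "looks like an address" pattern (an assumption, not RFC 5322).
ADDR_LIKE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def display_name_has_address(from_value: str) -> bool:
    """True if the display-name part of a From: value contains an address."""
    display_name, _address = parseaddr(from_value)
    return bool(ADDR_LIKE.search(display_name))


if __name__ == "__main__":
    print(display_name_has_address('"Persons Name <bob@example.com>" <x@example.net>'))  # True
    print(display_name_has_address('"Persons Name" <bob@example.com>'))  # False
```

As for pros and cons: the later thread in this archive about addresses used as full names suggests some legitimate mailers put addresses in the display name too. That makes this safer as a score contribution than as a hard block.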


Phil


OT: Issues w/ hughes.net not accepting messages?

2019-03-03 Thread Philip Prindeville
Has anyone else started seeing something similar in the last 2-3 weeks?


Running /var/spool/mqueue/x22LrU1S006228 (sequence 1 of 2)
... Connecting to mx.hughes.net. via esmtp...
220 mx.hughes.net ESMTP
>>> EHLO mail.redfish-solutions.com
250-mx01.hughes.cmh.synacor.com says EHLO to 66.232.79.143:36344
250-PIPELINING
250-8BITMIME
250-XDUMPCONTEXT
250 ENHANCEDSTATUSCODES
>>> MAIL From:
451 4.7.1 66.232.79.143 You have exceeded your messaging limit.  Please try 
again later.
... Deferred: 451 4.7.1 66.232.79.143 You have exceeded your 
messaging limit.  Please try again later.


There’s no way to signal to hughes.net that this is happening since you have to 
be a user to report a problem via the webforms or telephone.

I obviously can’t email support or postmaster because that gets rejected too.

It says “Please try again later” but never accepts a message and after 120 
hours, the message is tossed without ever having been accepted.

It doesn’t give you a link explaining what’s happened or why or a way to appeal 
it.

And apparently 2-3 messages a week is “exceeding your messaging limit”.  WTF?

If they’re greylisting, it’s not being done correctly.

If they’re blacklisting, they should say so with a 5xx response and explain why 
(reputation, RBL, DKIM, SPF, etc).
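The SMTP specs spell out the 4xx/5xx distinction. RFC 5321 treats a first digit of 4 as a transient negative reply (retry later) and 5 as a permanent one. RFC 3463 enhanced status codes have the form class.subject.detail, where subject 7 covers security or policy status. Here is a minimal Python sketch that classifies lines like the ones in the transcript above. The function name and category labels are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Basic reply code, then an optional RFC 3463 enhanced status code.
REPLY = re.compile(r"^([2-5])\d\d[ -](?:([245]\.\d{1,3}\.\d{1,3})\s)?")

CATEGORY = {
    "2": "success",
    "3": "intermediate",
    "4": "transient",   # sender should keep the message queued and retry
    "5": "permanent",   # sender should give up and bounce
}


def classify_reply(line: str) -> Tuple[str, Optional[str]]:
    """Return (category, enhanced status code or None) for one SMTP reply line."""
    match = REPLY.match(line)
    if match is None:
        return "unparseable", None
    return CATEGORY[match.group(1)], match.group(2)


if __name__ == "__main__":
    print(classify_reply("451 4.7.1 66.232.79.143 You have exceeded your messaging limit."))
    # ('transient', '4.7.1')
    print(classify_reply("250-PIPELINING"))  # ('success', None)
```

With a 451, the sending MTA keeps retrying until its queue lifetime expires, which matches the 120-hour behaviour described above. A 5xx would have bounced right away, with the reason attached.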

Apparently operating email *is* more complicated than rocket science, because 
they can operate a satellite but can’t correctly configure a mail server.

I noticed that before this happened, smtp.hughes.net used to receive email 
(i.e. be their MXer), then it got switched to mx.hughes.net and this started 
happening.

If anyone is a hughes.net user and wants to call out this issue, I’d appreciate 
it.

Thanks,

-Philip



check_header_count_range() for MIME sections?

2018-10-29 Thread Philip Prindeville
Hi.

I’d like to be able to detect duplicated header types in MIME sections.

I think you all have been seeing them too.  Is there an easy way to see if a 
message contains any MIME sections where particular headers occur more than 
once?
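Here is a minimal Python sketch of the idea, outside SpamAssassin. It walks every MIME part and reports header names that occur more than once within the same part. `duplicated_part_headers` and the sample message are hypothetical. Note that `walk()` also yields the top-level message as part 0.

```python
from collections import Counter
from email import message_from_string

# Hypothetical multipart message whose only part repeats Content-Type:.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "Content-Type: text/html\n"
    "\n"
    "hello\n"
    "--XYZ--\n"
)


def duplicated_part_headers(raw, names=None):
    """List (part index, lower-cased header name, count) for repeated headers.

    If `names` is given, only those (lower-cased) header names are reported.
    """
    msg = message_from_string(raw)
    found = []
    for index, part in enumerate(msg.walk()):
        counts = Counter(key.lower() for key in part.keys())
        for name, count in sorted(counts.items()):
            if count > 1 and (names is None or name in names):
                found.append((index, name, count))
    return found


if __name__ == "__main__":
    print(duplicated_part_headers(RAW))  # [(1, 'content-type', 2)]
```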

Thanks,

-Philip



How to test that TxRep is working?

2018-05-22 Thread Philip
I've added TxRep to spamassassin and enabled it in my local.cf, following the 
instructions:


http://truxoft.com/resources/txrep.htm

# TxRep
use_txrep 1

Is there a way to test that it's actually working?

Phil




Re: Spammers, IPv6 addresses, and dnsbls

2018-03-07 Thread Philip

Hi there,

Providers like Linode assign a single IPv6 address from a shared /64. I had to 
request my own /64 for my server, as my IP neighbors were always getting the 
/64 blocked... Since I've had my own, I've been all good. Before this, my IPv6 
address was getting blocked daily because of someone else on that /64. It was 
quite annoying.


Phil

ps your server blocks .nz domains :P

On 03/03/2018 00:54, Daniele Duca wrote:

Hello list,

apologies if this is not directly SA related. "Lately" I've started to 
notice that some (not saying names) VPS providers, when offering v6 
connectivity, tend not to follow the best practice of giving a /64 to 
each customer, instead routing much smaller v6 subnets to them, while 
still giving them the usual /30 or /29 v4 subnets.


What's happening is that whenever a spammer buys a VPS from one of those 
providers and gets blacklisted, most of the time the dnsbls list the 
whole v6 /64, while still listing only the single ipv4 address. This 
makes some sense, as it would be enormously resource intensive to 
track each of the 18,446,744,073,709,551,616 addresses in the /64, but 
unfortunately not respecting basic v6 subnetting rules also causes 
reputation problems for the other customers who have the bad luck of 
living in the same /64 and are using their VPS as an outgoing mail 
server.
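Both the arithmetic and the listing granularity are easy to check with Python's standard `ipaddress` module. This minimal sketch assumes, as described above, that a list keys IPv4 entries on the single address and IPv6 entries on the enclosing /64. `listing_key` is a hypothetical helper, and the addresses come from the documentation ranges.

```python
import ipaddress


def listing_key(ip: str) -> str:
    """Network a DNSBL is assumed to list: the /32 for IPv4, the /64 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 32 if addr.version == 4 else 64
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def shares_listing(a: str, b: str) -> bool:
    """True if listing `a` would also cover `b` under that assumption."""
    return listing_key(a) == listing_key(b)


if __name__ == "__main__":
    print(2 ** 64)  # 18446744073709551616 addresses in one /64
    print(listing_key("2001:db8:1:2::25"))  # 2001:db8:1:2::/64
    # Two customers in different /112s of the same /64 share one listing:
    print(shares_listing("2001:db8:1:2::25", "2001:db8:1:2::1:25"))  # True
```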


While I'm not judging the reasons why VPS providers are doing this 
type of useless v6 subnetting (micronetting?), I've started to deploy 
some countermeasures to avoid FPs. Specifically, I wrote a rule that 
identifies whether the last untrusted relay is a v6 address; that rule is 
then used in other meta rules that subtract some points from the dnsbl 
tests that check the -lastexternal IP address on v6-aware lists.


I know that this is probably not the best solution, but I've started to see 
real FPs that worried me. I've even wondered whether it would make sense to 
go back to v4-only connectivity for my inbound MTAs.


If you are in a similar situation, I would very much like to discuss 
what the best approach would be to balance spam detection while 
avoiding FPs.


Regards

Daniele Duca






Loading custom rules.

2018-02-25 Thread Philip
How do you load custom rules... is it as simple as dropping the .cf file 
in the spamassassin directory and restarting?


I'm looking at these: https://wiki.apache.org/spamassassin/CustomRulesets

Phil


Tons of emails with subject: 'hey'

2018-02-05 Thread Philip
So lately I'm getting LOTS of emails coming directly through the filters, 
so it's most likely time to investigate how to create a rule for them.


The subject is always 'hey'

Subject: hey

Date: Mon, 29 Jan 2018 09:07:40 +0300
From: Darya
Message-ID: <8f35b00fb4e07d18ce82448ec9747...@112it4u.ro>
X-Mailer: PHPMailer 5.2.22 (https://github.com/PHPMailer/PHPMailer)
MIME-Version: 1.0
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi josh, my name is Darya and i'm from Russia, but living in the USA. A 
week ago, maybe more, I came across your profile on Facebook and now I 
wan to know you more. I know it sounds a bit strange, but I believe you 
had something like this in your life too :-) If its mutual, email me, 
this is my email danielamar...@rambler.ru and I will send some of my 
photos also answer any of your questions. Waiting for you, XXX Darya


As far as I can see from the different emails:

X-PHP-Originating-Script: 852:class-phpmailer.php

The number (852 in this sample) is sequential.

112it4u.ro from the message ID has valid NS entries but the reverse PTR 
is invalid.


The email always starts 'hi {mailbox name},' and the text is mostly the 
same, but the name changes now and then, and so does the email address.
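Each of the three traits listed above can be checked separately and then combined, which is roughly what a SpamAssassin meta rule does. The traits are the fixed subject, the PHPMailer script header, and a greeting that reuses the recipient's mailbox name. Below is a minimal Python sketch. `hey_spam_signals`, its signal labels, and the trimmed sample are illustrative. Requiring several signals together rather than any single one is an assumption, intended to limit false positives.

```python
from email import message_from_string
from typing import List


def hey_spam_signals(raw: str, mailbox: str) -> List[str]:
    """Return which of the observed traits a message shows (hypothetical helper)."""
    msg = message_from_string(raw)
    signals = []
    if (msg.get("Subject") or "").strip().lower() == "hey":
        signals.append("subject-hey")
    if "class-phpmailer.php" in (msg.get("X-PHP-Originating-Script") or ""):
        signals.append("phpmailer-script")
    body = msg.get_payload()
    greeting = f"hi {mailbox.lower()},"
    if isinstance(body, str) and body.lstrip().lower().startswith(greeting):
        signals.append("greets-mailbox-name")
    return signals


SAMPLE = (
    "Subject: hey\n"
    "X-PHP-Originating-Script: 852:class-phpmailer.php\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "\n"
    "Hi josh, my name is Darya and i'm from Russia\n"
)

if __name__ == "__main__":
    print(hey_spam_signals(SAMPLE, "josh"))
    # ['subject-hey', 'phpmailer-script', 'greets-mailbox-name']
```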


Any suggestions on where to start? nOOb here!

Phil




Re: Email address as fullname in To: field

2017-11-10 Thread Philip Prindeville
Anyone have an idea what software first started doing this?  Or have any idea 
why ANYONE would think this was a good idea?

The weird part is that I’m seeing machine generated email where I *know* they 
have my full name and they could be using that to synthesize the email but they 
don’t.  Or, conversely, they could simply not put any full name field in at all 
and just use the raw email address…

It’s like someone made the conscious decision to choose the worst of both 
worlds…



> On Jul 13, 2017, at 11:49 AM, Philip Prindeville 
> <philipp_s...@redfish-solutions.com> wrote:
> 
> I’m getting more and more email as:
> 
> To: “joeb...@example.com” <joeb...@example.com>
> 
> anyone know why there’s an increase in this?  Did Exchange recently get 
> broken so that it’s not populating the Addressbook properly?
> 
> I noticed that even legitimate promotional mailers (like 1800petmeds.com) are 
> doing this… Even though when you sign up, you give them your full name.
> 
> It used to be a reliable trigger for spam, because a lot of harvesters ONLY 
> deal in email addresses and don’t bother collecting the fullnames: they 
> mechanically scrape websites without knowing how to parse anything but email 
> addresses themselves (and sometimes not even those correctly, since I’ll see 
> spam addressed to Message-Id: values, References: values, etc.).
> 
> Thanks,
> 
> -Philip
> 



Email address as fullname in To: field

2017-07-13 Thread Philip Prindeville
I’m getting more and more email as:

To: “joeb...@example.com” <joeb...@example.com>

anyone know why there’s an increase in this?  Did Exchange recently get broken 
so that it’s not populating the Addressbook properly?

I noticed that even legitimate promotional mailers (like 1800petmeds.com) are 
doing this… Even though when you sign up, you give them your full name.

It used to be a reliable trigger for spam, because a lot of harvesters ONLY 
deal in email addresses and don’t bother collecting the fullnames: they 
mechanically scrape websites without knowing how to parse anything but email 
addresses themselves (and sometimes not even those correctly, since I’ll see 
spam addressed to Message-Id: values, References: values, etc.).
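The pattern described here is easy to test mechanically: parse the To: value and compare the display name with the address. Below is a minimal Python sketch using `email.utils.parseaddr`. The helper name and sample addresses are hypothetical. Since legitimate mailers now do this too, the check is probably better treated as a small score contribution than as a verdict.

```python
from email.utils import parseaddr


def fullname_is_address(header_value: str) -> bool:
    """True if the display name is just the address itself, e.g. "a@b" <a@b>."""
    display_name, address = parseaddr(header_value)
    return bool(address) and display_name.strip().lower() == address.lower()


if __name__ == "__main__":
    print(fullname_is_address('"joe@example.com" <joe@example.com>'))  # True
    print(fullname_is_address('"Joe Bloggs" <joe@example.com>'))       # False
    print(fullname_is_address("joe@example.com"))                      # False
```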

Thanks,

-Philip



Re: Relitigating TB's behavior because of "villainous" SpamAssassin... hiss!

2017-02-12 Thread Philip Prindeville

> On Feb 12, 2017, at 4:53 PM, Philip Prindeville 
> <philipp_s...@redfish-solutions.com> wrote:
> 
> What an incredible waste of time:
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=417942#c19
> 
> I actually think I might be dialoging with a highly argumentative variant of 
> Eliza.
> 
> In which case, it’s passed the Turing Test.
> 

The guy is all sorts of misguided and was upset that Hotmail’s SpamAssassin 
scored his message low because “Thunderbird [sic] and added a Received: line 
with his address 192.168.x.x” (which he’s super paranoid apparently about 
divulging… because apparently no one else in the world is using that same 
192.168.x.x address).

Go figure.

Anyway, let it be a cautionary tale about what sort of rathole to not let 
yourself get sucked into.



Relitigating TB's behavior because of "villainous" SpamAssassin... hiss!

2017-02-12 Thread Philip Prindeville
What an incredible waste of time:

https://bugzilla.mozilla.org/show_bug.cgi?id=417942#c19

I actually think I might be dialoging with a highly argumentative variant of 
Eliza.

In which case, it’s passed the Turing Test.



Re: RFC compliance pedantry (was Re: New type of monstrosity)

2017-02-08 Thread Philip Prindeville
Having been through the process of authoring 2 RFC’s, perhaps I can shed some 
light on the process for you.

All proposed standards started life as draft RFC’s (this was before the days of 
IDEA’s but after the days of IEN’s).

If it were validated by the working group and passed up to the IAB and they 
concurred (they usually deferred to the WG except on editorial matters), then 
the proposed draft was issued officially as an RFC and given a number.

Later, after it achieved wide enough adoption in the Internet community, an 
existing RFC might be promoted to “standard” from “experimental”, etc.

Occasionally, if a WG (working group) did enough reference implementations and 
proved them at one or more interoperability meetings (the so-called 
“bake-offs”), then the WG could petition for immediate labeling as a “standard” 
when the RFC was approved by the IAB.

It’s even possible for a standard (like RFC-1035) to have both “standard” parts 
(like A RR’s) and “experimental” parts (like MB RR’s).


> On Feb 8, 2017, at 7:04 AM, Ruga  wrote:
> 
> Read the headers of RFCs; some of them are explicitly labeled as standard. 
> Most of them are request for comments. 
> 
> 
> On Wed, Feb 8, 2017 at 2:58 PM, Kevin A. McGrail <'kmcgr...@pccc.com'> wrote:
>> On 2/8/2017 8:52 AM, Ruga wrote: 
>> > Not all RFCs are standards. 
>> > Educate yourself. 
>> The personal attacks aren't necessary. These RFCs are the basis for 
>> effectively 100% of the email on the planet for decades. If that's not 
>> a standard, what is? 



Re: Uninitialized values in URIDNSBL

2017-02-08 Thread Philip Prindeville

> On Feb 3, 2017, at 6:04 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
> 
> Re: 3.4.2 SA release
> 
> Imminent.  I'd like to start a push for a release, prioritizing bugs, etc.
> 
> I've stepped up to be the Release Manager and I'm coordinating things at work 
> so I can dedicate time to the process.
> 
> Regards,
> KAM

Good to hear.

While we’re waiting for that, can I just grab Util.pm and Plugin/URIDNSBL.pm 
out of trunk, or are there more dependencies than that to splice the fix back 
into 3.4.1?

Thanks,

-Philip



Re: Uninitialized values in URIDNSBL

2017-02-03 Thread Philip Prindeville

> On Feb 2, 2017, at 5:06 PM, Reindl Harald <h.rei...@thelounge.net> wrote:
> 
> 
> 
> Am 02.02.2017 um 23:41 schrieb Martin Gregorie:
>> On Thu, 2017-02-02 at 15:23 -0700, Philip Prindeville wrote:
>>> Anyone else seeing this?
>>> 
>> Yes - in Fedora 25
> 
> that problem is much much older than F25
> 
> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7339
> https://lists.gt.net/spamassassin/users/201851
> 
> As you've discovered, this was fixed on trunk 2015-06-10
> 
> yeah, how good that there are SA updates regularly - NOT


Yeah, I was wondering about that myself… when is the next SA release due out?  
And what’s the release criteria?

-Philip



Uninitialized values in URIDNSBL

2017-02-02 Thread Philip Prindeville
Anyone else seeing this?

Feb  2 08:10:23 mail mimedefang.pl[13017]: helo: mailman2.scl3.mozilla.com 
(63.245.214.181:3844) said "helo mail.mozilla.org"
Feb  2 08:10:23 mail sendmail[14852]: v12FAHm7014852: 
from=<general-bounces+philipp_subx=redfisholutions@lists.mozilla.org>, 
size=4727, class=-30, nrcpts=1, 
msgid=<0oudnazy4jgf1g7fnz2dnuu7-qmdn...@mozilla.org>, bodytype=7BIT, 
proto=ESMTPS, daemon=MTA-v4, relay=mailman2.scl3.mozilla.com [63.245.214.181]
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $4 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $3 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $2 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: dns: new_dns_packet (domain=...(.sbl.spamhaus.org. type=A class=IN) 
failed: a domain name contains a null label
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: dns: new_dns_packet (domain=...(.zen.spamhaus.org. type=A class=IN) 
failed: a domain name contains a null label
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $4 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $3 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $2 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $4 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $3 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.
Feb  2 08:10:23 mail mimedefang-multiplexor[2048]: v12FAHm7014852: Slave 2 
stderr: Use of uninitialized value $2 in concatenation (.) or string at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 1042.


I’m seeing these right after upgrading from Fedora 23 (EOL) to Fedora 24 so 
evidently a bunch of files got updated…
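The log lines hint at what went wrong. Undefined capture groups `$2`..`$4` appear to have been concatenated into a lookup name, producing `...(.sbl.spamhaus.org.`, which the DNS layer then rejected as containing a null label. A DNSBL lookup for an IPv4 address normally reverses the four octets and appends the list's zone. Below is a minimal Python sketch that validates the address before building the name, so malformed input never reaches the resolver. It is a hypothetical helper, not the URIDNSBL code.

```python
import ipaddress
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> Optional[str]:
    """Build the reversed-octet DNSBL name for an IPv4 address, e.g.
    192.0.2.7 + zen.spamhaus.org -> 7.2.0.192.zen.spamhaus.org.
    Returns None instead of a name with empty labels if `ip` is not IPv4."""
    try:
        addr = ipaddress.IPv4Address(ip)
    except ipaddress.AddressValueError:
        return None
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.7", "zen.spamhaus.org"))  # 7.2.0.192.zen.spamhaus.org
    print(dnsbl_query_name("...(", "sbl.spamhaus.org"))        # None
```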

-Philip




How to know if TxRep is white listing out going email.

2016-03-29 Thread Philip
I've enabled outgoing whitelisting using the TxRep plugin. Is there a 
way to find out if outbound emails are actually being whitelisted? A 
log somewhere... a file being updated?


--
Phil


Help understanding TxRep errors.

2016-03-15 Thread Philip

After turning on TxRep I get these lines in my /var/log/spamd.log file.

Wed Mar 16 08:21:55 2016 [16629] warn: Use of uninitialized value 
$msgscore in addition (+) at /etc/spamassassin/TxRep.pm line 1414.
Wed Mar 16 08:21:55 2016 [16629] warn: Use of uninitialized value 
$msgscore in subtraction (-) at /etc/spamassassin/TxRep.pm line 1414.


/etc/spamassassin/60_txreputation.cf has...

use_txrep 1

header TXREP   eval:check_senders_reputation()
describe   TXREP   Score normalizing based on sender's reputation
tflags TXREP   userconf noautolearn
priority   TXREP   1000

txrep_whitelist_out 1

Ideas, suggestions?

Regards,

Phil


Re: Omitting leading whitespace on headers?

2015-12-29 Thread Philip Prindeville

On Dec 29, 2015, at 2:14 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> On 12/29/2015 3:46 PM, Philip Prindeville wrote:
>> On Dec 29, 2015, at 1:42 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
>> 
>>> On 12/29/2015 3:38 PM, Philip Prindeville wrote:
>>>> Is there a reason that headers are left with leading spaces?
>>>> 
>>>> I’ve noticed that I have to write rules as:
>>>> 
>>>> Subject =~ /^ Great [Jj]ob [Oo]pportunity/
>>>> 
>>>> because of the leading space…
>>> I'm at a complete loss.  I add plenty of Subject rules with no leading 
>>> space.  Never seen this issue.
>> 
>> I had some rules which weren’t firing so I had to change to /^ .../ or else 
>> /^ ?.../ to make them match.
>> 
>> Not sure why.
>> 
>> This is with SA 3.4.1 on Fedora 21.
>> 
>> -Philip
> What's the original Subject header look like from the original mail?
> 
> Regards,
> KAM


This was a while ago.  I’d have to go back and look.  Maybe this one?

Subject: [IDN][#2056301] CareerBuilder: Open position for you





Re: Omitting leading whitespace on headers?

2015-12-29 Thread Philip Prindeville

On Dec 29, 2015, at 1:42 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> On 12/29/2015 3:38 PM, Philip Prindeville wrote:
>> Is there a reason that headers are left with leading spaces?
>> 
>> I’ve noticed that I have to write rules as:
>> 
>> Subject =~ /^ Great [Jj]ob [Oo]pportunity/
>> 
>> because of the leading space…
> I'm at a complete loss.  I add plenty of Subject rules with no leading space. 
>  Never seen this issue.


I had some rules which weren’t firing so I had to change to /^ .../ or else 
/^ ?.../ to make them match.

Not sure why.

This is with SA 3.4.1 on Fedora 21.

-Philip



Re: Omitting leading whitespace on headers?

2015-12-29 Thread Philip Prindeville

On Dec 29, 2015, at 2:39 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> On 12/29/2015 4:29 PM, Philip Prindeville wrote:
>> On Dec 29, 2015, at 2:14 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
>> 
>>> On 12/29/2015 3:46 PM, Philip Prindeville wrote:
>>>> On Dec 29, 2015, at 1:42 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
>>>> 
>>>>> On 12/29/2015 3:38 PM, Philip Prindeville wrote:
>>>>>> Is there a reason that headers are left with leading spaces?
>>>>>> 
>>>>>> I’ve noticed that I have to write rules as:
>>>>>> 
>>>>>> Subject =~ /^ Great [Jj]ob [Oo]pportunity/
>>>>>> 
>>>>>> because of the leading space…
>>>>> I'm at a complete loss.  I add plenty of Subject rules with no leading 
>>>>> space.  Never seen this issue.
>>>> I had some rules which weren’t firing so I had to change to /^ .../ or 
>>>> else /^ ?.../ to make them match.
>>>> 
>>>> Not sure why.
>>>> 
>>>> This is with SA 3.4.1 on Fedora 21.
>>>> 
>>>> -Philip
>>> What's the original Subject header look like from the original mail?
>>> 
>>> Regards,
>>> KAM
>> 
>> This was a while ago.  I’d have to go back and look.  Maybe this one?
>> 
>> Subject: [IDN][#2056301] CareerBuilder: Open position for you
> OK, I was thinking perhaps an alternate charset or something but never run 
> into this issue.
> 
> If you are anchoring your Subject searches, allowing for whitespace, etc. is 
> a decent idea though from Reindl.
> 
> regards,
> KAM


I did recall that I used the patch here:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6360#c4

to be able to debug my rules, using a rule that would match any non-empty 
subject: value to dump out what it was (the “> got hit: “…”” line), and it 
was always showing a leading space…

-Philip



Re: Omitting leading whitespace on headers?

2015-12-29 Thread Philip Prindeville

On Dec 29, 2015, at 3:15 PM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> On 12/29/2015 5:12 PM, Philip Prindeville wrote:
>> I did recall that I used the patch here:
>> 
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6360#c4
>> 
>> to be able to debug my rules, using a rule that would match any non-empty 
>> subject: value to dump out what it was (the “> got hit: “…”” line), and 
>> it was always showing a leading space…
>> 
>> -Philip
>> 
> 
> Thank goodness.  You had me worried we had some foundational processing 
> issue!
> 
> Regards,
> KAM
> 

No, I eventually added “^ ?” to all of my Subject rules… but I’m thinking I 
shouldn’t have had to.



Omitting leading whitespace on headers?

2015-12-29 Thread Philip Prindeville
Is there a reason that headers are left with leading spaces?

I’ve noticed that I have to write rules as:

Subject =~ /^ Great [Jj]ob [Oo]pportunity/

because of the leading space… Given the text of RFC-2822:

NO-WS-CTL       =       %d1-8 /         ; US-ASCII control characters
                        %d11 /          ;  that do not include the
                        %d12 /          ;  carriage return, line feed,
                        %d14-31 /       ;  and white space characters
                        %d127

text            =       %d1-9 /         ; Characters excluding CR and LF
                        %d11 /
                        %d12 /
                        %d14-127 /
                        obs-text

FWS             =       ([*WSP CRLF] 1*WSP) /   ; Folding white space
                        obs-FWS

obs-FWS         =       1*WSP *(CRLF 1*WSP)

utext           =       NO-WS-CTL /     ; Non white space controls
                        %d33-126 /      ; The rest of US-ASCII
                        obs-utext

unstructured    =       *([FWS] utext) [FWS]

subject         =       "Subject:" unstructured CRLF


Might we consider dropping the first instance of “FWS” preceding the first 
instance of “utext” in “unstructured”?
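Here is a minimal Python sketch of what the grammar implies, using plain string handling rather than SpamAssassin's parser. If a header value is taken as everything after the colon, the FWS that `unstructured` allows before the first `utext` is still there, so a pattern anchored with `^` has to allow for it. `raw_header_value` is a hypothetical helper. `^\s*` is a slightly more general form of the `^ ?` workaround that also tolerates a tab or several spaces.

```python
import re


def raw_header_value(raw_line: str) -> str:
    """Everything after the first colon, with only the trailing CRLF removed.

    Under RFC 2822's `unstructured` rule the FWS after "Subject:" belongs
    to the value, so the leading space survives.
    """
    _name, _colon, value = raw_line.partition(":")
    return value.rstrip("\r\n")


STRICT = re.compile(r"^Great [Jj]ob [Oo]pportunity")
TOLERANT = re.compile(r"^\s*Great [Jj]ob [Oo]pportunity")

if __name__ == "__main__":
    value = raw_header_value("Subject: Great Job Opportunity\r\n")
    print(repr(value))                   # ' Great Job Opportunity'
    print(bool(STRICT.search(value)))    # False
    print(bool(TOLERANT.search(value)))  # True
```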

-Philip



Re: any reason not to block every Softlayer allocation?

2015-10-06 Thread Philip Prindeville

On Oct 5, 2015, at 10:57 PM, Noel Butler <noel.but...@ausics.net> wrote:

> On 06/10/2015 12:39, Jo Rhett wrote:
> 
>> Sorry, let me restate: I know consequences of blocking large
>> providers. I’m asking if others have found the same to be true, or if
>> there is any reason to give SoftLayer benefit of the doubt?
>> Once in a great while this kind of query generates clueful contact
>> with said provider to get off their tail...
> 
> 
> softlayer is turning into the U.S.'s version of Europe's OVH - many ranges of 
> both are blocked, though the report rate has dropped significantly in months 
> gone by for both, so if you block, leave yourself a note to unblock in 30 
> days or so and see how it pans out.
> 
> Alternatively, if you have a lot of users you provide for who get legit 
> softlayer mail, just score them high so they always end up in spam folder.


We’ve had issues with softlayer/the planet.  I don’t remember ever seeing a 
response to a single complaint.  Not one.

And some of them are really blatant, like impersonating the FBI.

One thing I’ve noticed is that long-term, legitimate softlayer customers end up 
changing their rDNS (PTR) records, since they don’t have to jump from lily pad 
to lily pad.

The spammers, on the other hand, often don’t go through the trouble because 
they’re not going to be there long enough.

In that case, blocking something like:

X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}-static\.reverse\.softlayer\.com /
X-Spam-Relays-Untrusted =~ /^[^\]]+ rdns=[0-9a-f]{2}\.[0-9a-f]{2}\.[0-9a-f]{4}\.static\.theplanet\.com /


might be the solution.
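Before turning patterns like these into rules, you can sanity-check them against sample pseudo-header strings. Below is a minimal Python sketch. The two regexes are the ones from this message, with every dot escaped and `[0-9a-f]` written as a character class (a leading `\[` would instead match a literal bracket). The `[ ip=... rdns=... ]` sample lines are assumed approximations of the X-Spam-Relays-Untrusted format, and the helper name is hypothetical.

```python
import re

SOFTLAYER_STATIC = re.compile(
    r"^[^\]]+ rdns=\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}-static\.reverse\.softlayer\.com "
)
THEPLANET_STATIC = re.compile(
    r"^[^\]]+ rdns=[0-9a-f]{2}\.[0-9a-f]{2}\.[0-9a-f]{4}\.static\.theplanet\.com "
)


def first_untrusted_has_generic_rdns(relays_untrusted: str) -> bool:
    """True if the first untrusted relay still carries a stock provider PTR name."""
    # "^[^\]]+" cannot cross a "]", so only the first relay entry is examined.
    return bool(SOFTLAYER_STATIC.search(relays_untrusted)
                or THEPLANET_STATIC.search(relays_untrusted))


# Assumed sample entries (documentation addresses, made-up hosts).
GENERIC = ("[ ip=192.0.2.7 rdns=7.2.0.192-static.reverse.softlayer.com helo=mail "
           "by=mx.example.com ident= envfrom= intl=0 id=ABC123 auth= msa=0 ]")
CUSTOM = ("[ ip=192.0.2.7 rdns=mail.customer.example helo=mail.customer.example "
          "by=mx.example.com ident= envfrom= intl=0 id=ABC124 auth= msa=0 ]")

if __name__ == "__main__":
    print(first_untrusted_has_generic_rdns(GENERIC))  # True
    print(first_untrusted_has_generic_rdns(CUSTOM))   # False
```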

We found that most of the spam we got from softlayer either included a URL that 
resolved to 104.148.103.2 (which was easy to block with check_uri_local_bl()) 
or else contained a message-id which had an email address in it followed by:

[a-z0-9\-\.]{1,6}>$

for instance.

-Philip



Re: tflags multiple and header exists:

2015-09-29 Thread Philip Prindeville

On Sep 29, 2015, at 10:09 AM, Philip Prindeville 
<philipp_s...@redfish-solutions.com> wrote:

> Can you use something like:
> 
> header __L_X_NO_RELAY exists:X-No-Relay
> tflags __L_X_NO_RELAY multiple

Actually, that should probably be bounded to something like:

tflags __L_X_NO_RELAY   multiple maxhits=10


> 
> meta MULTIPLE_X_NO_RELAY  __L_X_NO_RELAY >= 8
> describe MULTIPLE_X_NO_RELAY  Saw an inordinate number of X-No-Relay: headers
> score MULTIPLE_X_NO_RELAY 10.0
> 
> I couldn’t get the first 2 lines to work together.  I had to resort to:
> 
> header __L_X_NO_RELAY ALL =~ /^x-no-relay:/msi
> 
> instead for the first line.  Is this a known constraint?
> 
> -Philip
> 



tflags multiple and header exists:

2015-09-29 Thread Philip Prindeville
Can you use something like:

header __L_X_NO_RELAY   exists:X-No-Relay
tflags __L_X_NO_RELAY   multiple

meta MULTIPLE_X_NO_RELAY        __L_X_NO_RELAY >= 8
describe MULTIPLE_X_NO_RELAY    Saw an inordinate number of X-No-Relay: headers
score MULTIPLE_X_NO_RELAY   10.0

I couldn’t get the first 2 lines to work together.  I had to resort to:

header __L_X_NO_RELAY   ALL =~ /^x-no-relay:/msi

instead for the first line.  Is this a known constraint?

-Philip



Re: tflags multiple and header exists:

2015-09-29 Thread Philip Prindeville

On Sep 29, 2015, at 10:44 AM, John Hardin <jhar...@impsec.org> wrote:

> On Tue, 29 Sep 2015, Philip Prindeville wrote:
> 
>> Can you use something like:
>> 
>> header __L_X_NO_RELAY exists:X-No-Relay
> 
> Are you seeing empty X-No-Relay headers? How about:

No, not empty.  Typically they say:

X-No-Relay: not in my network


> 
>  header __HAS_NO_RELAY X-No-Relay =~ /./
> 
> ...which is in my sandbox, but just for eval, it's not scored yet:


No, that ends up matching once per character…  But /.*/ works.
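Here is a rough Python model of why `/./` misbehaves under `tflags multiple`. Suppose every regex match counts as a hit, which is what the once-per-character observation suggests. Then a pattern that matches each character piles up hits from a single header, while a pattern that matches once per header counts headers. The `maxhits` cap and the `>= 8` threshold mirror the rules in this thread. `capped_hits` is a hypothetical helper, not SpamAssassin's implementation.

```python
import re


def capped_hits(values, pattern, maxhits):
    """Total regex matches across all header values, capped like maxhits=N."""
    rx = re.compile(pattern)
    hits = sum(len(rx.findall(value)) for value in values)
    return min(hits, maxhits)


if __name__ == "__main__":
    one_header = ["not in my network"]
    nine_headers = one_header * 9
    # /./ matches each of the 17 characters, so one header already reaches 8.
    print(capped_hits(one_header, r".", 10) >= 8)    # True
    # Anchored /^.+/ matches once per value, so it counts headers instead.
    print(capped_hits(one_header, r"^.+", 10) >= 8)  # False
    print(capped_hits(nine_headers, r"^.+", 10))     # 9
```

`^.+` is used here instead of `.*` because Python's `findall` also reports the empty match at the end of the string, which would inflate the count.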


> 
> http://ruleqa.spamassassin.org/20150926-r1705400-n/__HAS_NO_RELAY/detail
> 
>> tflags __L_X_NO_RELAY multiple
>> 
>> meta MULTIPLE_X_NO_RELAY __L_X_NO_RELAY >= 8
> 
> If you're doing that, do TFLAGS multiple, maxhits=9
> 
> I'll add this to my sandbox.
> 
> -- 
> John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
> jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
> key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79



The word on messages w/ no Message-Id

2015-09-28 Thread Philip Prindeville
D:" fields.  If there is no "Message-ID:" field in
   any of the parent messages, then the new message will have no "In-
   Reply-To:" field.

   The "References:" field will contain the contents of the parent's
   "References:" field (if any) followed by the contents of the parent's
   "Message-ID:" field (if any).  If the parent message does not contain
   a "References:" field but does have an "In-Reply-To:" field
   containing a single message identifier, then the "References:" field
   will contain the contents of the parent's "In-Reply-To:" field
   followed by the contents of the parent's "Message-ID:" field (if
   any).  If the parent has none of the "References:", "In-Reply-To:",
   or "Message-ID:" fields, then the new message will have no
   "References:" field.
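The threading rules in this excerpt are mechanical enough to express directly. Below is a minimal Python sketch for a reply to a single parent. It assumes the parent's headers are given as a plain dict of raw field values. The `reply_threading` name is hypothetical, and "containing a single message identifier" is approximated by counting `<` characters.

```python
def reply_threading(parent):
    """In-Reply-To/References for a reply to a single parent (RFC 5322 3.6.4).

    `parent` maps field names to raw values, e.g. {"Message-ID": "<b@x>"}.
    """
    fields = {}
    msg_id = parent.get("Message-ID")
    references = parent.get("References")
    in_reply_to = parent.get("In-Reply-To")

    # In-Reply-To: the parent's Message-ID, omitted if the parent has none.
    if msg_id:
        fields["In-Reply-To"] = msg_id

    # References: the parent's References, else a single-identifier
    # In-Reply-To, followed by the parent's Message-ID; omitted if empty.
    chain = []
    if references:
        chain.append(references)
    elif in_reply_to and in_reply_to.count("<") == 1:
        chain.append(in_reply_to)
    if msg_id:
        chain.append(msg_id)
    if chain:
        fields["References"] = " ".join(chain)
    return fields


print(reply_threading({"Message-ID": "<2@example.com>",
                       "References": "<1@example.com>"}))
```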





Resnick                      Standards Track                   [Page 26]
RFC 5322                 Internet Message Format            October 2008


  Note: Some implementations parse the "References:" field to
  display the "thread of the discussion".  These implementations
  assume that each new message is a reply to a single parent and
  hence that they can walk backwards through the "References:" field
  to find the parent of each message listed there.  Therefore,
  trying to form a "References:" field for a reply that has multiple
  parents is discouraged; how to do so is not defined in this
  document.

   The message identifier (msg-id) itself MUST be a globally unique
   identifier for a message.  The generator of the message identifier
   MUST guarantee that the msg-id is unique.  There are several
   algorithms that can be used to accomplish this.  Since the msg-id has
   a similar syntax to addr-spec (identical except that quoted strings,
   comments, and folding white space are not allowed), a good method is
   to put the domain name (or a domain literal IP address) of the host
   on which the message identifier was created on the right-hand side of
   the "@" (since domain names and IP addresses are normally unique),
   and put a combination of the current absolute date and time along
   with some other currently unique (perhaps sequential) identifier
   available on the system (for example, a process id number) on the
   left-hand side.  Though other algorithms will work, it is RECOMMENDED
   that the right-hand side contain some domain identifier (either of
   the host itself or otherwise) such that the generator of the message
   identifier can guarantee the uniqueness of the left-hand side within
   the scope of that domain.

   Semantically, the angle bracket characters are not part of the
   msg-id; the msg-id is what is contained between the two angle bracket
   characters.
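Here is a minimal Python sketch of the generation method recommended above. It assumes the generating host may use `example.com` (a placeholder) on the right-hand side. `new_msg_id` is a hypothetical helper. The angle brackets are added only as delimiters, as the last paragraph says.

```python
import itertools
import os
import time

# Per-process sequence number: the "currently unique (perhaps sequential)
# identifier" part of the RFC's suggested recipe.
_sequence = itertools.count()


def new_msg_id(domain):
    """Build "<left@right>": left = UTC date/time + process id + sequence
    number, right = a domain the generating host may speak for (assumed)."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    left = f"{stamp}.{os.getpid()}.{next(_sequence)}"
    return f"<{left}@{domain}>"


print(new_msg_id("example.com"))
```

For real use, Python's standard library already provides `email.utils.make_msgid(domain=...)`, which combines a timestamp, the process id and random bits in a similar way.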


Extracting the operative text: “The "Message-ID:" field provides a unique
message identifier that refers to a particular version of a particular message.
The uniqueness of the message identifier is guaranteed by the host that
generates it […]. The message identifier (msg-id) itself MUST be a globally
unique identifier for a message.”

Obviously a missing Message-ID is hardly unique, and hence this requirement is 
not being fulfilled.
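Detecting the condition itself is straightforward. Below is a minimal Python sketch using the standard library's `email` parser. Header lookup there is case-insensitive, so `Message-Id:` and `Message-ID:` are treated alike. The code does not settle how heavily to score a hit, which is the open question in this message. The helper name is hypothetical.

```python
from email import message_from_string


def missing_message_id(raw):
    """True if the message has no Message-ID: field, or only an empty one."""
    msg = message_from_string(raw)
    value = msg.get("Message-ID")
    return value is None or not str(value).strip()
```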

Does this warrant scoring the message severely?

I say “yes”.

Anyone else?

-Philip



Re: Test for empty EnvelopeFrom

2015-09-24 Thread Philip Prindeville

On Sep 24, 2015, at 4:12 AM, Reindl Harald <h.rei...@thelounge.net> wrote:

> 
> 
> Am 23.09.2015 um 19:24 schrieb Philip Prindeville:
>> Stating facts here, not giving an opinion. Not sure what’s up for debate.
>>> 
>>> if it is empty it's <> aka Null-Sender and you really don't block that
>>> because you're violating RFCs, block sane autoreplies using it to prevent
>>> mail-loops and the subject indicates one thing; you don't really understand
>>> how email works
>> 
>> Rejecting messages based on their content PERIOD is violating the RFC’s.  
>> What’s your point?
> 
> do what you want - an empty envelope-from is not a sign of spam
> 
> 


I never said it was.

What I said was that when it’s coming from a server that doesn’t accept inbound
messages (and hence can’t generate bounces) THEN it’s a sign of Spam.



Re: Test for empty EnvelopeFrom

2015-09-23 Thread Philip Prindeville

On Sep 22, 2015, at 12:58 PM, Reindl Harald <h.rei...@thelounge.net> wrote:

> 
> 
> Am 22.09.2015 um 19:43 schrieb Philip Prindeville:
>> I’m using SA with MdF on Linux (Fedora 22).
>> 
>> MdF generates the header “Return-Path: ” for me, so that should 
>> be available to me in the rules.
>> 
>> To test this, I wrote a couple of rules:
>> 
>> header __L_EMPTY_SENDER  EnvelopeFrom:addr !~ /./
>> header __L_MATCH_SENDER  EnvelopeFrom:addr =~ /.*/
> 
> sorry, but you need to understand what the envelope-from is
> 
> hint: it exists *always* because it's the "MAIL FROM" command long before
> data and doesn't depend on headers; nice when your MTA adds an Envelope-Header,
> normally it's the Return-Path, and the sender *does not have* any business
> dealing with that

No one said the sender had anything to do with that.  I was pointing out that
SpamAssassin seems to extract the EnvelopeFrom either from the Return-Path: or
an Envelope-Sender= or Envelope-From= line at the top of the message (in
parse_received_line() in Mail::SpamAssassin::Message::Metadata::Received)… In
this case, MdF is generating the first (if you don’t believe me, look in
spam_assassin_mail() in /usr/bin/mimedefang.pl).

Stating facts here, not giving an opinion. Not sure what’s up for debate.


> 
> if it is empty it's <> aka Null-Sender and you really don't block that
> because you're violating RFCs, block sane autoreplies using it to prevent
> mail-loops and the subject indicates one thing; you don't really understand
> how email works

Rejecting messages based on their content PERIOD is violating the RFC’s.  
What’s your point?

If you think that no one is violating the RFC’s to spoof identities, obscure 
the origin of a message, etc. then you’re the one not understanding how email 
works in reality.


> 
> https://en.wikipedia.org/wiki/Bounce_message
> 


So what you’re saying is, since no one should block the null sender (i.e. the 
bounce message), it’s never going to be abused by anyone generating Spam…

For instance, a host which never ACCEPTS incoming messages and therefore would 
never send a BOUNCE back in response to a message it received would never 
GENERATE a Spam with a null sender…

Yet that’s exactly what’s happening to us.

We’re seeing messages like:

Return-Path: <>
Received: from APC01-PU1-obe.outbound.protection.outlook.com 
(mail-pu1apc01hn0234.outbound.protection.outlook.com [104.47.126.234])
by mail.redfish-solutions.com (8.14.9/8.14.9) with ESMTP id 
t8HHb88M003095
(version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=OK)
for ; Thu, 17 Sep 2015 11:37:14 -0600
Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=<>; 
Received: from [100.66.17.62] (116.202.32.68) by
 SIXPR01MB0400.apcprd01.prod.exchangelabs.com (10.160.240.141) with Microsoft
 SMTP Server (TLS) id 15.1.274.11; Thu, 17 Sep 2015 17:36:52 +
Content-Type: multipart/alternative; boundary="===0013032214=="
MIME-Version: 1.0
Subject: Your trust
…

It’s notable that *.outbound.protection.outlook.com will NEVER generate a
bounce, and hence never generate a null-sender, since those are handled by the
inbound servers.
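Philip's heuristic is narrower than blocking null senders in general, and it can be sketched in Python. The sketch flags a message only when the envelope sender is null and the relay's rDNS name belongs to a set of hosts believed to be outbound-only. The single suffix listed comes from the example headers above. Whether a given provider really is outbound-only is exactly the assumption disputed in this thread, so treat both the list and the `suspicious_null_sender` helper as hypothetical.

```python
# Hypothetical list of outbound-only relay name suffixes; only the Office 365
# outbound suffix comes from the example headers above.
OUTBOUND_ONLY_SUFFIXES = (".outbound.protection.outlook.com",)


def suspicious_null_sender(return_path, relay_rdns,
                           outbound_only=OUTBOUND_ONLY_SUFFIXES):
    """True when the envelope sender is null but the relay that handed us the
    message is believed never to accept inbound mail (so never bounces)."""
    null_sender = return_path.strip() in ("<>", "")
    host = relay_rdns.strip().lower().rstrip(".")
    return null_sender and host.endswith(outbound_only)
```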

But apparently I should accept these anyway, even though they represent an 
impossible scenario.

Okay, thanks for the great advice.

Next!

-Philip



Re: Test for empty EnvelopeFrom

2015-09-23 Thread Philip Prindeville

On Sep 23, 2015, at 6:35 AM, RW <rwmailli...@googlemail.com> wrote:

> On Tue, 22 Sep 2015 11:43:18 -0600
> Philip Prindeville wrote:
> 
>> Hi.
>> 
>> I’m using SA with MdF on Linux (Fedora 22).
>>
>> MdF generates the header “Return-Path: ” for me, so that
>> should be available to me in the rules.
>> 
>> To test this, I wrote a couple of rules:
>> 
>> header __L_EMPTY_SENDER  EnvelopeFrom:addr !~ /./
>> header __L_MATCH_SENDER  EnvelopeFrom:addr =~ /.*/
> 
> I think you're going to kick yourself.  ".*" means zero or more
> characters, so it matches anything.
> 
> It looks like the superfluous "*" is the only thing wrong here.
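RW's point is easy to check outside SpamAssassin. The small Python sketch below evaluates each of the rules from this thread against an empty and a non-empty envelope-from. Python's `re` stands in for Perl regexes, and the three patterns behave the same in both. The function name is hypothetical.

```python
import re


def rule_hits(envelope_from):
    """Which of the thread's three test rules would fire for this value."""
    return {
        # header __L_EMPTY_SENDER EnvelopeFrom:addr !~ /./
        "!~ /./": re.search(r".", envelope_from) is None,
        # header __L_MATCH_SENDER EnvelopeFrom:addr =~ /.*/  (matches anything)
        "=~ /.*/": re.search(r".*", envelope_from) is not None,
        # header __L_EMPTY_SENDER EnvelopeFrom:addr =~ /^$/
        "=~ /^$/": re.search(r"^$", envelope_from) is not None,
    }


print(rule_hits(""))
print(rule_hits("sender@example.com"))
```

Only `!~ /./` and `=~ /^$/` distinguish the two cases. `/.*/` fires on both, and on an empty address the text it matches is the empty string.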


No, I wanted to match the entirety of the Return-Path to see what it contained, 
and that SA was parsing it correctly… Using -D would allow me to see what the 
rule held as the match-string… except that wasn’t working due to bug 6360.  
Thanks Martin for fixing that so quickly!

Just wanted to make sure that it wasn’t something silly like including a 
leading space, for example.


> 
> 
>> What is a negative match, anyway?
> 
> AFAIK it just means that the rule matched without matching any actual
> text for the debug to display.
> 
> 
>> Am I seeing https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6360
>> in this case?
> 
> This looks to be a bug where "negative match" is also displayed if the
> matched text is "0", i.e. the number zero rather than a null string.

Yeah, exactly.  Or if you’re matching against the empty string…  I.e.:

EnvelopeFrom:addr =~ /^$/
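Here is a sketch of why both an empty match and a "0" match could print "negative match". It assumes, as comment 2 of bug 6360 and RW's reading suggest, that the debug code tests the matched text for Perl truthiness before printing it. In Perl the strings "" and "0" are false. This Python model of that check is hypothetical, not SpamAssassin's code.

```python
def perl_true(s):
    """Perl truthiness for a defined string: only "" and "0" are false."""
    return s not in ("", "0")


def dbg_hit_text(matched):
    """Model of the dbg output: show the matched text if it is true in
    Perl terms, otherwise fall back to the "negative match" label."""
    return f'"{matched}"' if perl_true(matched) else '"negative match"'


for text in ("", "0", "00", "sender@example.com"):
    print(repr(text), "->", dbg_hit_text(text))
```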

-Philip




Test for empty EnvelopeFrom

2015-09-22 Thread Philip Prindeville
Hi.

I’m using SA with MdF on Linux (Fedora 22).

MdF generates the header “Return-Path: ” for me, so that should be 
available to me in the rules.

To test this, I wrote a couple of rules:

header __L_EMPTY_SENDER EnvelopeFrom:addr !~ /./
header __L_MATCH_SENDER EnvelopeFrom:addr =~ /.*/

I’ve also tried:

header __L_EMPTY_SENDER EnvelopeFrom:addr =~ /^$/

but in all cases, I get:

Sep 22 11:07:46.237 [26384] dbg: rules: ran header rule __L_EMPTY_SENDER ==> got hit: "negative match"
Sep 22 11:07:46.237 [26384] dbg: rules: ran header rule __L_MATCH_SENDER ==> got hit: "negative match"

which I don’t get, because other similar rules end up showing what $& (or 
${^MATCH}) would be (i.e. what matched).

What is a negative match, anyway?  Looking at the code, it seems that 
::hit_rule_plugin_code() only gets called when we matched anyway, so saying 
it’s a negative match is counter-intuitive.

Am I seeing https://bz.apache.org/SpamAssassin/show_bug.cgi?id=6360 in this 
case?  And comment 2 says:

  trunk (3.4.0):

  Bug 6360: "negative match" on a "0" string - not fixed,
  appears cosmetic, just added a comment
  Sending lib/Mail/SpamAssassin/Plugin/Check.pm
  Committed revision 1338300.

but the bug is marked “RESOLVED FIXED” so I’m confused.  Should it be “WONTFIX” 
instead?

Thanks,

-Philip



Re: Must-Have Plugins?

2015-06-23 Thread Philip Prindeville



On 06/19/2015 01:07 PM, Dianne Skoll wrote:

On Fri, 19 Jun 2015 12:51:28 -0600
Philip Prindeville philipp_s...@redfish-solutions.com wrote:

[stuff]


With this, we avoid ever accepting about 98% of the SPAM that we’d
otherwise receive.

Really?  98%?  I find that surprising.  We get quite a lot of spam
from gmail, hotmail, yahoo etc. that would pass all of your tests.

Regards,

Dianne.


I should have mentioned we also blacklist yahoo... and are thinking 
about blocking google, too.




Re: Must-Have Plugins?

2015-06-19 Thread Philip Prindeville

On Jun 19, 2015, at 2:35 PM, David Jones djo...@ena.com wrote:

 
 But I’m on a LOT of high volume mailing lists (like mozilla-general and 
 netdev) that get heavily spammed.
 
 Filtering mailing lists is a slightly different ballgame than filtering 
 regular email.  Some of the items listed above
 don't apply to or won't work with mailing lists (as Dianne Skoll mentioned) 
 since they are like proxies of the
 original sender's mail server.
 
 Dave

Sorry, I also meant that many of those mailing lists are harvested… so my 
address has been bought and sold many, many times.



Re: Must-Have Plugins?

2015-06-19 Thread Philip Prindeville



On 06/10/2015 04:34 AM, Amir Caspi wrote:

On Jun 10, 2015, at 12:32 AM, Matus UHLAR - fantomas uh...@fantomas.sk wrote:


FEATURE(`block_bad_helo')
define(`confALLOW_BOGUS_HELO', `False')

Argh, unfortunately, that feature is only on sendmail 8.14 and higher, which 
means RHEL/CentOS 6 or higher.  For those of us running RHEL/CentOS 5, that's 
only available via a custom install. =(

Does anyone know of a reputable RPM distro for sendmail 8.14+ for CentOS 5?  I 
can't find anything decent via Google, everything is for CentOS 6 or higher.

(My server setup requires RPMs, so I can't build from a source tarball. I could 
potentially use the source RPM from CentOS 6 to get a custom RPM for 5, 
although even that is problematic.)

Bleh.  I wish I could upgrade this server to a newer OS, but various 
circumstances prevent that right now.

--- Amir



Given how many vulnerabilities CentOS 5 has, why would you want to keep 
running that?




Re: Must-Have Plugins?

2015-06-19 Thread Philip Prindeville

On Jun 19, 2015, at 3:28 PM, David Jones djo...@ena.com wrote:

 From: Philip Prindeville philipp_s...@redfish-solutions.com
 Sent: Friday, June 19, 2015 3:53 PM
 To: David Jones
 Cc: users@spamassassin.apache.org
 Subject: Re: Must-Have Plugins?
 
 On Jun 19, 2015, at 2:35 PM, David Jones djo...@ena.com wrote:
 
 
 But I’m on a LOT of high volume mailing lists (like mozilla-general and 
 netdev) that get heavily spammed.
 
 Filtering mailing lists is a slightly different ballgame than filtering 
 regular email.  Some of the items listed above
 don't apply to or won't work with mailing lists (as Dianne Skoll mentioned) 
 since they are like proxies of the
 original sender's mail server.
 
 Dave
 
 Sorry, I also meant that many of those mailing lists are harvested… so my 
 address has been bought and sold many, many times.
 
 I see.  For email addresses that have gotten on those lists, what I have 
 found to be effective is to
 focus more on the reputation of the sending mail server.  Some mail servers 
 like mailchimp,
 sendgrid, constant contact, etc. will get these addresses but you can safely 
 unsubscribe from
 them and eventually get off of their lists over a few weeks.
 Most of the other mail servers can be blocked by the major RBLs and the other 
 techniques you
 mentioned in your original post.  There are safe ways to whitelist specific 
 sending IPs for domains
 where you don't have to put in a risky whitelist entry at the MTA level that 
 will open you up to
 spoofing problems.  This usually requires a little scripting to pull data 
 that has been vetted by the
 SA level then tie it back to the MTA level.
 The rest that gets to SA needs to be handled by properly trained Bayes, DBLs, 
 custom rules
 like KAM.cf, etc.


Well that was interesting.  How long has Hetzner been hosting one of the
spamassassin.apache.org mirrors?

Because they have a very low reputation with us.

So your message, which includes the text “spamassassin.apache.org” got flagged 
and quarantined.

More than a little ironic…




.science the new leper of TLD's?

2015-06-19 Thread Philip Prindeville
No offense to lepers, but is .science to be avoided?  I’ve had email this week 
from about 17 different .science domain names, and 13 were blocked because of 
ZenBL and the rest turned out to be SPAM anyway.

I’m thinking that I should just refuse connections from any host whose rDNS is 
.science…

Has anyone had any POSITIVE experiences with .science domain names?



Re: Must-Have Plugins?

2015-06-19 Thread Philip Prindeville

On Jun 19, 2015, at 1:01 PM, David Jones djo...@ena.com wrote:

 From: Philip Prindeville philipp_s...@redfish-solutions.com
 
 On Jun 9, 2015, at 12:29 PM, John Hardin jhar...@impsec.org wrote:
 
 On Tue, 9 Jun 2015, David Jones wrote:
 
 Some of the best and easiest things you can enable to block spam are
 outside of SpamAssassin at your MTA (sendmail, postfix, etc.).
 
 - Enable greylisting.  This is just about the only way you can block
 zero-hour spam from compromised accounts that come from legit mail
 servers before they get listed in RBLs.
 
 Just bear in mind some commercial organizations may be very hostile to 
 anything that delays delivery of mail, regardless of how much it would 
 reduce spam.
 
 Two things that I have found very useful at the MTA level are:
 
 (1) Delay sending your SMTP banner a second or two and reject any sender 
 that starts sending information before that. This is a built-in option in 
 Sendmail, google greet_pause.
 
 (2) Check the HELO the other guy sends and reject if it's not a FQDN (i.e. 
 it's not got any periods at all). This probably shouldn't be done on mail 
 originating locally, but for mail coming in from the Internet the other MTA 
 should always be sending a FQDN in the HELO. A non-FQDN HELO is a pretty 
 good sign of a spambot sending from a compromised workstation or PC 
 directly to your MTA.
 
 I have some other MTA checks in place, but these two block the most.
 
 
 We use MimeDefang and it actually calls SpamAssassin.  But before we accept 
 connections (or email), we make a bunch of tests.
 
 We use Geo::IP to block a bunch of ISP’s and countries that are hostile in 
 filter_relay().
 
 These include ColoCrossing, Enzu, Datashack, Proxad, WholesaleInternet, etc.
 
 Then we apply more tests in filter_helo().  Note, these are only for 
 connections that arrive on port 25, not on port 587
 (which is for local clients to submit, and is authenticated with a 
 username/password pair).
 
 We accept connections from 127.0.0.1 with no further checks.
 
 We block anyone saying HELO \d+\.\d+\.\d+\.\d+ because it’s missing the 
 required brackets.  We block any bracketed dotted quads that have an 
 invalid quad (e.g. 300.300.300.300) or where the address they claim to be
 from is different from the actual address we’re seeing (no one behind a 
 NATting firewall should be connecting to us unless they know their [static] 
 external address).
 
 We block anyone listed by ZenBL.
 
 We block (a) anyone claiming to be us, (b) anyone claiming to be “localhost” 
 or “localhost.localdomain”.
 
 We block a list of names that are always fraudulent, like “paypal.com” 
 (without any subdomains), or “smtp.communitel.net” which seems to get 
 spoofed a LOT, or “mail.com” or “mail.ru”.  We block hostnames that don’t 
 have a domain (no dots).
 
 Lastly, we TEMPFAIL hosts that don’t have valid rDNS mappings (including the 
 A and PTR records not agreeing).
 
 With this, we avoid ever accepting about 98% of the SPAM that we’d otherwise 
 receive.
 
 Good information.  Thanks for taking the time to document it for this list.
 How many mailboxes do you filter with this configuration?


Not that many.  About 20.

But I’m on a LOT of high volume mailing lists (like mozilla-general and netdev) 
that get heavily spammed.

-Philip


 



Re: Must-Have Plugins?

2015-06-19 Thread Philip Prindeville

On Jun 9, 2015, at 12:29 PM, John Hardin jhar...@impsec.org wrote:

 On Tue, 9 Jun 2015, David Jones wrote:
 
 Some of the best and easiest things you can enable to block spam are
 outside of SpamAssassin at your MTA (sendmail, postfix, etc.).
 
 - Enable greylisting.  This is just about the only way you can block
  zero-hour spam from compromised accounts that come from legit mail
  servers before they get listed in RBLs.
 
 Just bear in mind some commercial organizations may be very hostile to 
 anything that delays delivery of mail, regardless of how much it would reduce 
 spam.
 
 Two things that I have found very useful at the MTA level are:
 
 (1) Delay sending your SMTP banner a second or two and reject any sender that 
 starts sending information before that. This is a built-in option in 
 Sendmail, google greet_pause.
 
 (2) Check the HELO the other guy sends and reject if it's not a FQDN (i.e. 
 it's not got any periods at all). This probably shouldn't be done on mail 
 originating locally, but for mail coming in from the Internet the other MTA 
 should always be sending a FQDN in the HELO. A non-FQDN HELO is a pretty good 
 sign of a spambot sending from a compromised workstation or PC directly to 
 your MTA.
 
 I have some other MTA checks in place, but these two block the most.


We use MimeDefang and it actually calls SpamAssassin.  But before we accept 
connections (or email), we make a bunch of tests.

We use Geo::IP to block a bunch of ISP’s and countries that are hostile in 
filter_relay().

These include ColoCrossing, Enzu, Datashack, Proxad, WholesaleInternet, etc.
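Geo::IP maps an address to a country or organization. A simpler stand-in with the same effect for a fixed set of providers is a CIDR blocklist. Below is a minimal Python sketch using the standard `ipaddress` module. The networks are RFC 5737 documentation ranges used as placeholders, not the allocations of any provider named above, and `relay_blocked` is a hypothetical helper.

```python
import ipaddress

# Placeholder blocklist: RFC 5737 documentation networks standing in for
# whatever provider allocations you choose to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")
]


def relay_blocked(ip):
    """True if the connecting address falls inside any blocked network.
    Membership across IP versions (IPv6 address, IPv4 network) is False."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```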

Then we apply more tests in filter_helo().  Note, these are only for 
connections that arrive on port 25, not on port 587 (which is for local clients 
to submit, and is authenticated with a username/password pair).

We accept connections from 127.0.0.1 with no further checks.

We block anyone saying HELO \d+\.\d+\.\d+\.\d+ because it’s missing the 
required brackets.  We block any bracketed dotted quads that have an invalid 
quad (e.g. 300.300.300.300) or where the address they claim to be from is
different from the actual address we’re seeing (no one behind a NATting 
firewall should be connecting to us unless they know their [static] external 
address).

We block anyone listed by ZenBL.

We block (a) anyone claiming to be us, (b) anyone claiming to be “localhost” or 
“localhost.localdomain”.

We block a list of names that are always fraudulent, like “paypal.com” (without 
any subdomains), or “smtp.communitel.net” which seems to get spoofed a LOT, or 
“mail.com” or “mail.ru”.  We block hostnames that don’t have a domain (no dots).
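The HELO checks listed above can be collected into one function. This is a minimal Python sketch, not MIMEDefang's API. `helo_verdict`, `OUR_NAMES` and the return strings are hypothetical, and the fraudulent-name list repeats the examples given above. The ZenBL lookup is left out because it needs DNS. Mirroring the message, connections from 127.0.0.1, or on any port other than 25, are accepted without checks.

```python
import ipaddress
import re

# Placeholders: replace with your own names.
OUR_NAMES = {"example.com", "mail.example.com"}
# Always-fraudulent HELO names from the message (exact match only, so
# subdomains such as www.paypal.com are not caught).
FRAUDULENT = {"paypal.com", "smtp.communitel.net", "mail.com", "mail.ru"}
BARE_QUAD = re.compile(r"^\d+\.\d+\.\d+\.\d+$")
LITERAL = re.compile(r"^\[(.*)\]$")


def helo_verdict(helo, client_ip, port=25):
    """Return "accept" or a short reason to reject this HELO/EHLO argument."""
    if client_ip == "127.0.0.1" or port != 25:
        return "accept"  # local or submission traffic: no further checks
    name = helo.strip().lower().rstrip(".")
    if BARE_QUAD.match(name):
        return "IP address without brackets"
    literal = LITERAL.match(name)
    if literal:
        inner = literal.group(1)
        if inner.startswith("ipv6:"):
            inner = inner[len("ipv6:"):]
        try:
            claimed = ipaddress.ip_address(inner)
        except ValueError:
            return "invalid address literal"  # e.g. [300.300.300.300]
        if claimed != ipaddress.ip_address(client_ip):
            return "literal does not match connecting address"
        return "accept"
    if name in OUR_NAMES:
        return "claims to be us"
    if name in ("localhost", "localhost.localdomain"):
        return "claims to be localhost"
    if name in FRAUDULENT:
        return "known fraudulent name"
    if "." not in name:
        return "not a FQDN"
    return "accept"
```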

Lastly, we TEMPFAIL hosts that don’t have valid rDNS mappings (including the A 
and PTR records not agreeing).
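The rDNS test in the last paragraph is usually called forward-confirmed reverse DNS. You look up the PTR name for the connecting address, then check that the name's A records include that address. Below is a minimal Python sketch using `socket.gethostbyaddr` and `socket.gethostbyname_ex`. It uses only the first PTR name, and the forward lookup is IPv4 only. The resolvers are parameters so the logic can be tested without DNS. As in the message, a failure is meant to map to a TEMPFAIL rather than a hard reject. `fcrdns_ok` is a hypothetical name.

```python
import socket


def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS for an IPv4 address: the PTR name must
    exist and its A records must include the original address."""
    try:
        ptr_name, _aliases, _addrs = reverse(ip)
        _name, _aliases, forward_addrs = forward(ptr_name)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    return ip in forward_addrs
```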

With this, we avoid ever accepting about 98% of the SPAM that we’d otherwise 
receive.

-Philip




Re: Can SpamAssasin convert UTF8 into ISO-8859-1?

2015-05-20 Thread Philip Prindeville

On Apr 15, 2015, at 7:07 PM, @lbutlr krem...@kreme.com wrote:

 On Apr 13, 2015, at 09:03, John Hardin jhar...@impsec.org wrote:
 The proper place for that sort of thing would be the tool that does final 
 delivery to the user's mailbox.
 
 There is no proper place for that.
 

No, it’s not.  But Mimedefang is.

-Philip



Testing SPF DKIM configurations

2015-05-20 Thread Philip Prindeville
Anyone know of a site that you can send an email to in order to test your SPF 
and/or DKIM configuration?

I’ve set it up, but every once in a while I get back weird messages about being
blocked from certain sites, and I’m wondering whether something is wrong at my
end or whether they’re just misconfigured at the receiving side…

Thanks,

-Philip





Re: SOUGHT 2.0

2014-12-05 Thread Philip Prindeville

On Dec 4, 2014, at 2:41 PM, Axb axb.li...@gmail.com wrote:

 On 12/04/2014 10:30 PM, Bob Proulx wrote:
 Axb wrote:
 It's been more than a month since my first SOUGHT 2.0 msg.
 
 A few have shown interest but as there hasn't been the flood of enthusiasm
 and stuff getting done which I hoped for so I've dropped the idea of getting
 a public autogenerated rule set / sa-update channel going.
 
 Good poke!  And I will leave this to the list to show further
 enthusiasm for the project instead of simply promising time off list.
 
 Bob,
 
 It's not a poke - it's a fact.
 
 To be able to create usable rules, several times/day I need feeds to spit *at 
 least* +150k/day. As I don't have the data….

I’d offer, but besides aggressively filtering spam, we also block several
countries using the GeoIP database on our firewall (so we never even see the
SMTP connections), and block a couple hundred CIDR blocks and ISPs, so it
wouldn’t be representative of what you might otherwise receive.






Re: Honeypot email addresses

2014-12-04 Thread Philip Prindeville


On 11/21/2014 09:49 AM, David F. Skoll wrote:

On Fri, 21 Nov 2014 08:43:22 -0800 (PST)
John Hardin jhar...@impsec.org wrote:


A public mailing list isn't a great place to discuss such tactics...

I suspect spammers are dumb and will just vacuum up any address
they can find.  Also, the scammers who sell CDs with millions of
email addresses on them are unlikely to do anything but the most cursory
checking of the addresses.

Make a honeypot subdomain, put up any web content with
email addresses and I guarantee you'll start receiving email
on those addresses within a few days.

Regards,

David.


Having it appear in a resume on any of the job sites (dice, monster, 
ladders, etc) is a good way to get it harvested.


So is posting to mozilla-gene...@mozilla.org or net...@vger.kernel.org ...



Re: Honeypot email addresses

2014-12-04 Thread Philip Prindeville


On 12/04/2014 05:32 AM, Reindl Harald wrote:


Am 03.12.2014 um 23:56 schrieb Philip Prindeville:

On 11/21/2014 09:49 AM, David F. Skoll wrote:

On Fri, 21 Nov 2014 08:43:22 -0800 (PST)
John Hardin jhar...@impsec.org wrote:


A public mailing list isn't a great place to discuss such tactics...

I suspect spammers are dumb and will just vacuum up any address
they can find.  Also, the scammers who sell CDs with millions of
email addresses on them are unlikely to do anything but the most cursory
checking of the addresses.

Make a honeypot subdomain, put up any web content with
email addresses and I guarantee you'll start receiving email
on those addresses within a few days.

Regards,

David.


Having it appear in a resume on any of the job sites (dice, monster,
ladders, etc) is a good way to get it harvested.

So is posting to mozilla-gene...@mozilla.org or 
net...@vger.kernel.org ...


and *that* is exactly what you should avoid: post a honeypot address
somewhere actively - a honeypot address should never be submitted
because you ask for trouble and false positives doing so




Not necessarily.  If I post to a list with this address, and wait 60 
days, I can assume that 99.999% of email that comes back after that date 
is not related to the original posting.


Further, after 15 days, anything which doesn't also copy the list is 
almost certainly Spam.
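Philip's two thresholds can be written down as a classifier. Below is a minimal Python sketch. The 60-day and 15-day cut-offs come from the message above. The function name and the "unknown" result for younger mail are assumptions. Dave Pooser's reply in the next message describes a case (an off-list ping about an old post) that this rule would misfile.

```python
from datetime import date


def honeypot_verdict(posted_on, received_on, recipients, list_addr):
    """Classify mail to an address that was posted once to a mailing list.

    60+ days after the posting: treat as spam.
    15+ days after, and the list is not among the recipients: treat as spam.
    Otherwise: no verdict."""
    age_days = (received_on - posted_on).days
    copies_list = list_addr.lower() in {r.lower() for r in recipients}
    if age_days >= 60:
        return "spam"
    if age_days >= 15 and not copies_list:
        return "spam"
    return "unknown"


print(honeypot_verdict(date(2014, 12, 4), date(2014, 12, 25),
                       ["honeypot@example.com"], "list@lists.example.org"))
```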


-Philip



Re: Honeypot email addresses

2014-12-04 Thread Philip Prindeville

On Dec 4, 2014, at 2:30 PM, Dave Pooser dave...@pooserville.com wrote:

 On 12/4/14, 3:10 PM, Philip Prindeville
 philipp_s...@redfish-solutions.com wrote:
 
 Not necessarily.  If I post to a list with this address, and wait 60
 days, I can assume that 99.999% of email that comes back after that date
 is not related to the original posting.
 
 Further, after 15 days, anything which doesn't also copy the list is
 almost certainly Spam.
 
 Eh, if I'm researching an odd bug, for instance, and find a 2-year old
 post from someone who had the same problem but don't see any resolution
 posted, I'll probably ping the OP offlist with a Hey, did you ever find
 the fix for that problem $FOO you were having? since I can't assume
 they're still subscribed to the list. I may or may not copy the list on
 that email, though I certainly will if I come up with an answer.
 
 obligatory http://xkcd.com/979/ reference here

I’ll have to remember to make the posting sufficiently uninteresting to be 
remembered for any significant length of time.  ;-)

-Philip




Re: Give a penalty to messages with non latin UTF-8 characters?

2014-10-20 Thread Philip Prindeville

On Oct 17, 2014, at 9:53 AM, Michael Opdenacker 
michael.opdenac...@free-electrons.com wrote:

 On 09/01/2014 01:39 AM, LuKreme wrote:
 On 31 Aug 2014, at 14:38 , Ian Zimmerman i...@buug.org wrote:
 
 Doesn't ok_languages and ok_locales do the job?  It does for me.
 Not with UTF-8 encoding, that setting only seems to apply to old-style
 character declarations.
 
 
 This was exactly my point. As long as characters are in utf-8,
 ok_locales doesn't trigger. And ok_languages needs a sufficient number
 of characters to trigger. A subject with only Chinese characters in
 UTF-8 isn't enough.
 
 Michael.

I explicitly add 10.0 to messages with charset of GB2312.  Unfortunately, a lot 
of Chinese engineers use clients that still use this charset as the default, 
and post to English language mailing lists.
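Here is the same check sketched in Python. It uses the standard library's `email.message.Message.get_charsets()` to collect the declared charset of every MIME part. The 10.0 figure is the one from this message. `charset_penalty` and the set of penalized charsets are hypothetical. In SpamAssassin itself the equivalent would be a rule rather than Python code.

```python
from email import message_from_string

PENALIZED_CHARSETS = {"gb2312"}


def charset_penalty(raw, score=10.0):
    """Return score if any MIME part declares a penalized charset, else 0.0."""
    msg = message_from_string(raw)
    declared = {c.lower() for c in msg.get_charsets() if c}
    return score if declared & PENALIZED_CHARSETS else 0.0
```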

There used to be a recommendation that MUAs fit messages into the “smallest”
encoding possible (smallest in terms of how many characters the charset can
represent), i.e. US-ASCII, then Latin-1, then UTF-8.  Period.

Thus Chinese posters of legitimate messages to English language mailing lists
would use US-ASCII or Latin-1 (if replying to someone named André).
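The rule described here, picking the smallest charset that can represent the text, is a short loop: try each candidate in order and keep the first one that encodes without error. Below is a minimal Python sketch. The candidate order (US-ASCII, ISO-8859-1, then UTF-8) is the one from this message, and `smallest_charset` is a hypothetical name.

```python
def smallest_charset(text, candidates=("us-ascii", "iso-8859-1")):
    """Return the first candidate charset that can encode text, else UTF-8."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    return "utf-8"


print(smallest_charset("Thanks, André"))
```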

I don’t understand why Apple’s Mail.app, for instance, defaults to Win-1252 
here in the US. That’s braindead.

Apple won’t bundle Flash with MacOS because it’s not an Open Standard, but 
they’ll embrace a vendor-specific character code when a superior Open Standard 
encoding exists.  Go figure.

-Philip



.link TLD spammer haven?

2014-10-13 Thread Philip Prindeville
Every connection I’ve gotten from a hostname resolving to *.link or saying helo 
*.link has been spam (I block the connections with MIMEDefang).

Has anyone actually seen a legitimate email from a host in the .link TLD?

I’ve seen (last week alone):

bgo.blc-onlineconsumer140.link
ratio.allgiftcardsonlinefriendly.link
ratio.autodealersstarted.link
ratio.forincreasemensvitality.link
ratio.growingmedicareprovider.link
ratio.instantarrestrecordseducate.link
ratio.invitegiftcards.link
ratio.largelycarsavings.link
ratio.medicaresourceshigh.link
ratio.publicarrestrecordsdaily.link
ratio.readybloodpressuremonitor.link
tcvd.internetspecial-week242.link
tcvd.internetspecial-week243.link
tcvd.internetspecial-week244.link
tcvd.internetspecial-week246.link
tcvd.internetspecial-week248.link
tcvd.internetspecial-week250.link
tcvd.internetspecial-week251.link
tcvd.internetspecial-week252.link
tcvd.internetspecial-week254.link
vnds.ds-customerreviews101.link
vnds.ds-customerreviews102.link
vps.35dscustomersreviews.link
vps.38-vpscresthosting.link
vps.39-vpscresthosting.link
vps.40-vpscresthosting.link
vps.42-vpscresthosting.link
vps.45dscustomersreviews.link
vps.45-vpscresthosting.link
vps.46dscustomersreviews.link
vtds.customeronlinerev212.link
vtds.customeronlinerev214.link
vtds.customeronlinerev216.link
vtds.customeronlinerev219.link
vtds.customeronlinerev221.link


Is it worth having a rule that triggers on the relay’s hostname being *.link?
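A hostname-suffix test like this is simple to express. Below is a minimal Python sketch that assumes the relay's hostname (from rDNS or HELO) is already in hand. Earlier messages place such checks in MIMEDefang's filter_relay() and filter_helo(). The TLD set is an example drawn from this thread (.link here, .science from the earlier message), and `in_blocked_tld` is hypothetical.

```python
# Example TLDs from this thread; adjust to taste.
BLOCKED_TLDS = {"link", "science"}


def in_blocked_tld(hostname, tlds=BLOCKED_TLDS):
    """True if hostname (rDNS name or HELO) ends in one of the given TLDs."""
    name = hostname.strip().lower().rstrip(".")
    if "." not in name:
        return False  # a bare label has no TLD to check
    return name.rsplit(".", 1)[1] in tlds


print(in_blocked_tld("tcvd.internetspecial-week242.link"))
```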

Also, I noticed that every message we saw was missing a Received: header…

-Philip



Re: Googlasi, blacklotus, etc.

2014-10-02 Thread Philip Prindeville
BTW, I finally picked up the phone and spoke to support at Blacklotus (the ARIN 
PoC for abuse there gives bogus info) and discussed this with them.

They refused to believe that a site offering:

* weight loss meds
* miracle cures for diabetes
* tax-deductible window upgrades
* Victoria’s Secret gift cards
* Costco gift cards
* discounted Ford vehicles
* discounted GM vehicles
* background checks
* free credit checks
* testosterone supplements

etc. might actually be a phishing website.  How many legitimate businesses can 
you think of that do all of these things together?

Pinheads.


