Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-12-26 Thread Philip Prindeville



> On Nov 16, 2021, at 8:03 PM, Henrik K  wrote:
> 
> On Tue, Nov 16, 2021 at 01:08:16PM -0700, Philip Prindeville wrote:
>> 
>> Or http.sh points to an NS that's offline...
> 
> Your resolver shoukd time out _way_ sooner than some minutes.
> 
>> Can the async lookup be back-ported?
> 
> No, and there will be no new 3.4 releases.
> 


Yeah, I still need to figure that out...

When I run "dig -t any http.sh" it times out after a few seconds.  But 
SpamAssassin is doing something very different.  Not sure why.

In any case, the workaround seems to be:

uri_block_exclude __L_BLOCK_ISP ... http.sh shlom.in


Where not resolving these last two domains makes the timeouts go away.  Note 
that the pathology is the same in both cases:

philipp@macbook3 ~ % dig @8.8.8.8 -tns shlom.in.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns shlom.in.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38665
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;shlom.in.  IN  NS

;; ANSWER SECTION:
shlom.in.   300 IN  NS  ns1gmz.name.com.
shlom.in.   300 IN  NS  ns2jrt.name.com.
shlom.in.   300 IN  NS  ns3qtx.name.com.
shlom.in.   300 IN  NS  ns4blx.name.com.

;; Query time: 84 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:44 MST 2021
;; MSG SIZE  rcvd: 129

philipp@macbook3 ~ % 
philipp@macbook3 ~ % dig @8.8.8.8 -tns http.sh.

; <<>> DiG 9.10.6 <<>> @8.8.8.8 -tns http.sh.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10013
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;http.sh.   IN  NS

;; ANSWER SECTION:
http.sh.60  IN  CNAME   park.io.
park.io.14797   IN  NS  ns-1348.awsdns-40.org.
park.io.14797   IN  NS  ns-1624.awsdns-11.co.uk.
park.io.14797   IN  NS  ns-441.awsdns-55.com.
park.io.14797   IN  NS  ns-672.awsdns-20.net.

;; Query time: 245 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Dec 26 15:25:03 MST 2021
;; MSG SIZE  rcvd: 197

philipp@macbook3 ~ % 


Seems a little broken that the NS records aren't accompanied by 'A' glue 
records, but that's not catastrophic... normally a 2nd query would be done.

Should the resolver code in SpamAssassin be more robust when it comes to such 
failures?


philipp@macbook3 ~ % dig -ta ns-1348.awsdns-40.org.

; <<>> DiG 9.10.6 <<>> -ta ns-1348.awsdns-40.org.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37011
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;ns-1348.awsdns-40.org. IN  A

;; ANSWER SECTION:
ns-1348.awsdns-40.org.  78740   IN  A   205.251.197.68

;; Query time: 51 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Dec 26 15:27:16 MST 2021
;; MSG SIZE  rcvd: 66

philipp@macbook3 ~ % dig @205.251.197.68 -ta http.sh

; <<>> DiG 9.10.6 <<>> @205.251.197.68 -ta http.sh
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 28411
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;http.sh.   IN  A

;; Query time: 67 msec
;; SERVER: 205.251.197.68#53(205.251.197.68)
;; WHEN: Sun Dec 26 15:27:32 MST 2021
;; MSG SIZE  rcvd: 25

philipp@macbook3 ~ % 


I'm not exactly sure what's falling down or why.

Is there anyone with more BIND-fu than me that's willing to venture a guess?

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Henrik K
On Tue, Nov 16, 2021 at 01:08:16PM -0700, Philip Prindeville wrote:
> 
> Or http.sh points to an NS that's offline...

Your resolver shoukd time out _way_ sooner than some minutes.

> Can the async lookup be back-ported?

No, and there will be no new 3.4 releases.



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville



> On Nov 16, 2021, at 3:30 AM, Martin Gregorie  wrote:
> 
> On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
>> 
>> 
>>> On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
>>> 
>>> 
>>> Philip Prindeville  writes:
>>> 
 Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
 
 Why can't I even find the rule?
>>> 
> try "locate txrep"
> 
> On my Fedora system 'locate' says that TxRep is a plugin, enabled by
> installing:  /usr/share/spamassassin/60_txrep.cf
> 
> and, using "locate" again, that the plugin's manpage is  
> /usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz
> 
> So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
> and 'locate' says it is installed as
> /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm
> 
> Of course, other Linux distros may put it somewhere else, so use
> 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> again. 
> 
> Not trying to teach you to suck eggs, but, incredible as it may sound,
> there are still some people who don't know about the 'locate' command.
> 
> Martin
> 



I know how to find the code modules.  I wasn't sure where the database was 
kept.  And I invoke SA from within Mimedefang, not via procmail on individual 
delivery, so only global rules get run.  Nothing is per-user.

-Philip




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville



> On Nov 15, 2021, at 11:12 PM, Henrik K  wrote:
> 
> On Mon, Nov 15, 2021 at 04:25:55PM -0700, Philip Prindeville wrote:
>> 
>> 
>>> On Nov 12, 2021, at 10:35 PM, Henrik K  wrote:
>>> 
>>> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
 
 What would be helpful here would be logging of when a rule *starts*
 evaluation. Normally that would be painful, but for tracking a runaway it
 would be useful. Perhaps I can code up something to capture that and log it
 on a timeout...
>>> 
>>> It already exists
>>> 
>>> spamassassin -D all,rules-all < msg
>>> 
>> 
>> 
>> Ah, that was useful.
>> 
>> Seeing:
>> 
>> ...
>> Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 
>> 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 
>> 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 
>> 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 
>> 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 
>> 194.213.125.57-194.213.125.57
>> Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs 
>> ...
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . 
>> dns:A:100.136.134.134.psbl.surriel.com
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . 
>> dns:A:100.136.134.134.dnsbl.sorbs.net
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
>> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
>> ...
>> 
>> 
>> Why would resolving http.sh take this long?  And can we bring down the 
>> timeout?
>> 
>> Hard to imagine DNS requests taking more than a couple of seconds.
>> 
>> -Philip
> 
> SA 3.4 has old version of URILocalBL without async lookups, probably that's
> why.  Not sure why it would take minutes though, unless your system DNS is
> configured strangely and/or is not caching..
> 



Or http.sh points to an NS that's offline...

Can the async lookup be back-ported?




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Philip Prindeville
Replies... some duplication of conversation on "mimedefang".



> On Nov 15, 2021, at 10:34 PM, Bill Cole 
>  wrote:
> 
> On 2021-11-15 at 18:08:20 UTC-0500 (Mon, 15 Nov 2021 16:08:20 -0700)
> Philip Prindeville 
> is rumored to have said:
> 
>>> On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:
>>> 
>>> On Fri, 12 Nov 2021, Philip Prindeville wrote:
>>> 
 I got the message, saved it to a flat file, and ran "spamassassin -t -D 
 rules < netdev.eml" and saw:
 
 ...
 Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule 
 __ANY_TEXT_ATTACH_DOC ==> got hit (1)
 ...
 Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH 
 ==> got hit (1)
 Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
 Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
 further tests
 ...
 
 Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
>>> 
>>> "ran ... got hit" is past tense. And it needs to complete the rule to know 
>>> whether it got a hit.
>>> 
>>> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
>>> 
>>> The next rule, whatever that was, is the one that timed out after 4m20s.
>> 
>> 
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>> 
>> Why can't I even find the rule?
> 
> That "rule" is actually a subroutine that is assembled and named at runtime 
> in M:SA:Check from set 1 (probably the only set) of the "body eval" (type 11) 
> rules running at priority 0.


Yeah, if we can get the source file and line # that's almost as good as a 
function name (since there isn't one, if it's an anonymous sub block).


> 
> Which suggests a tough bit of troubleshooting.
> 
 Could there be rules that *aren't* matching but are taking a while?
>>> 
>>> It's timing out on a rule that's running away. The timeout triggers before 
>>> "hit/no hit" is known.
>>> 
>>> What would be helpful here would be logging of when a rule *starts* 
>>> evaluation. Normally that would be painful, but for tracking a runaway it 
>>> would be useful. Perhaps I can code up something to capture that and log it 
>>> on a timeout...
>> 
>> 
>> Whenever a rule gets started, you could save the name and start time, and 
>> then burp that during timeout handling, right?
> 
> I like that idea. I have no idea how feasible it is.


Me neither.  I use Perl less and less everyday, and my fu is fading almost as 
fast as new features are coming in.

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread John Hardin

On Mon, 15 Nov 2021, Philip Prindeville wrote:





On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:

On Fri, 12 Nov 2021, Philip Prindeville wrote:


I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < 
netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
==> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> 
got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?


"ran ... got hit" is past tense. And it needs to complete the rule to know 
whether it got a hit.

11:45:38.048 -> 11:45:38.063 = less than 20 msec.

The next rule, whatever that was, is the one that timed out after 4m20s.



Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.

Why can't I even find the rule?


Run it with "-D rules,rules-all" and it should list each rule as it 
starts executing.








Could there be rules that *aren't* matching but are taking a while?


It's timing out on a rule that's running away. The timeout triggers before "hit/no 
hit" is known.

What would be helpful here would be logging of when a rule *starts* evaluation. 
Normally that would be painful, but for tracking a runaway it would be useful. 
Perhaps I can code up something to capture that and log it on a timeout...



Whenever a rule gets started, you could save the name and start time, and then 
burp that during timeout handling, right?


The rule name at least.


If you want to send me that message zipped up I can try it here with those 
changes and see if it's a base rule running away.


Sent out-of-band.

Doh.  Forgot to zip it.


I'll be happy to take a look, but running with rules-all at your end would 
be faster... I forgot I'd already added that. But I will still take a look 
at capturing the rule name for the timeout message.



--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Never forget, even for an instant, that the one and only reason
  anyone has for taking your gun away is to make you weaker than
  he is, so he can do something to you that you wouldn’t let him
  do if you were equipped to prevent it. This goes for burglars,
  muggers, and rapists, and even more so for policemen,
  bureaucrats, and politicians. -- Alexander Pope
---
 535 days since the first private commercial manned orbital mission (SpaceX)

Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Dave Wreski




For that matter how many know about 'apropos'? And, even if they do,
they may not discover 'locate' because 'apropos search' doesn't find
either 'updatedb' or 'locate'. You have to enter 'apropos find' to
discover that 'locate' exists, and even then you could get side tracked
into trying to use the much more complex 'find' utility.


Or the old-school rpm:

$ rpm -ql spamassassin|grep TxRep
/usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm

Dave



Martin



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Martin Gregorie
On Tue, 2021-11-16 at 08:33 -0500, Bill Cole wrote:
> 
> Worth noting: locate & updatedb aren't always installed.
> 
Fair comment: they're a standard part of Fedora. IIRC they are also part
of the RaspberryPi OS distro, so are likely to be included in Debian and
most of its clones.

But: how many "intro to Linux" books mention 'updatedb' or 'locate'?
They're not mentioned in 'UNIX in a Nutshell' (so maybe not in 'Linux in
a Nutshell' either) or in the 'RaspberryPi User Guide', though they are
in Michael Kopfler's 1999 Linux sysadmin book.  

You don't see them mentioned much in newsgroups either, so I'm left
wondering how many Linux users who either taught themselves to use Linux
or picked it up at work from a colleague would ever know about 'locate'
and 'updatedb'?  

For that matter how many know about 'apropos'? And, even if they do,
they may not discover 'locate' because 'apropos search' doesn't find
either 'updatedb' or 'locate'. You have to enter 'apropos find' to
discover that 'locate' exists, and even then you could get side tracked
into trying to use the much more complex 'find' utility.  

Martin




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Henrik K
On Tue, Nov 16, 2021 at 08:33:56AM -0500, Bill Cole wrote:
> On 2021-11-16 at 05:30:25 UTC-0500 (Tue, 16 Nov 2021 10:30:25 +)
> Martin Gregorie 
> is rumored to have said:
> 
> > Of course, other Linux distros may put it somewhere else, so use
> > 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> > again.
> >
> > Not trying to teach you to suck eggs, but, incredible as it may sound,
> > there are still some people who don't know about the 'locate' command.
> 
> Worth noting: locate & updatedb aren't always installed.

Offtopic, but there's even very little reason to install it, it just eats
resources, not to mention one more point of vulnerability.  If you need to
search something, "find /" takes some seconds on any remotely modern system
and always returns fresh results.



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Bill Cole
On 2021-11-16 at 05:30:25 UTC-0500 (Tue, 16 Nov 2021 10:30:25 +)
Martin Gregorie 
is rumored to have said:

> Of course, other Linux distros may put it somewhere else, so use
> 'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
> again.
>
> Not trying to teach you to suck eggs, but, incredible as it may sound,
> there are still some people who don't know about the 'locate' command.

Worth noting: locate & updatedb aren't always installed.


-- 
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-16 Thread Martin Gregorie
On Mon, 2021-11-15 at 17:12 -0700, Philip Prindeville wrote:
> 
> 
> > On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
> > 
> > 
> > Philip Prindeville  writes:
> > 
> > > Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
> > > 
> > > Why can't I even find the rule?
> > 
try "locate txrep"

On my Fedora system 'locate' says that TxRep is a plugin, enabled by
installing:  /usr/share/spamassassin/60_txrep.cf

and, using "locate" again, that the plugin's manpage is  
/usr/share/man/man3/Mail::SpamAssassin::Plugin::TxRep.3pm.gz

So, "man 3 Mail::SpamAssassin::Plugin::TxRep" describes the TxRep plugin
and 'locate' says it is installed as
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm

Of course, other Linux distros may put it somewhere else, so use
'locate' and, if it doesn't find 'txrep', run 'sudo updatedb' and try
again. 

Not trying to teach you to suck eggs, but, incredible as it may sound,
there are still some people who don't know about the 'locate' command.
 
Martin

> 




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Henrik K
On Mon, Nov 15, 2021 at 04:25:55PM -0700, Philip Prindeville wrote:
> 
> 
> > On Nov 12, 2021, at 10:35 PM, Henrik K  wrote:
> > 
> > On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
> >> 
> >> What would be helpful here would be logging of when a rule *starts*
> >> evaluation. Normally that would be painful, but for tracking a runaway it
> >> would be useful. Perhaps I can code up something to capture that and log it
> >> on a timeout...
> > 
> > It already exists
> > 
> > spamassassin -D all,rules-all < msg
> > 
> 
> 
> Ah, that was useful.
> 
> Seeing:
> 
> ...
> Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 
> 54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 
> 98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 
> 173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 
> 192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 
> 194.213.125.57-194.213.125.57
> Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs 
> ...
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . 
> dns:A:100.136.134.134.psbl.surriel.com
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . 
> dns:A:100.136.134.134.dnsbl.sorbs.net
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
> Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
> ...
> 
> 
> Why would resolving http.sh take this long?  And can we bring down the 
> timeout?
> 
> Hard to imagine DNS requests taking more than a couple of seconds.
> 
> -Philip

SA 3.4 has old version of URILocalBL without async lookups, probably that's
why.  Not sure why it would take minutes though, unless your system DNS is
configured strangely and/or is not caching..



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Bill Cole

On 2021-11-15 at 18:08:20 UTC-0500 (Mon, 15 Nov 2021 16:08:20 -0700)
Philip Prindeville 
is rumored to have said:


On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:

On Fri, 12 Nov 2021, Philip Prindeville wrote:

I got the message, saved it to a flat file, and ran "spamassassin -t 
-D rules < netdev.eml" and saw:


...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule 
__ANY_TEXT_ATTACH_DOC ==> got hit (1)

...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule 
__ANY_TEXT_ATTACH ==> got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, 
skipping further tests

...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?


"ran ... got hit" is past tense. And it needs to complete the rule to 
know whether it got a hit.


11:45:38.048 -> 11:45:38.063 = less than 20 msec.

The next rule, whatever that was, is the one that timed out after 
4m20s.



Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.

Why can't I even find the rule?


That "rule" is actually a subroutine that is assembled and named at 
runtime in M:SA:Check from set 1 (probably the only set) of the "body 
eval" (type 11) rules running at priority 0.


Which suggests a tough bit of troubleshooting.


Could there be rules that *aren't* matching but are taking a while?


It's timing out on a rule that's running away. The timeout triggers 
before "hit/no hit" is known.


What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a 
runaway it would be useful. Perhaps I can code up something to 
capture that and log it on a timeout...



Whenever a rule gets started, you could save the name and start time, 
and then burp that during timeout handling, right?


I like that idea. I have no idea how feasible it is.




--
Bill Cole
b...@scconsult.com or billc...@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Greg Troxel

Philip Prindeville  writes:

>> That looks very familiar.  I was having timeouts, and saw that in the
>> logs, on certain messages.  I ended up nuking and rebuilding my TXREP
>> database and then things were  ok.
>> 
>> That doesn't explain why we can't find the rule, which is a good
>> question.
>
> Where is the TXREP database?

For me, it's in:

  $HOME/.spamassassin/tx-reputation

> Also, is it possible that the name is generated through some sort of
> mangling, the way that C function names can be generated from macro
> expansions, etc?

Almost anything is possible...


signature.asc
Description: PGP signature


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 15, 2021, at 5:06 PM, Greg Troxel  wrote:
> 
> 
> Philip Prindeville  writes:
> 
>> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>> 
>> Why can't I even find the rule?
> 
> That looks very familiar.  I was having timeouts, and saw that in the
> logs, on certain messages.  I ended up nuking and rebuilding my TXREP
> database and then things were  ok.
> 
> That doesn't explain why we can't find the rule, which is a good
> question.
> 


Where is the TXREP database?

Also, is it possible that the name is generated through some sort of mangling, 
the way that C function names can be generated from macro expansions, etc?




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Greg Troxel

Philip Prindeville  writes:

> Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.
>
> Why can't I even find the rule?

That looks very familiar.  I was having timeouts, and saw that in the
logs, on certain messages.  I ended up nuking and rebuilding my TXREP
database and then things were  ok.

That doesn't explain why we can't find the rule, which is a good
question.



signature.asc
Description: PGP signature


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 12, 2021, at 10:35 PM, Henrik K  wrote:
> 
> On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
>> 
>> What would be helpful here would be logging of when a rule *starts*
>> evaluation. Normally that would be painful, but for tracking a runaway it
>> would be useful. Perhaps I can code up something to capture that and log it
>> on a timeout...
> 
> It already exists
> 
> spamassassin -D all,rules-all < msg
> 


Ah, that was useful.

Seeing:

...
Nov 15 16:09:40.479 [54834] dbg: check: uri_block_cidr list 
54.69.70.160-54.69.70.160 65.181.64.0-65.181.127.255 97.74.42.81-97.74.42.81 
98.124.199.1-98.124.199.1 167.114.67.88-167.114.67.88 
173.45.101.58-173.45.101.58 185.53.128.0-185.53.131.255 
192.31.186.4-192.31.186.4 192.99.0.0-192.99.255.255 
194.213.125.57-194.213.125.57
Nov 15 16:11:50.610 [54834] dbg: check: uri_local_bl http.sh addrs 
...
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.193 . 
dns:A:100.136.134.134.psbl.surriel.com
Nov 15 16:16:00.876 [54834] dbg: async: timing: 5.199 . 
dns:A:100.136.134.134.dnsbl.sorbs.net
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X A:http.sh
Nov 15 16:16:00.876 [54834] dbg: async: timing: 385.726 X NS:http.sh
...


Why would resolving http.sh take this long?  And can we bring down the timeout?

Hard to imagine DNS requests taking more than a couple of seconds.

-Philip



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-15 Thread Philip Prindeville



> On Nov 12, 2021, at 8:49 PM, John Hardin  wrote:
> 
> On Fri, 12 Nov 2021, Philip Prindeville wrote:
> 
>> I got the message, saved it to a flat file, and ran "spamassassin -t -D 
>> rules < netdev.eml" and saw:
>> 
>> ...
>> Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
>> ==> got hit (1)
>> ...
>> Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH 
>> ==> got hit (1)
>> Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
>> Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
>> further tests
>> ...
>> 
>> Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?
> 
> "ran ... got hit" is past tense. And it needs to complete the rule to know 
> whether it got a hit.
> 
> 11:45:38.048 -> 11:45:38.063 = less than 20 msec.
> 
> The next rule, whatever that was, is the one that timed out after 4m20s.


Ah, the rule _eval_tests_type11_pri0_set1() took 4:20.

Why can't I even find the rule?


> 
>> Could there be rules that *aren't* matching but are taking a while?
> 
> It's timing out on a rule that's running away. The timeout triggers before 
> "hit/no hit" is known.
> 
> What would be helpful here would be logging of when a rule *starts* 
> evaluation. Normally that would be painful, but for tracking a runaway it 
> would be useful. Perhaps I can code up something to capture that and log it 
> on a timeout...


Whenever a rule gets started, you could save the name and start time, and then 
burp that during timeout handling, right?


> 
> If you want to send me that message zipped up I can try it here with those 
> changes and see if it's a base rule running away.
> 


Sent out-of-band.

Doh.  Forgot to zip it.




Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-13 Thread John Hardin

On Sat, 13 Nov 2021, Loren Wilton wrote:

What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a runaway it 
would be useful. Perhaps I can code up something to capture that and log it 
on a timeout...


Actually what sounds like it would be useful would be knowing the name of the 
rule that timed out. I'm presuming when the timeout occurs that there is 
still some indication of the current rule being processed so it can be 
killed. I'd think that should be enough to backtrack to the rule name. A 
modification to the timeout message could display the name of the rule and 
even how long it took to that point.


That's what I was thinking when I said "capture and log".

--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Running away is the coward's way out of a war;
  appeasement is the coward's way into a war.   -- Thorax
---
 532 days since the first private commercial manned orbital mission (SpaceX)


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-13 Thread John Hardin

On Sat, 13 Nov 2021, Henrik K wrote:


On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:


What would be helpful here would be logging of when a rule *starts*
evaluation. Normally that would be painful, but for tracking a runaway it
would be useful. Perhaps I can code up something to capture that and log it
on a timeout...


It already exists

spamassassin -D all,rules-all < msg


Ugh, yeah, I remember doing that now.


--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Running away is the coward's way out of a war;
  appeasement is the coward's way into a war.   -- Thorax
---
 532 days since the first private commercial manned orbital mission (SpaceX)


Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-13 Thread Loren Wilton
What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a runaway it 
would be useful. Perhaps I can code up something to capture that and log 
it on a timeout...


Actually what sounds like it would be useful would be knowing the name of 
the rule that timed out. I'm presuming when the timeout occurs that there is 
still some indication of the current rule being processed so it can be 
killed. I'd think that should be enough to backtrack to the rule name. A 
modification to the timeout message could display the name of the rule and 
even how long it took to that point.


I guess there might be multiple rules running when the timeout occurs and 
not know which one really timed out, but that would still be a small number 
of rule names.


   Loren



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread Henrik K
On Fri, Nov 12, 2021 at 07:49:00PM -0800, John Hardin wrote:
> 
> What would be helpful here would be logging of when a rule *starts*
> evaluation. Normally that would be painful, but for tracking a runaway it
> would be useful. Perhaps I can code up something to capture that and log it
> on a timeout...

It already exists

spamassassin -D all,rules-all < msg



Re: Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread John Hardin

On Fri, 12 Nov 2021, Philip Prindeville wrote:


I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < 
netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
==> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> 
got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s?


"ran ... got hit" is past tense. And it needs to complete the rule to know 
whether it got a hit.


11:45:38.048 -> 11:45:38.063 = less than 20 msec.

The next rule, whatever that was, is the one that timed out after 4m20s.


Could there be rules that *aren't* matching but are taking a while?


It's timing out on a rule that's running away. The timeout triggers before 
"hit/no hit" is known.


What would be helpful here would be logging of when a rule *starts* 
evaluation. Normally that would be painful, but for tracking a runaway it 
would be useful. Perhaps I can code up something to capture that and log 
it on a timeout...


If you want to send me that message zipped up I can try it here with those 
changes and see if it's a base rule running away.




--
 John Hardin KA7OHZhttp://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The most glaring example of the cognitive dissonance on the left
  is the concept that human beings are inherently good, yet at the
  same time cannot be trusted with any kind of weapon, unless the
  magic fairy dust of government authority gets sprinkled upon them.
   -- Moshe Ben-David
---
 531 days since the first private commercial manned orbital mission (SpaceX)


Seeing "check: exceeded time limit in ..." and need to resolve it

2021-11-12 Thread Philip Prindeville
Hi,

I got an email from net...@vger.kernel.org that was a lengthy (422K) regression 
test report from a patch someone had submitted.

I got the message, saved it to a flat file, and ran "spamassassin -t -D rules < 
netdev.eml" and saw:

...
Nov 12 11:45:38.048 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH_DOC 
==> got hit (1)
...
Nov 12 11:45:38.063 [36367] dbg: rules: ran eval rule __ANY_TEXT_ATTACH ==> 
got hit (1)
Nov 12 11:49:58.565 [36367] info: check: exceeded time limit in 
Mail::SpamAssassin::Plugin::Check::_eval_tests_type11_pri0_set1, skipping 
further tests
...

Am I correct that __ANY_TEXT_ATTACH alone took 4:30s? Looking at the rule, I 
don't understand why it's taking so long...  unless that's not the smoking gun. 
 Could there be rules that *aren't* matching but are taking a while?

72_active.cf:  mimeheader  __ANY_TEXT_ATTACH Content-Type =~ /text\/\w+/i

And how do I dig into why I'm getting that last message?

I can't even find type11_pri0_set1 as a string in 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/

Also, why are there multiple runs of:

Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.368 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"
Nov 12 15:05:37.369 [38290] dbg: rules: ran body rule __LOWER_E ==> got 
hit: "e"


Should this be capped to a maximum number of matches the way __HIGHBITS is?

And I'm not sure I want messages that haven't been fully scanned being 
delivered.  Should I crank TIME_LIMIT_EXCEEDED to 20.0?

Thanks,

-Philip