[squid-users] Squid "suspending ICAP service for too many failures"

2021-01-27 Thread Andrea Venturoli

Hello.

On a box I manage, Squid occasionally stops for a few minutes, blaming
a communication error with C-ICAP (running SquidClamAV).


In cache.log I see:

2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
2021/01/04 14:24:24 kid1| essential ICAP service is suspended: 
icap://127.0.0.1:1344/squidclamav [down,susp,fail11]


This happens usually once a day, always at the same time.
AFAIK there's no particular job running on the server at that time; I 
analyzed squid.log to see whether some client accesses something 
specific at that hour of the day, but came up empty.
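
One way to double-check the "always at the same time" observation is to tally the suspension messages by time of day. The following is only a minimal sketch: it assumes every cache.log line starts with a timestamp in the same layout as the two lines quoted above, and the function name is made up for illustration.

```python
from collections import Counter

# Message text copied from the cache.log excerpt in this thread.
MARKER = "suspending ICAP service for too many failures"


def suspensions_by_minute(lines):
    """Count suspension messages per HH:MM, ignoring the date."""
    tally = Counter()
    for line in lines:
        if MARKER in line:
            # Lines look like "2021/01/04 14:24:24 kid1| ...";
            # characters 11..15 hold the HH:MM part of the timestamp.
            tally[line[11:16]] += 1
    return tally
```

Feeding it an open cache.log file, e.g. suspensions_by_minute(open("cache.log", errors="replace")), returns a Counter whose most_common() entries show whether the suspensions really cluster around one minute of the day.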


Obviously I looked into C-ICAP logs, but, again, found no hint of any 
error or trouble.



Any suggestion on what to do to investigate this?

 bye & Thanks
av.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-27 Thread Alex Rousskov
On 1/27/21 11:01 AM, Andrea Venturoli wrote:

>> 2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
>> 2021/01/04 14:24:24 kid1| essential ICAP service is suspended:
>> icap://127.0.0.1:1344/squidclamav [down,susp,fail11]

> This happens usually once a day, always at the same time.
> AFAIK there's no particular job running on the server at that time; I
> analyzed squid.log to see whether some client accesses something
> specific at that hour of the day, but came up empty.

Unfortunately, Squid ICAP client does not log some of the failures at
debugging level 0 or 1.


> Any suggestion on what to do to investigate this?

Enable ICAP debugging and study cache.log for relevant messages,
especially just before the "suspending ICAP service" message shown above.

debug_options ALL,1 93,7

Debugging will produce a lot of cache.log lines that are irrelevant to
you. If necessary, you can enable debugging an hour (or even a minute!)
before the regular failure. This will allow you to detail the last
failure (at least). It is possible that all the 11 failures are the same.
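
Since the debugging output is large, a small filter that keeps only the lines just before each suspension message may help. This is a rough sketch, not anything Squid ships: the function name and the `keep` parameter are hypothetical, and the marker text is simply the message quoted earlier in this thread.

```python
from collections import deque

# Message text copied from the cache.log excerpt in this thread.
MARKER = "suspending ICAP service for too many failures"


def context_before_suspension(lines, keep=200):
    """Yield (marker_line, preceding_lines) for each suspension message.

    `keep` is a hypothetical knob: how many earlier lines to retain.
    """
    recent = deque(maxlen=keep)  # rolling window of the latest lines
    for line in lines:
        line = line.rstrip("\n")
        if MARKER in line:
            # Hand back a copy so later appends do not change it.
            yield line, list(recent)
        recent.append(line)
```

Iterating over context_before_suspension(open("cache.log", errors="replace")) and printing each group gives the section 93 messages that led up to every suspension, without paging through the rest of the log.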


HTH,

Alex.


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-29 Thread Andrea Venturoli

On 1/27/21 6:11 PM, Alex Rousskov wrote:


> Enable ICAP debugging and study cache.log for relevant messages,
> especially just before the "suspending ICAP service" message shown above.
>
>     debug_options ALL,1 93,7


Thanks a lot.

As expected, I see Squid connections to C-ICAP starting to time out:
when the number of errors reaches 10, Squid marks the squidclamav service
as "suspended".


No big surprise. Still I don't get any more insight (Is C-ICAP choking? 
Why? What data triggers this?).




Is it a really bad idea to raise icap_connect_timeout?
Same for disabling icap_service_failure_limit?

Other hints?

 bye & Thanks
av.


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-29 Thread Alex Rousskov
On 1/29/21 11:55 AM, Andrea Venturoli wrote:

> I see Squid connections to C-ICAP starting to time out:
> when the number of errors reaches 10, Squid marks squidclamav service as
> "suspended".

> No big surprise.

IIRC, you did not disclose timeout suspicions before. This explanation
is news to me, and it eliminates several suspects.


> Still I don't get any more insight (Is C-ICAP choking?
> Why? What data triggers this?).

If you are talking about Squid timing out when attempting to establish a
TCP connection with the ICAP server, then this may be as much insight as
you can get from the Squid side. There is no ICAP "data" at that
connection establishment stage. It is a fairly low-level operation that
Squid and c-icap have little control over. The problem is probably
outside Squid.

I do not know much about c-icap, but I would check whether its
configuration or something like crontab results in hourly restarts and
associated loss of connectivity. The network interface or the routing
tables might also be reset hourly for some reason. The ICAP
server/service might be running out of descriptors or memory.

One potentially useful test is to try to connect to the ICAP server
_while the problem is happening_ using telnet or netcat. When Squid
cannot establish a connection, can you? If the ICAP service is not
running on the Squid box, then try this test both from the Squid box and
from the ICAP box.
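
Because the failure window is short, the telnet/netcat test can be left running unattended. Below is a minimal Python stand-in for that manual test, offered as a sketch only: the address 127.0.0.1:1344 is taken from the cache.log lines in this thread, while the function names, the 5 second timeout and the 10 second interval are arbitrary assumptions.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    started = time.monotonic()
    try:
        # Only the TCP handshake is tested; no ICAP request is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - started, ""
    except OSError as exc:  # covers refusals and timeouts alike
        return False, time.monotonic() - started, str(exc)


def watch(host="127.0.0.1", port=1344, interval=10.0, rounds=None):
    """Probe repeatedly and print one timestamped line per attempt.

    `rounds=None` means "run until interrupted".
    """
    count = 0
    while rounds is None or count < rounds:
        ok, elapsed, err = probe(host, port)
        stamp = time.strftime("%Y/%m/%d %H:%M:%S")
        status = "connected" if ok else f"FAILED ({err})"
        print(f"{stamp} {host}:{port} {status} after {elapsed:.3f}s", flush=True)
        count += 1
        if rounds is None or count < rounds:
            time.sleep(interval)
```

Calling watch() with its output redirected to a file yields timestamps that can be compared with the cache.log entries afterwards: failures or slow connects at the same moment suggest the ICAP listener itself was unreachable, while clean probes point back at something specific to Squid's own connections.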

Packet captures can tell you whether other Squid-ICAP server connections
were active at the time, whether from-Squid SYN packets were able to
reach the ICAP server, etc.

In other words, basic network troubleshooting steps...


> Is it a really bad idea to raise icap_connect_timeout?

Higher timeout will delay HTTP client transactions for longer periods of
time, of course. If you want to go down the road of finding workarounds,
then check whether raising that timeout actually helps. It is not yet
clear (to me) whether the connections just need more time to be
established or are simply doomed.


> Same for disabling icap_service_failure_limit?

This is an essential ICAP service (icap_service bypass=off). I assume
there is no backup service -- no adaptation_service_set in play here. If
so, disabling the limit means that fewer HTTP transactions will be
inconvenienced in the long run than if the service were to be suspended.
Hence, fewer ICAP errors will be delivered to Squid clients.

You can also enable bypass.

Fixing the problem would be a much better solution, of course.


HTH,

Alex.


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-30 Thread Andrea Venturoli

On 1/29/21 8:38 PM, Alex Rousskov wrote:


> IIRC, you did not disclose timeout suspicions before. This explanation
> is news to me, and it eliminates several suspects.


Sorry, I didn't say much in fact.
I took for granted that it was C-ICAP that stopped answering; I didn't
suspect a Squid bug and had no other idea.





> If you are talking about Squid timing out when attempting to establish a
> TCP connection with the ICAP server, then this may be as much insight as
> you can get from the Squid side.


What I hoped to find in Squid logs was *what* was being passed to C-ICAP 
when it locked.

I'll try on the C-ICAP side then.




> I do not know much about c-icap, but I would check whether its
> configuration or something like crontab results in hourly restarts and
> associated loss of connectivity.


AFAIK no.




> The network interface or the routing tables might also be reset hourly


They live on the same host.




> The ICAP server/service might be running out of descriptors or memory.


I'd expect it to log that, but I'll investigate better.




> One potentially useful test is to try to connect to the ICAP server
> _while the problem is happening_ using telnet or netcat. When Squid
> cannot establish a connection, can you?


I'll try, but it's going to be hard, since this happens for a few 
minutes once a day at most.





> Packet captures can tell you whether other Squid-ICAP server connections
> were active at the time, whether from-Squid SYN packets were able to
> reach the ICAP server, etc.
>
> In other words, basic network troubleshooting steps...


As I said, they live on the same host, so it can't be a network problem.




> Higher timeout will delay HTTP client transactions for longer periods of
> time, of course. If you want to go down the road of finding workarounds,
> then check whether raising that timeout actually helps. It is not yet
> clear (to me) whether the connections just need more time to be
> established or are simply doomed.


It's not clear to me either, but I suspect so, given the trouble only
lasts a few minutes.






>> Same for disabling icap_service_failure_limit?
>
> This is an essential ICAP service (icap_service bypass=off). I assume
> there is no backup service -- no adaptation_service_set in play here. If
> so, disabling the limit means that fewer HTTP transactions will be
> inconvenienced in the long run than if the service were to be suspended.
> Hence, fewer ICAP errors will be delivered to Squid clients.


Agreed.




> You can also enable bypass.


I guess this would open a potential for an attack.
DoS the service (antivirus), then let something nasty pass...




> Fixing the problem would be a much better solution, of course.


Sure, I know these are workarounds and I'd rather avoid them, but I'll 
need to consider them as a last resort.




 bye & Thanks
av.


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-30 Thread Amos Jeffries

On 31/01/21 6:08 am, Andrea Venturoli wrote:

> On 1/29/21 8:38 PM, Alex Rousskov wrote:
>
>> Packet captures can tell you whether other Squid-ICAP server connections
>> were active at the time, whether from-Squid SYN packets were able to
>> reach the ICAP server, etc.
>>
>> In other words, basic network troubleshooting steps...
>
> As I said, they live on the same host, so it can't be a network problem.



FYI, that conclusion does not follow. Even on the same host there is a 
full TCP/IP networking stack between Squid and ICAP server doing things 
to the packets. All localhost removes is the potential problems due to 
differences in machine networking stacks.


Network config, firewall rules, packet handling, and/or protocol 
negotiation activities between the software are all still happening that 
may affect the outcome.




Amos


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-01-31 Thread Andrea Venturoli

On 1/31/21 1:11 AM, Amos Jeffries wrote:


>> As I said, they live on the same host, so it can't be a network problem.
>
> FYI, that conclusion does not follow. Even on the same host there is a
> full TCP/IP networking stack between Squid and ICAP server doing things
> to the packets. All localhost removes is the potential problems due to
> differences in machine networking stacks.
>
> Network config, firewall rules, packet handling, and/or protocol
> negotiation activities between the software are all still happening that
> may affect the outcome.


Right.
It could be a network problem.
However, I think that's unlikely (also given the host is monitored and I 
don't see alerts or other signs of such troubles).
While I cannot exclude that completely, I think I should first 
investigate in other directions.


 bye & Thanks
av.


Re: [squid-users] Squid "suspending ICAP service for too many failures"

2021-02-01 Thread Andrea Venturoli

On 2/1/21 8:56 AM, Andrea Venturoli wrote:


> It could be a network problem.
> However, I think that's unlikely (also given the host is monitored and I
> don't see alerts or other signs of such troubles).
> While I cannot exclude that completely, I think I should first
> investigate in other directions.


Finally I have some insight: this happens when ClamAV receives a new 
virus definitions database and so reloads.


Note that I'm using ClamAV 0.103, which "reloads the signature database
without blocking scanning" (and no, I didn't disable this).
So probably, while it works in theory, the reload slows the system down,
hence the timeouts.
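
For anyone wanting to confirm the same correlation on their own box, the suspension times from cache.log can be matched against the database reload times. This is a sketch under stated assumptions: the ClamAV log format is not shown in this thread, so the reload times are expected to be supplied by the caller as already-parsed datetime values, and the function names and the 5 minute window are made up for illustration.

```python
from datetime import datetime, timedelta

# Message text copied from the cache.log excerpt in this thread.
MARKER = "suspending ICAP service for too many failures"


def suspension_times(lines):
    """Parse the timestamps of suspension messages from cache.log lines."""
    times = []
    for line in lines:
        if MARKER in line:
            # The first 19 characters look like "2021/01/04 14:24:24".
            times.append(datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S"))
    return times


def near_reload(suspensions, reloads, window=timedelta(minutes=5)):
    """Pair each suspension with the closest reload time within `window`.

    Suspensions with no reload close enough are paired with None.
    """
    pairs = []
    for s in suspensions:
        closest = min(reloads, key=lambda r: abs(s - r), default=None)
        if closest is not None and abs(s - closest) <= window:
            pairs.append((s, closest))
        else:
            pairs.append((s, None))
    return pairs
```

If most suspensions come back paired with a reload, that supports the reload theory; many None entries would suggest looking for another cause.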


I'm now trying with increased timeouts or with disabling ICAP failure 
limits.


Thanks to all who helped.

 bye
av.