[squid-users] Squid "suspending ICAP service for too many failures"
Hello.

On a box I manage, Squid occasionally stops for a few minutes, blaming a communication error with C-ICAP (running SquidClamAV).

In cache.log I see:

2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
2021/01/04 14:24:24 kid1| essential ICAP service is suspended: icap://127.0.0.1:1344/squidclamav [down,susp,fail11]

This usually happens once a day, always at the same time.
AFAIK there's no particular job running on the server at that time; I analyzed squid.log to see whether some client accesses something specific at that hour of the day, but came up empty.
Obviously I looked into the C-ICAP logs, but, again, found no hint of any error or trouble.

Any suggestion on what to do to investigate this?

bye & Thanks
av.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 1/27/21 11:01 AM, Andrea Venturoli wrote:

>> 2021/01/04 14:24:24 kid1| suspending ICAP service for too many failures
>> 2021/01/04 14:24:24 kid1| essential ICAP service is suspended:
>> icap://127.0.0.1:1344/squidclamav [down,susp,fail11]

> This happens usually once a day, always at the same time.
> AFAIK there's no particular job running on the server at that time; I
> analyzed squid.log to see whether some client accesses something
> specific at that hour of the day, but came up empty.

Unfortunately, the Squid ICAP client does not log some of the failures at debugging level 0 or 1.

> Any suggestion on what to do to investigate this?

Enable ICAP debugging and study cache.log for relevant messages, especially just before the "suspending ICAP service" message shown above.

debug_options ALL,1 93,7

Debugging will produce a lot of cache.log lines that are irrelevant to you. If necessary, you can enable debugging an hour (or even a minute!) before the regular failure. This will allow you to detail the last failure (at least). It is possible that all the 11 failures are the same.

HTH,
Alex.
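Since section 93 at level 7 is noisy, it may help to cut cache.log down to the lines that immediately precede each suspension. The following is only a minimal sketch of that idea: the function name `context_before_suspension` and the default of 200 context lines are illustrative assumptions, not anything Squid provides.

```python
from collections import deque
from typing import Iterable, Iterator, List

# Text taken from the cache.log excerpt quoted in this thread.
MARKER = "suspending ICAP service for too many failures"


def context_before_suspension(
    lines: Iterable[str], before: int = 200
) -> Iterator[List[str]]:
    """Yield one chunk per suspension: up to `before` preceding lines
    followed by the suspension line itself."""
    window = deque(maxlen=before)  # rolling buffer of the most recent lines
    for raw in lines:
        line = raw.rstrip("\n")
        if MARKER in line:
            yield list(window) + [line]
        window.append(line)
```

One possible use is to open cache.log (its path depends on the installation), pass the file object to `context_before_suspension`, and print each chunk; the ICAP failures that led to the suspension should be among those lines.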
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 1/27/21 6:11 PM, Alex Rousskov wrote:

> Enable ICAP debugging and study cache.log for relevant messages,
> especially just before the "suspending ICAP service" message shown above.
>
> debug_options ALL,1 93,7

Thanks a lot.

As expected, I see Squid connections to C-ICAP starting to time out: when the number of errors reaches 10, Squid marks the squidclamav service as "suspended".
No big surprise.
Still I don't get any more insight (Is C-ICAP choking? Why? What data triggers this?).

Is it a really bad idea to raise icap_connect_timeout?
Same for disabling icap_service_failure_limit?
Other hints?

bye & Thanks
av.
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 1/29/21 11:55 AM, Andrea Venturoli wrote:

> I see Squid connections to C-ICAP starting to time out:
> when the number of errors reach 10, Squid marks squidclamav service as
> "suspended".
> No big surprise.

IIRC, you did not disclose timeout suspicions before. This explanation is news to me, and it eliminates several suspects.

> Still I don't get any more insight (Is C-ICAP choking?
> Why? What data triggers this?).

If you are talking about Squid timing out when attempting to establish a TCP connection with the ICAP server, then this may be as much insight as you can get from the Squid side. There is no ICAP "data" at that connection establishment stage. It is a fairly low-level operation that Squid and c-icap have little control over. The problem is probably outside Squid.

I do not know much about c-icap, but I would check whether its configuration or something like crontab results in hourly restarts and associated loss of connectivity. The network interface or the routing tables might also be reset hourly for some reason. The ICAP server/service might be running out of descriptors or memory.

One potentially useful test is to try to connect to the ICAP server _while the problem is happening_ using telnet or netcat. When Squid cannot establish a connection, can you? If the ICAP service is not running on the Squid box, then try this test both from the Squid box and from the ICAP box.

Packet captures can tell you whether other Squid-ICAP server connections were active at the time, whether from-Squid SYN packets were able to reach the ICAP server, etc. In other words, basic network troubleshooting steps...

> Is it a really bad idea to raise icap_connect_timeout?

A higher timeout will delay HTTP client transactions for longer periods of time, of course. If you want to go down the road of finding workarounds, then check whether raising that timeout actually helps. It is not yet clear (to me) whether the connections just need more time to be established or are simply doomed.

> Same for disabling icap_service_failure_limit?

This is an essential ICAP service (icap_service bypass=off). I assume there is no backup service -- no adaptation_service_set in play here. If so, disabling the limit means that fewer HTTP transactions will be inconvenienced in the long run than if the service were to be suspended. Hence, fewer ICAP errors will be delivered to Squid clients.

You can also enable bypass.

Fixing the problem would be a much better solution, of course.

HTH,
Alex.
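The telnet/netcat test can also be scripted, which makes it easier to catch a failure window that lasts only a few minutes a day. Below is a minimal sketch, assuming the ICAP server listens on 127.0.0.1:1344 as in the log excerpt; `ProbeResult`, `probe_tcp`, and `watch` are hypothetical names introduced for this example. It only tests TCP connection establishment, so a successful probe does not prove that c-icap is actually processing requests.

```python
import socket
import time
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    ok: bool               # True if the TCP connection was established
    seconds: float         # how long the attempt took
    error: Optional[str]   # error description when ok is False


def probe_tcp(host: str = "127.0.0.1", port: int = 1344,
              timeout: float = 5.0) -> ProbeResult:
    """Try to open (and immediately close) a TCP connection, like `nc -z`."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return ProbeResult(True, time.monotonic() - started, None)
    except OSError as exc:  # covers timeouts and refused connections
        elapsed = time.monotonic() - started
        return ProbeResult(False, elapsed, f"{type(exc).__name__}: {exc}")


def watch(host: str, port: int, interval: float = 10.0, count: int = 0) -> None:
    """Probe repeatedly and print timestamped results.
    count=0 means run until interrupted."""
    done = 0
    while count == 0 or done < count:
        result = probe_tcp(host, port)
        stamp = time.strftime("%Y/%m/%d %H:%M:%S")
        print(f"{stamp} ok={result.ok} {result.seconds:.3f}s {result.error or ''}")
        done += 1
        time.sleep(interval)
```

Running `watch("127.0.0.1", 1344)` around the usual failure time and comparing its timestamps with cache.log would show whether an independent client also fails to connect when Squid does.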
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 1/29/21 8:38 PM, Alex Rousskov wrote:

> IIRC, you did not disclose timeout suspicions before. This explanation
> is news to me, and it eliminates several suspects.

Sorry, I didn't say much in fact.
I took for granted that it was C-ICAP that stopped answering; I didn't suspect a Squid bug and had no other idea.

> If you are talking about Squid timing out when attempting to establish a
> TCP connection with the ICAP server, then this may be as much insight as
> you can get from the Squid side.

What I hoped to find in Squid logs was *what* was being passed to C-ICAP when it locked.
I'll try on the C-ICAP side then.

> I do not know much about c-icap, but I would check whether its
> configuration or something like crontab results in hourly restarts and
> associated loss of connectivity.

AFAIK no.

> The network interface or the routing tables might also be reset hourly

They live on the same host.

> The ICAP server/service might be running out of descriptors or memory.

I'd expect it to log that, but I'll investigate further.

> One potentially useful test is to try to connect to the ICAP server
> _while the problem is happening_ using telnet or netcat. When Squid
> cannot establish a connection, can you?

I'll try, but it's going to be hard, since this happens for a few minutes once a day at most.

> Packet captures can tell you whether other Squid-ICAP server connections
> were active at the time, whether from-Squid SYN packets were able to
> reach the ICAP server, etc. In other words, basic network
> troubleshooting steps...

As I said, they live on the same host, so it can't be a network problem.

> Higher timeout will delay HTTP client transactions for longer periods of
> time, of course. If you want to go down the road of finding workarounds,
> then check whether raising that timeout actually helps. It is not yet
> clear (to me) whether the connections just need more time to be
> established or are simply doomed.

It's not clear to me either, but I suspect they just need more time, given that the trouble only lasts a few minutes.

>> Same for disabling icap_service_failure_limit?
>
> This is an essential ICAP service (icap_service bypass=off). I assume
> there is no backup service -- no adaptation_service_set in play here. If
> so, disabling the limit means that fewer HTTP transactions will be
> inconvenienced in the long run than if the service were to be suspended.
> Hence, fewer ICAP errors will be delivered to Squid clients.

Agreed.

> You can also enable bypass.

I guess this would open a potential for an attack: DoS the service (antivirus), then let something nasty pass...

> Fixing the problem would be a much better solution, of course.

Sure, I know these are workarounds and I'd rather avoid them, but I'll need to consider them as a last resort.

bye & Thanks
av.
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 31/01/21 6:08 am, Andrea Venturoli wrote:
> On 1/29/21 8:38 PM, Alex Rousskov wrote:
>> Packet captures can tell you whether other Squid-ICAP server connections
>> were active at the time, whether from-Squid SYN packets were able to
>> reach the ICAP server, etc. In other words, basic network
>> troubleshooting steps...
>
> As I said, they live on the same host, so it can't be a network problem.

FYI, that conclusion does not follow. Even on the same host there is a full TCP/IP networking stack between Squid and the ICAP server doing things to the packets.

All that localhost removes is the potential for problems due to differences between machine networking stacks. Network config, firewall rules, packet handling, and/or protocol negotiation activities between the software are all still happening and may affect the outcome.

Amos
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 1/31/21 1:11 AM, Amos Jeffries wrote:

>> As I said, they live on the same host, so it can't be a network problem.
>
> FYI, that conclusion does not follow. Even on the same host there is a
> full TCP/IP networking stack between Squid and ICAP server doing things
> to the packets.
>
> All localhost removes is the potential problems due to differences in
> machine networking stacks. Network config, firewall rules, packet
> handling, and/or protocol negotiation activities between the software
> are all still happening that may affect the outcome.

Right.
It could be a network problem.
However, I think that's unlikely (also given the host is monitored and I don't see alerts or other signs of such troubles).
While I cannot exclude that completely, I think I should first investigate in other directions.

bye & Thanks
av.
Re: [squid-users] Squid "suspending ICAP service for too many failures"
On 2/1/21 8:56 AM, Andrea Venturoli wrote:

> It could be a network problem.
> However, I think that's unlikely (also given the host is monitored and I
> don't see alerts or other signs of such troubles).
> While I cannot exclude that completely, I think I should first
> investigate in other directions.

Finally I have some insight: this happens when ClamAV receives a new virus definitions database and therefore reloads.
Note that I'm using 0.103, which "reloads the signature database without blocking scanning" (and no, I didn't disable this).
So probably, while it works in theory, the reload slows the system, hence the timeouts.

I'm now trying with increased timeouts or with disabling ICAP failure limits.

Thanks to all who helped.

bye
av.
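To double-check a correlation like this one, the suspension times in cache.log can be compared with the times of the database reloads. The sketch below assumes the cache.log timestamp format shown at the start of the thread; the reload times are passed in as plain `datetime` values, because how to extract them from the ClamAV logs depends on the local setup and is not covered here. The function names and the five-minute window are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Text taken from the cache.log excerpt quoted in this thread.
MARKER = "suspending ICAP service for too many failures"


def suspension_times(cache_log_lines: Iterable[str]) -> List[datetime]:
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' timestamp of each suspension line."""
    times = []
    for line in cache_log_lines:
        if MARKER in line:
            times.append(datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S"))
    return times


def near_reload(
    suspensions: Iterable[datetime],
    reloads: Iterable[datetime],
    window: timedelta = timedelta(minutes=5),
) -> List[Tuple[datetime, datetime]]:
    """Pair each suspension with the first reload that started at most
    `window` before it; suspensions without such a reload are omitted."""
    reload_list = list(reloads)
    pairs = []
    for suspended_at in suspensions:
        for reloaded_at in reload_list:
            if timedelta(0) <= suspended_at - reloaded_at <= window:
                pairs.append((suspended_at, reloaded_at))
                break
    return pairs
```

If every suspension ends up paired with a reload, that supports the explanation above; unpaired suspensions would suggest a second cause worth investigating.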