Re: [squid-dev] Rock store stopped accessing discs
On 03/17/2017 09:48 AM, Heiler Bemerguy wrote:

> Sadly the same thing occurred again. It seems the hole is deeper lol..

Most likely, it is the same hole. However, the more we panic and jump to conclusions, the deeper that hole below us may look...

> I couldn't find any previous messages that could give a clue why this is happening..

> 2017/03/15 19:01:29 kid2| WARNING: communication with /cache2/rock may be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
> 2017/03/15 19:15:05 kid2| WARNING: communication with /cache2/rock may be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
> 2017/03/15 19:18:02 kid2| WARNING: communication with /cache2/rock may be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
> 2017/03/15 19:22:27 kid2| WARNING: communication with /cache2/rock may be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os
> 2017/03/15 19:26:19 kid2| WARNING: communication with /cache2/rock may be too slow or disrupted for about 7.00s; rescued 1 out of 1 I/Os

In general, such semi-periodic problems without any other serious error messages could be due to a missing max-swap-rate, which is designed to prevent disk overload and the associated process blockage by the OS. I doubt that is what is happening in your particular case because I would expect more I/Os to be affected (but you can validate that theory by following my original recommendation).

Can you reproduce this problem if you increase logging levels? If yes, I suggest configuring each kid to log to its own cache-N.log file and then using "-k debug" (or equivalent) to collect detailed logs containing a few of these WARNINGs.

Cheers,

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
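A minimal sketch of that per-kid logging setup, assuming Squid's SMP conditional syntax and using hypothetical log paths (repeat one block per worker and disker number):

  # squid.conf: give each kid its own cache.log (paths are examples only)
  if ${process_number} = 1
  cache_log /var/log/squid/cache-1.log
  endif
  if ${process_number} = 2
  cache_log /var/log/squid/cache-2.log
  endif

With separate files in place, "squid -k debug" (or a signal sent to a single kid, as suggested later in this thread) leaves each kid's detailed trace in its own cache-N.log rather than interleaving all kids in one file.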
Re: [squid-dev] Rock store stopped accessing discs
I have not checked the code, but I suspect that

* You see no timeout errors because you disabled timeouts.
* You see no I/O delay warnings because you disabled I/O delays.

That said, I would not be surprised if disabling max-swap-rate reduces the probability of an I/O message getting stuck in the queue because without artificial message delays, the probability of a kid dying with a pending message is naturally lower. There are probably some bugs in the code even when no kids are dying, but there is currently no information warranting an investigation AFAICT.

Sadly the same thing occurred again. It seems the hole is deeper lol.. max-swap-rate/swap-timeout are still disabled... I couldn't find any previous messages that could give a clue why this is happening.. :/

2017/03/15 16:18:07 kid6| urlParse: Illegal hostname 'static..unocdn.com'
2017/03/15 16:18:07 kid4| urlParse: Illegal hostname 'static..unocdn.com'
2017/03/15 16:19:07 kid6| WARNING: Closing client connection due to lifetime timeout
2017/03/15 16:19:07 kid6| http://177.105.192.70:9000/oliberalcbn.mp3
2017/03/15 16:19:58 kid1| Weak ETags are not allowed in If-Range: "8083-1488088507000" ? "8083-1488088507000"
2017/03/15 16:20:02 kid1| WARNING: Closing client connection due to lifetime timeout
2017/03/15 16:20:02 kid1| http://45.43.200.26:27388/;
2017/03/15 16:20:51 kid6| WARNING: Closing client connection due to lifetime timeout
2017/03/15 16:20:51 kid6| http://45.43.200.26:27388/;
2017/03/15 16:22:43 kid1| ctx: enter level 0: 'http://storage.mais.uol.com.br/11848044.flv'
2017/03/15 16:22:43 kid1| WARNING: Malformed Content-Length[1]=-
2017/03/15 16:23:03 kid1| ctx: exit level 0
2017/03/15 16:23:03 kid1| ctx: enter level 0: 'http://storage.mais.uol.com.br/11848044.flv'
2017/03/15 16:23:03 kid1| WARNING: Malformed Content-Length[1]=-
2017/03/15 16:28:45 kid1| ctx: exit level 0
2017/03/15 16:28:45 kid1| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log for details.
2017/03/15 16:33:26 kid2| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log for details.
2017/03/15 16:33:27 kid3| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log for details.
2017/03/15 16:37:44 kid4| ipcacheParse No Address records in response to 'a.sitemeter.com'
2017/03/15 16:37:44 kid4| ipcacheParse No Address records in response to 'a.sitemeter.com'
2017/03/15 16:43:17 kid3| WARNING: Closing client connection due to lifetime timeout
2017/03/15 16:43:17 kid3| http://www.superclipping.com.br/Uploads/video/15.03.2017/f17f8022f44148f4bc53d75db766b154.mp4
2017/03/15 16:46:44 kid4| local=10.1.10.9:3080 remote=10.1.3.242:54705 FD 1958 flags=1: read/write failure: (113) No route to host
2017/03/15 16:46:54 kid1| local=10.1.10.9:3080 remote=10.1.3.242:49770 FD 539 flags=1: read/write failure: (113) No route to host
2017/03/15 16:47:01 kid6| local=10.1.10.9:3080 remote=10.1.3.242:62607 FD 2158 flags=1: read/write failure: (113) No route to host
2017/03/15 16:47:47 kid6| local=10.1.10.9:3080 remote=10.1.3.242:62605 FD 2141 flags=1: read/write failure: (113) No route to host
2017/03/15 16:52:52 kid6| Starting new store_id helpers...
2017/03/15 16:52:52 kid6| helperOpenServers: Starting 1/50 'store-id.pl' processes
2017/03/15 16:52:52 kid6| Starting new store_id helpers...
2017/03/15 16:52:52 kid6| helperOpenServers: Starting 1/50 'store-id.pl' processes
2017/03/15 16:54:21 kid1| Weak ETags are not allowed in If-Range: "0" ? "0"
2017/03/15 16:55:31 kid1| Weak ETags are not allowed in If-Range: "54133-1387455809000" ? "54133-1387455809000"
2017/03/15 16:57:21 kid1| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log for details.
2017/03/15 17:03:15 kid4| Weak ETags are not allowed in If-Range: "5878dd6c-4363" ? "5878dd6c-4363"
2017/03/15 17:30:03 kid1| SECURITY ALERT: Missing hostname in URL 'http://'. see access.log for details.
2017/03/15 17:57:31 kid1| WARNING: Closing client connection due to lifetime timeout
2017/03/15 17:57:31 kid1| http://liveupdate.gocyberlink.com/bigbang/ap/query_xml.jsp
2017/03/15 18:09:30 kid1| WARNING: Closing client connection due to lifetime timeout
2017/03/15 18:09:30 kid1| http://cloudav.downloads.pandasecurity.com/nano/pavsignano/nano_1/av_1/plug_01/plug_01_20170315_014235.rar
2017/03/15 18:20:49 kid1| urlParse: URL too large (10319 bytes)
2017/03/15 18:35:57 kid1| Weak ETags are not allowed in If-Range: "0" ? "0"
2017/03/15 18:36:50 kid4| WARNING: Closing client connection due to lifetime timeout
2017/03/15 18:36:50 kid4| http://au.ff.avast.com/android/avast-android-vps-v4-release.apk
2017/03/15 18:50:07 kid1| ipcacheParse No Address records in response to 'www.portaldoartesanato.com.br'
2017/03/15 18:50:07 kid1| ipcacheParse No Address records in response to 'www.portaldoartesanato.com.br'
2017/03/15 18:50:07 kid1| ipcacheParse No Address records in response to 'www.portaldoartesanato.com.br'
2017/03/15 18:50:07 kid
Re: [squid-dev] Rock store stopped accessing discs
On 03/15/2017 11:57 AM, Heiler Bemerguy wrote:

> Commented out all the max-swap-rate and swap-timeout. Guess what? No more errors on cache.log..

> It seems the code that "shapes" the rockstore I/Os is kinda buggy.

I have not checked the code, but I suspect that

* You see no timeout errors because you disabled timeouts.
* You see no I/O delay warnings because you disabled I/O delays.

That said, I would not be surprised if disabling max-swap-rate reduces the probability of an I/O message getting stuck in the queue because without artificial message delays, the probability of a kid dying with a pending message is naturally lower. There are probably some bugs in the code even when no kids are dying, but there is currently no information warranting an investigation AFAICT.

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
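One way to tell whether an I/O queue is still getting stuck even with the timeout and delay warnings silenced is to briefly enable the debug sections that produced the "cannot wait" trace quoted later in this thread (47 for IpcIoFile, 54 for the IPC queues) and watch whether the reported wait keeps growing. A rough sketch, assuming the default cache.log location:

  # in squid.conf, then run "squid -k reconfigure":
  #   debug_options ALL,1 47,2 54,2
  grep 'cannot wait' /var/log/squid/cache.log | tail -5
  # a steadily increasing "cannot wait" value for the same ipcIo identifier
  # means the oldest queued response is stuck, not merely slow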
Re: [squid-dev] Rock store stopped accessing discs
Hello list,

I've made a simple test here.

cache_dir rock /cache2 11 min-size=0 max-size=65536 #max-swap-rate=150 swap-timeout=360
cache_dir rock /cache3 11 min-size=65537 max-size=262144 #max-swap-rate=150 swap-timeout=380
cache_dir rock /cache4 11 min-size=262145 #max-swap-rate=150 swap-timeout=500

Commented out all the max-swap-rate and swap-timeout. Guess what? No more errors on cache.log..

before:
root@proxy:/var/log/squid# cat cache.log.2 |grep overflow |wc -l
53
now:
root@proxy:/var/log/squid# cat cache.log |grep overflow |wc -l
1

before:
root@proxy:/var/log/squid# cat cache.log.2 |grep "7.00s timeout" |wc -l
86
now:
root@proxy:/var/log/squid# cat cache.log |grep "7.00s timeout" |wc -l
0

It seems the code that "shapes" the rockstore I/Os is kinda buggy.

Version 4.0.18

--
Atenciosamente / Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751

Em 07/03/2017 20:26, Alex Rousskov escreveu:

On 03/07/2017 01:08 PM, Heiler Bemerguy wrote:

Some log from right now...

Here is my analysis:

15:53:05.255| ipc/Queue.h findOldest: peeking from 7 to 6 at 1

Squid worker (kid6) is looking at the queue of disker (kid7) responses. There is just one response in the queue.

IpcIoFile.cc canWait: cannot wait: 1136930911 oldest: ipcIo6.381049w7

Squid worker is trying to estimate whether it has enough time to queue (and eventually perform) more disk I/O. The expected wait is 1'136'930'911 milliseconds or ~13 days. That is longer than the configured cache_dir swap-timeout (a few hundred milliseconds) so Squid refuses to queue this disk I/O request:

store_io.cc storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4

The same story happens in your other log snippets AFAICT:

15:53:05.302| ipc/Queue.h findOldest: peeking from 8 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136930958 oldest: ipcIo6.153009r8
store_io.cc storeCreate: no swapdirs for e:=msw1DV/0x3a4d5870*2

and

15:53:05.318| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136930974 oldest: ipcIo6.381049w7
store_io.cc storeCreate: no swapdirs for e:m381432=w1p2DV/0x3ace6740*4

and

15:53:05.793| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
IpcIoFile.cc canWait: cannot wait: 1136931448 oldest: ipcIo6.381049w7
store_io.cc storeCreate: no swapdirs for e:=sw1V/0x206e17d0*1

Focusing on one disker (kid7), we can see that the oldest response does not change: It is always ipcIo6.381049w7. This stuck response results in gradual increment of the expected wait time with every check, matching wall clock time increment:

15:53:05.255| cannot wait: 1136930911
15:53:05.318| cannot wait: 1136930974
15:53:05.793| cannot wait: 1136931448

These stuck disker responses probably explain why your disks do not receive any traffic. It is potentially important that both disker responses shown in your logs got stuck at approximately the same absolute time ~13 days ago (around 2017-02-22, give or take a day; subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time zone to know the "exact" time when those stuck requests were queued).

How can a disker response get stuck? Most likely, something unusual happened ~13 days ago. This could be a Squid bug and/or a kid restart.

* Do all currently running Squid kid processes have about the same start time? [1]
* Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old non-debugging messages/warnings?

[1] http://stackoverflow.com/questions/5731234/how-to-get-the-start-time-of-a-long-running-linux-process

Thank you,

Alex.
___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Rock store stopped accessing discs
On 03/14/2017 10:43 AM, Heiler Bemerguy wrote:

> Em 07/03/2017 20:26, Alex Rousskov escreveu:
>> How can a disker response get stuck? Most likely, something unusual
>> happened ~13 days ago. This could be a Squid bug and/or a kid restart.

> root@proxy:~# ps auxw |grep squid-
> proxy 10225  0.0  0.0 13964224   21708 ? S Mar10   0:10 (squid-coord-10) -s
> proxy 10226  0.1 12.5 14737524 8268056 ? S Mar10   7:14 (squid-disk-9) -s
> proxy 10227  0.0 11.6 14737524 7686564 ? S Mar10   3:08 (squid-disk-8) -s
> proxy 10228  0.1 14.9 14737540 9863652 ? S Mar10   7:30 (squid-disk-7) -s
> proxy 18348  3.5 10.3 17157560 6859904 ? S Mar13  48:44 (squid-6) -s
> proxy 18604  2.8  9.0 16903948 5977728 ? S Mar13  37:28 (squid-4) -s
> proxy 18637  1.7 10.8 16836872 7163392 ? R Mar13  23:03 (squid-1) -s
> proxy 20831 15.3 10.3 17226652 6838372 ? S 08:50  39:51 (squid-2) -s
> proxy 21189  5.3  2.8 16538064 1871788 ? S 12:29   2:12 (squid-5) -s
> proxy 21214  3.8  1.5 16448972 1012720 ? S 12:43   1:03 (squid-3) -s

> Diskers aren't dying but workers are, a lot..

I suspect that worker deaths may cause SMP queues to get stuck, but I have not validated that theory. We probably need to add more code to SMP queues so that they can recover from untimely kid deaths.

> Another weird thing: lots of timeouts and overflows are happening on
> non-active hours.. From 0h to 7h we have like 1-2% of the clients we
> usually have from 8h to 17h.. (commercial time)

If a queue is stuck, you will see these errors and warnings as long as there is some need for disk I/O. The volume is not important.

> Still don't know how /cache3 stopped and /cache4 is still active, even
> with all those warnings and errors.. :/

Do you expect Squid to function well in the presence of assertions and to explain what went wrong while asserting? Unfortunately, we are very far from that kind of robustness and self-diagnosis nirvana!

I have not studied your error messages in detail, but it is possible that there are not-yet-stuck queues that feed cache4 while all cache3 queues are stuck. There is one SMP queue for each worker:disker pair.

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
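A rough way to check the worker-death theory against the logs and the process table; this is only a sketch, the cache.log path is an assumption, and the assertion text is the one quoted in the next message:

  # how many times workers hit the assertion reported below
  grep -c 'assertion failed: client_side_reply.cc:1167' /var/log/squid/cache.log
  # kid start times: workers that started well after the diskers
  # have been restarted at least once
  ps -eo pid,lstart,args | grep '[s]quid-'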
Re: [squid-dev] Rock store stopped accessing discs
Em 07/03/2017 20:26, Alex Rousskov escreveu:

These stuck disker responses probably explain why your disks do not receive any traffic. It is potentially important that both disker responses shown in your logs got stuck at approximately the same absolute time ~13 days ago (around 2017-02-22, give or take a day; subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time zone to know the "exact" time when those stuck requests were queued).

How can a disker response get stuck? Most likely, something unusual happened ~13 days ago. This could be a Squid bug and/or a kid restart.

* Do all currently running Squid kid processes have about the same start time? [1]
* Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old non-debugging messages/warnings?

I searched the log files from those days, nothing unusual, "grep" returns nothing for ipcIo6.381049w7 or ipcIo6.153009r8.

On that day I couldn't verify whether the kids still had the same uptime; I had reformatted those /cache2, /cache3 and /cache4 partitions and started fresh with squid -z. But looking at the PS right now, I feel I can answer that question:

root@proxy:~# ps auxw |grep squid-
proxy 10225  0.0  0.0 13964224   21708 ? S Mar10   0:10 (squid-coord-10) -s
proxy 10226  0.1 12.5 14737524 8268056 ? S Mar10   7:14 (squid-disk-9) -s
proxy 10227  0.0 11.6 14737524 7686564 ? S Mar10   3:08 (squid-disk-8) -s
proxy 10228  0.1 14.9 14737540 9863652 ? S Mar10   7:30 (squid-disk-7) -s
proxy 18348  3.5 10.3 17157560 6859904 ? S Mar13  48:44 (squid-6) -s
proxy 18604  2.8  9.0 16903948 5977728 ? S Mar13  37:28 (squid-4) -s
proxy 18637  1.7 10.8 16836872 7163392 ? R Mar13  23:03 (squid-1) -s
proxy 20831 15.3 10.3 17226652 6838372 ? S 08:50  39:51 (squid-2) -s
proxy 21189  5.3  2.8 16538064 1871788 ? S 12:29   2:12 (squid-5) -s
proxy 21214  3.8  1.5 16448972 1012720 ? S 12:43   1:03 (squid-3) -s

Diskers aren't dying but workers are, a lot.. with that "assertion failed: client_side_reply.cc:1167: http->storeEntry()->objectLen() >= headers_sz" thing.

Viewing DF and IOSTAT, it seems right now /cache3 isn't being accessed anymore. (I think it is the disk-8 above, look at the CPU time usage..)

Another weird thing: lots of timeouts and overflows are happening on non-active hours.. From 0h to 7h we have like 1-2% of the clients we usually have from 8h to 17h.. (commercial time)

2017/03/14 00:26:50 kid3| WARNING: abandoning 23 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 00:26:53 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/03/14 02:14:48 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.68259w9
2017/03/14 06:33:43 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.55919w9
2017/03/14 06:57:53 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.58130w9

This cache4 partition is where huge files would be stored:

maximum_object_size 4 GB
cache_dir rock /cache2 11 min-size=0 max-size=65536 max-swap-rate=150 swap-timeout=360
cache_dir rock /cache3 11 min-size=65537 max-size=262144 max-swap-rate=150 swap-timeout=380
cache_dir rock /cache4 11 min-size=262145 max-swap-rate=150 swap-timeout=500

Still don't know how /cache3 stopped and /cache4 is still active, even with all those warnings and errors.. :/

--
Atenciosamente / Best Regards,

Heiler Bemerguy
Network Manager - CINBESA
55 91 98151-4894/3184-1751

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
Re: [squid-dev] Rock store stopped accessing discs
On 03/07/2017 01:08 PM, Heiler Bemerguy wrote:

> Some log from right now...

Here is my analysis:

> 15:53:05.255| ipc/Queue.h findOldest: peeking from 7 to 6 at 1

Squid worker (kid6) is looking at the queue of disker (kid7) responses. There is just one response in the queue.

> IpcIoFile.cc canWait: cannot wait: 1136930911 oldest: ipcIo6.381049w7

Squid worker is trying to estimate whether it has enough time to queue (and eventually perform) more disk I/O. The expected wait is 1'136'930'911 milliseconds or ~13 days. That is longer than the configured cache_dir swap-timeout (a few hundred milliseconds) so Squid refuses to queue this disk I/O request:

> store_io.cc storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4

The same story happens in your other log snippets AFAICT:

> 15:53:05.302| ipc/Queue.h findOldest: peeking from 8 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136930958 oldest: ipcIo6.153009r8
> store_io.cc storeCreate: no swapdirs for e:=msw1DV/0x3a4d5870*2

and

> 15:53:05.318| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136930974 oldest: ipcIo6.381049w7
> store_io.cc storeCreate: no swapdirs for e:m381432=w1p2DV/0x3ace6740*4

and

> 15:53:05.793| ipc/Queue.h findOldest: peeking from 7 to 6 at 1
> IpcIoFile.cc canWait: cannot wait: 1136931448 oldest: ipcIo6.381049w7
> store_io.cc storeCreate: no swapdirs for e:=sw1V/0x206e17d0*1

Focusing on one disker (kid7), we can see that the oldest response does not change: It is always ipcIo6.381049w7. This stuck response results in gradual increment of the expected wait time with every check, matching wall clock time increment:

> 15:53:05.255| cannot wait: 1136930911
> 15:53:05.318| cannot wait: 1136930974
> 15:53:05.793| cannot wait: 1136931448

These stuck disker responses probably explain why your disks do not receive any traffic. It is potentially important that both disker responses shown in your logs got stuck at approximately the same absolute time ~13 days ago (around 2017-02-22, give or take a day; subtract 1136930911 milliseconds from 15:53:05.255 in your Squid time zone to know the "exact" time when those stuck requests were queued).

How can a disker response get stuck? Most likely, something unusual happened ~13 days ago. This could be a Squid bug and/or a kid restart.

* Do all currently running Squid kid processes have about the same start time? [1]
* Do you see ipcIo6.381049w7 or ipcIo6.153009r8 mentioned in any old non-debugging messages/warnings?

[1] http://stackoverflow.com/questions/5731234/how-to-get-the-start-time-of-a-long-running-linux-process

Thank you,

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
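For the timestamp arithmetic suggested above, a quick sketch with GNU date (1136930911 ms rounds to 1136931 s; adjust the date string for the proxy's own time zone):

  date -d '2017-03-07 15:53:05 1136931 seconds ago'
  # prints a time around midday on Wed, 22 Feb 2017,
  # matching the "~13 days ago" estimate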
Re: [squid-dev] Rock store stopped accessing discs
Em 07/03/2017 15:22, Alex Rousskov escreveu:

2017/02/18 13:48:19 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.9082w9
2017/02/18 13:48:42 kid4| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo4.3371w9
2017/02/18 14:06:01 kid9| WARNING: /cache4/rock delays I/O requests for 9.97 seconds to obey 200/sec rate limit
2017/02/18 14:06:34 kid9| WARNING: /cache4/rock delays I/O requests for 21.82 seconds to obey 200/sec rate limit
2017/02/18 14:06:42 kid4| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:47 kid3| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:48 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:49 kid4| WARNING: abandoning 4 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:54 kid3| WARNING: abandoning 2 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:07:55 kid9| WARNING: /cache4/rock delays I/O requests for 68.64 seconds to obey 200/sec rate limit
2017/02/18 14:08:03 kid5| WARNING: abandoning 511 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:47 kid2| WARNING: abandoning 20 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:51 kid3| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:54 kid1| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 15:26:35 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.31404w9
2017/02/18 15:29:00 kid9| WARNING: /cache4/rock delays I/O requests for 9.92 seconds to obey 200/sec rate limit
2017/02/18 15:29:13 kid9| WARNING: /cache4/rock delays I/O requests for 8.23 seconds to obey 200/sec rate limit
2017/02/18 15:29:45 kid9| WARNING: /cache4/rock delays I/O requests for 8.86 seconds to obey 200/sec rate limit
2017/02/18 15:30:06 kid9| WARNING: /cache4/rock delays I/O requests for 7.34 seconds to obey 200/sec rate limit
2017/02/18 15:30:27 kid9| WARNING: /cache4/rock delays I/O requests for 7.65 seconds to obey 200/sec rate limit
2017/02/18 15:30:48 kid9| WARNING: /cache4/rock delays I/O requests for 8.97 seconds to obey 200/sec rate limit
2017/02/18 15:31:09 kid9| WARNING: /cache4/rock delays I/O requests for 8.52 seconds to obey 200/sec rate limit
2017/02/18 15:31:22 kid9| WARNING: /cache4/rock delays I/O requests for 10.61 seconds to obey 200/sec rate limit
2017/02/18 17:19:40 kid9| WARNING: /cache4/rock delays I/O requests for 10.22 seconds to obey 200/sec rate limit

There is a Squid bug and/or your cache disks could not keep up with the I/O load. Please note that I/O load during initial cache index rebuild is higher. There are several ways to proceed IMO:

1. Figure out why your Squid is not using the disk cache _now_. This will require enabling debugging, at least for a few seconds, and then analyzing cache.log. I recommend enabling debugging in one worker kid only (i.e., sending its process the right signal instead of running "squid -k debug"). See Squid wiki for debugging details. Do not restart or reconfigure Squid until you collect those debugging logs! Please note that the logs may contain private user info.

Some log from right now...

2017/03/07 15:53:05.255 kid6| 19,4| MemObject.cc(431) isContiguous: MemObject::isContiguous: Returning true
2017/03/07 15:53:05.255 kid6| 73,3| HttpRequest.cc(658) storeId: sent back effectiveRequestUrl: http://www.agenciabelem.com.br/signalr/hubs
2017/03/07 15:53:05.255 kid6| 20,3| store_swapmeta.cc(52) storeSwapMetaBuild: storeSwapMetaBuild URL: http://www.agenciabelem.com.br/signalr/hubs
2017/03/07 15:53:05.255 kid6| 54,2| ../../../src/ipc/Queue.h(490) findOldest: peeking from 7 to 6 at 1
2017/03/07 15:53:05.255 kid6| 47,2| IpcIoFile.cc(415) canWait: cannot wait: 1136930911 oldest: ipcIo6.381049w7
2017/03/07 15:53:05.255 kid6| 20,2| store_io.cc(38) storeCreate: storeCreate: no swapdirs for e:=sw1p2RDV/0x206e17d0*4
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(729) invokeHandlers: InvokeHandlers: 69538C9E111C077214004DE73D0F257B
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(735) invokeHandlers: StoreEntry::InvokeHandlers: checking client #0
2017/03/07 15:53:05.255 kid6| 90,3| store_client.cc(755) storePendingNClients: storePendingNClients: returning 1
2017/03/07 15:53:05.255 kid6| 20,3| store.cc(494) unlock: Client unlocking key 69538C9E111C077214004DE73D0F257B e:=sp2RDV/0x206e17d0*4
2017/03/07 15:53:05.255 kid6| 17,3| FwdState.cc(274) ~FwdState: FwdState destructor start
--
2017/03/07 15:53:05.302 kid6| 90,3| store_client.cc(661) storeUnregister: storeUnregister: called for 'C8303E098834088226809BF615D12C51'
2017/03/07 15:53:05.302 kid6| 19,4| MemObject.cc(431) isContiguous: MemObject::isContiguous: Returning true
2017/03/07 15:53:05.302 kid6| 20,3| store_swapmeta.cc(52) storeSwapMetaBuild: storeSwapMetaBuild URL: http://www.ormnews.com.br/library/bootstrap/css/bootstrap.m
Re: [squid-dev] Rock store stopped accessing discs
On 03/07/2017 10:58 AM, Heiler Bemerguy wrote:

> I used iostat to check if "right now" the hds were being accessed. A lot of minutes passed and all writes/reads remained Zero.

Understood.

> With a 80mbit/s traffic going on, how could nothing be written nor read from disc?

I can come up with several possible explanations, but more information is needed to minimize guessing. See below for some suggestions.

> How can I check IPC RAM? I've never tweaked it.

Search Squid wiki for SMP troubleshooting hints and/or ask your local sysadmin. If you cannot find anything relevant, ask here again and, hopefully, somebody will be able to guide you.

> I've just noticed that squid was running since feb/18 (Start Time: Sat, 18 Feb 2017 15:38:44 GMT) and since the beginning there were a lot of warnings on cache.log.. (The logs I pasted on the earlier email were from today's usage only..)

> I think since then, it stopped using the cache stores..

I assume "then" means "today's usage" not "feb/18".

> 2017/02/18 13:48:19 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.9082w9
> 2017/02/18 13:48:42 kid4| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo4.3371w9
> 2017/02/18 14:06:01 kid9| WARNING: /cache4/rock delays I/O requests for 9.97 seconds to obey 200/sec rate limit
> 2017/02/18 14:06:34 kid9| WARNING: /cache4/rock delays I/O requests for 21.82 seconds to obey 200/sec rate limit
> 2017/02/18 14:06:42 kid4| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:06:47 kid3| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:06:48 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:06:49 kid4| WARNING: abandoning 4 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:06:54 kid3| WARNING: abandoning 2 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:07:55 kid9| WARNING: /cache4/rock delays I/O requests for 68.64 seconds to obey 200/sec rate limit
> 2017/02/18 14:08:03 kid5| WARNING: abandoning 511 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:08:47 kid2| WARNING: abandoning 20 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:08:51 kid3| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 14:08:54 kid1| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
> 2017/02/18 15:26:35 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.31404w9
> 2017/02/18 15:29:00 kid9| WARNING: /cache4/rock delays I/O requests for 9.92 seconds to obey 200/sec rate limit
> 2017/02/18 15:29:13 kid9| WARNING: /cache4/rock delays I/O requests for 8.23 seconds to obey 200/sec rate limit
> 2017/02/18 15:29:45 kid9| WARNING: /cache4/rock delays I/O requests for 8.86 seconds to obey 200/sec rate limit
> 2017/02/18 15:30:06 kid9| WARNING: /cache4/rock delays I/O requests for 7.34 seconds to obey 200/sec rate limit
> 2017/02/18 15:30:27 kid9| WARNING: /cache4/rock delays I/O requests for 7.65 seconds to obey 200/sec rate limit
> 2017/02/18 15:30:48 kid9| WARNING: /cache4/rock delays I/O requests for 8.97 seconds to obey 200/sec rate limit
> 2017/02/18 15:31:09 kid9| WARNING: /cache4/rock delays I/O requests for 8.52 seconds to obey 200/sec rate limit
> 2017/02/18 15:31:22 kid9| WARNING: /cache4/rock delays I/O requests for 10.61 seconds to obey 200/sec rate limit
> 2017/02/18 17:19:40 kid9| WARNING: /cache4/rock delays I/O requests for 10.22 seconds to obey 200/sec rate limit

There is a Squid bug and/or your cache disks could not keep up with the I/O load. Please note that I/O load during initial cache index rebuild is higher. There are several ways to proceed IMO:

1. Figure out why your Squid is not using the disk cache _now_. This will require enabling debugging, at least for a few seconds, and then analyzing cache.log. I recommend enabling debugging in one worker kid only (i.e., sending its process the right signal instead of running "squid -k debug"). See Squid wiki for debugging details. Do not restart or reconfigure Squid until you collect those debugging logs! Please note that the logs may contain private user info.

2. Reduce max-swap-rate from 200 to, say, 20. If your disks cannot keep up _and_ there is a Squid bug that screws something up when your disks cannot keep up, then this blind configuration change may avoid triggering that bug.

3. Collect enough iostat 5-second outputs (or equivalent) to correlate system performance with cache.log messages. I would also collect other system activity during those hours. The "atop" tool may be useful for collecting everything in one place. You will probably want to restart Squid for a clean experiment/collection.

HTH,

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org
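A rough sketch of suggestions 1 and 3 combined, assuming sysstat's iostat and that the worker to debug is, say, squid-5 (substitute the real PID from ps; the output paths are examples only):

  # timestamped disk stats in the background, for later correlation with cache.log
  iostat -x -t 5 > /tmp/iostat.log &
  # toggle full debugging in one worker only; SIGUSR2 is the signal that
  # "squid -k debug" would send to the whole instance
  kill -USR2 <pid-of-squid-5>
  sleep 60
  kill -USR2 <pid-of-squid-5>   # toggle debugging back off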
Re: [squid-dev] Rock store stopped accessing discs
Em 07/03/2017 13:14, Alex Rousskov escreveu:

On 03/07/2017 08:40 AM, Heiler Bemerguy wrote:

I'm using squid 4.0.18

And noticed something: (iostat -x 5)

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        0,00    0,00   0,00   0,25    0,00   28,00   224,00     0,00   8,00    0,00    8,00   8,00   0,20
sdc        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00
sdb        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00
sdd        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00

No hds are being accessed, only the main (SDA) one (which is where the logs are saved). Btw squid is sending 80mbit/s to the network, as iftop told me.

cache.log:

2017/03/07 05:23:59 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.206991w9
2017/03/07 05:24:10 kid5| WARNING: communication with /cache4/rock may be too slow or disrupted for about 7.00s; rescued 304 out of 304 I/Os
2017/03/07 08:00:30 kid5| WARNING: abandoning 1 /cache2/rock I/Os after at least 7.00s timeout
2017/03/07 10:50:45 kid5| WARNING: abandoning 1 /cache2/rock I/Os after at least 7.00s timeout

I presume your iostat output covers 5 seconds. The cache.log output spans 5 hours. Was there no cache disk traffic during those 5 hours? Do those iostat 5 seconds match the timestamp of any single cache.log WARNING?

No. I used iostat to check if "right now" the hds were being accessed. A lot of minutes passed and all writes/reads remained Zero. With a 80mbit/s traffic going on, how could nothing be written nor read from disc? It's like squid stopped using the cache_dirs for some reason, so I grepped cache.log for the word "rock", and that is what it output.

squid.conf:

cache_dir rock /cache2 11 min-size=0 max-size=65536 max-swap-rate=200 swap-timeout=360
cache_dir rock /cache3 11 min-size=65537 max-size=262144 max-swap-rate=200 swap-timeout=380
cache_dir rock /cache4 11 min-size=262145 max-swap-rate=200 swap-timeout=500

Should I raise any values? tweak something?

Yes, but it is not yet clear what. If you suspect that your disks cannot handle the load, decrease max-swap-rate. However, there is currently no firm evidence that your disks cannot handle the load. It could be something else like insufficient IPC RAM or Squid bugs.

How can I check IPC RAM? I've never tweaked it.

Any Squid kid crashes? How many Squid workers do you use? Can you collect enough iostat 5-second outputs to correlate with long-term cache.log messages? I would also collect other system activity during those hours. The "atop" tool may be useful for collecting everything in one place.

Couldn't find any crashes on the log file. 6 workers and 3 cache_dirs.

I've just noticed that squid was running since feb/18 (Start Time: Sat, 18 Feb 2017 15:38:44 GMT) and since the beginning there were a lot of warnings on cache.log.. (The logs I pasted on the earlier email were from today's usage only..)

I think since then, it stopped using the cache stores..

2017/02/18 13:48:19 kid3| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo3.9082w9
2017/02/18 13:48:42 kid4| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo4.3371w9
2017/02/18 14:06:01 kid9| WARNING: /cache4/rock delays I/O requests for 9.97 seconds to obey 200/sec rate limit
2017/02/18 14:06:34 kid9| WARNING: /cache4/rock delays I/O requests for 21.82 seconds to obey 200/sec rate limit
2017/02/18 14:06:42 kid4| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:47 kid3| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:48 kid1| WARNING: abandoning 1 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:49 kid4| WARNING: abandoning 4 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:06:54 kid3| WARNING: abandoning 2 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:07:55 kid9| WARNING: /cache4/rock delays I/O requests for 68.64 seconds to obey 200/sec rate limit
2017/02/18 14:08:03 kid5| WARNING: abandoning 511 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:47 kid2| WARNING: abandoning 20 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:51 kid3| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 14:08:54 kid1| WARNING: abandoning 41 /cache4/rock I/Os after at least 7.00s timeout
2017/02/18 15:26:35 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.31404w9
2017/02/18 15:29:00 kid9| WARNING: /cache4/rock delays I/O requests for 9.92 seconds to obey 200/sec rate limit
2017/02/18 15:29:13 kid9| WARNING: /cache4/rock delays I/O requests for 8.23 seconds to obey 200/sec rate limit
2017/02/18 15:29:45 kid9| WARNING: /cache4/
Re: [squid-dev] Rock store stopped accessing discs
On 03/07/2017 08:40 AM, Heiler Bemerguy wrote:

> I'm using squid 4.0.18
>
> And noticed something: (iostat -x 5)
>
> Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> sda        0,00    0,00   0,00   0,25    0,00   28,00   224,00     0,00   8,00    0,00    8,00   8,00   0,20
> sdc        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00
> sdb        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00
> sdd        0,00    0,00   0,00   0,00    0,00    0,00     0,00     0,00   0,00    0,00    0,00   0,00   0,00
>
> No hds are being accessed, only the main (SDA) one (which is where the logs are saved). Btw squid is sending 80mbit/s to the network, as iftop told me.
>
> cache.log:
>
> 2017/03/07 05:23:59 kid5| ERROR: worker I/O push queue for /cache4/rock overflow: ipcIo5.206991w9
> 2017/03/07 05:24:10 kid5| WARNING: communication with /cache4/rock may be too slow or disrupted for about 7.00s; rescued 304 out of 304 I/Os
> 2017/03/07 08:00:30 kid5| WARNING: abandoning 1 /cache2/rock I/Os after at least 7.00s timeout
> 2017/03/07 10:50:45 kid5| WARNING: abandoning 1 /cache2/rock I/Os after at least 7.00s timeout

I presume your iostat output covers 5 seconds. The cache.log output spans 5 hours. Was there no cache disk traffic during those 5 hours? Do those iostat 5 seconds match the timestamp of any single cache.log WARNING?

> squid.conf:
>
> cache_dir rock /cache2 11 min-size=0 max-size=65536 max-swap-rate=200 swap-timeout=360
> cache_dir rock /cache3 11 min-size=65537 max-size=262144 max-swap-rate=200 swap-timeout=380
> cache_dir rock /cache4 11 min-size=262145 max-swap-rate=200 swap-timeout=500
>
> Should I raise any values? tweak something?

Yes, but it is not yet clear what. If you suspect that your disks cannot handle the load, decrease max-swap-rate. However, there is currently no firm evidence that your disks cannot handle the load. It could be something else like insufficient IPC RAM or Squid bugs.

Any Squid kid crashes? How many Squid workers do you use?

Can you collect enough iostat 5-second outputs to correlate with long-term cache.log messages? I would also collect other system activity during those hours. The "atop" tool may be useful for collecting everything in one place.

Alex.

___ squid-dev mailing list squid-dev@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-dev
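As an illustration of the max-swap-rate suggestion, the same cache_dir lines with a much lower cap (20 is the example value proposed elsewhere in this thread; the right number depends on what each disk can actually sustain):

  cache_dir rock /cache2 11 min-size=0 max-size=65536 max-swap-rate=20 swap-timeout=360
  cache_dir rock /cache3 11 min-size=65537 max-size=262144 max-swap-rate=20 swap-timeout=380
  cache_dir rock /cache4 11 min-size=262145 max-swap-rate=20 swap-timeout=500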