Re: [squid-users] connections from particular users sometimes get stuck
On 09/30/2016 12:38 AM, Eugene M. Zheganin wrote: > And the > main sign indicating there's something wrong with this initial > transaction was the fact that 407 answer took 42 seconds to appear in > both tcpdump captures. To avoid misunderstanding: There are many red flags in your logs, including excessive processing delays. I am ignoring the ones that do not appear to be related to the "stuck browser" problem. >>> tcpdump capture taken from a client machine: >>> http://zhegan.in/files/squid/squid-stuck-client.pcap >> This capture one is missing most of the second transaction packets >> (tcp.stream eq 186). I do not know why tcpdump was unable to collect them. > This is because of the gap between Ctrl^C issued in the client machine > cmd console and squid server ssh console (in this exact order) - I had > to switch between windows on my desktop, and because about a minute has > passed since the beginning of the transaction (42 seconds plus quite > some time), and I was worried that the debug log will grow more and > more, making it difficult to navigate then. Please do not worry about log navigation -- you are already posting logs with 95+% of irrelevant transactions; adding 2% more would make no difference. Focus on collecting complete logs. However, please _compress_ your logs before posting them. >> In summary, your browser is probably stuck because Squid could not >> accept a connection. Why did that accept call fail with ECONNABORTED? >> If you cancel browser wait and repeat the request, will it work? > Sometimes, but this means like 3-5% percents. This may be your pathway to the answer! A _random_ ECONNABORTED accept(2) error would have a ~100% probability of disappearing on the second attempt. Your observations seem to confirm that this error (assuming Squid gets ECONNABORTED for all stuck transactions) is caused by something on that client or something between that specific client and Squid. We obviously do not know what is wrong yet, but just for the sake of an example, consider a mismatching negotiated Ethernet settings that lead to packet loss and similar low-level problems that lead to TCP accept(2) errors from Squid point of view. Again, I am not claiming that this is what is going on. > Most of the time > rerequests lead to the same timeout, Chrome shows the request is > "pending" in the developer's tools/network tab for dozens of seconds, > and so on. Very good. This is something we can use to investigate. > I still have this machine in stuck state (I think), what should I focus on ? I recommend the following: 0. Start capturing to/from-Squid packets on the client machine. Make sure tcpdump does not do DNS resolution (if applicable). 1. Start capturing to/from-clientS packets on the Squid machine. Make sure tcpdump does not do DNS resolution (if applicable). 2. Start Squid debugging. 3. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 4. Reproduce the problem. Wait until Chrome times out. 5. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 6. Rereproduce the problem. Wait until Chrome times out. If possible, use a slightly different URL for this test. For example, append "?test2" to the URL in #4, assuming that will not screw things up. 7. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 8. Wait 60 seconds. 9. Stop all captures and Squid debugging. 10. Archive, compress, and share all the logs and the test URL(s). When archiving, please preserve file modification times for logs from steps 3,5,7. Adjust the procedure as needed, of course. Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On Sep 30, 2016, at 1:38 AM, Eugene M. Zheganin wrote: > > Hi. > >>> 13:31:25.060 kid1| accept failure: (53) Software caused connection abort >>> 13:31:25.865 kid1| accept failure: (53) Software caused connection abort >>> 13:31:25.904 kid2| accept failure: (53) Software caused connection abort >> The timestamps are a ~second off, but AFAICT, they are a ~second off for >> successful accepts as well, so it is probably just a tcpdump logging >> artifact. >> >> In summary, your browser is probably stuck because Squid could not >> accept a connection. Why did that accept call fail with ECONNABORTED? I >> cannot say for sure -- the packet trace is rather dirty/misleading >> (e.g., it shows the redirect packet being sent to the client _after_ the >> client follows that redirect which does not make sense). >> >> Any relevant errors in you system logs? > Nothing except > > Limiting closed port RST response from 294 to 200 packets/sec Sometimes I have noticed odd problems when this appears — I’m thinking that on occasion there may be a necessary RST that is being ignored and causing a client connection to “hang”. The “net.inet.icmp.icmplim” sysctl limits the ICMP response rate to RST packets received for ports that are not open — perhaps increase that from the default of 200 to 400 and see if it helps. Otherwise, are you using dynamic IPFW rules that involve the ports used by squid? I’ve had some issues with those in the past (timeouts or maximum numbers of rules exceeded), so I’ve switched to using static IPFW rules for connections in and out of squid to avoid issues. Guy ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 30/09/2016 7:38 p.m., Eugene M. Zheganin wrote: > On 29.09.2016 23:17, Alex Rousskov wrote: >> >> In summary, your browser is probably stuck because Squid could not >> accept a connection. Why did that accept call fail with ECONNABORTED? I >> cannot say for sure -- the packet trace is rather dirty/misleading >> (e.g., it shows the redirect packet being sent to the client _after_ the >> client follows that redirect which does not make sense). >> >> Any relevant errors in you system logs? > Nothing except > > Limiting closed port RST response from 294 to 200 packets/sec > > repeated by many times, which doesn't look related to this, because they > keep popping even when nobody's complaining. I've already did the > initial investigation and found no signs of resource starvation - > meaning no connection/files/mbufs/memory/pf states starvation happens. > At least I didn't find it. Plus, it would affect all of the clients > randomly, right ? Not this particular one. >> >> If you cancel browser wait and repeat the request, will it work? > Sometimes, but this means like 3-5% percents. Most of the time > rerequests lead to the same timeout, Chrome shows the request is > "pending" in the developer's tools/network tab for dozens of seconds, > and so on. >> If this >> was just a random accept failure, then it should work on the second try. >> If it does not work again, then there is something more serious going on >> (but you would need to collect more logs to study that). >> >> The connection accepting code in Squid is in poor shape, but I do not >> think those minor code problems affect this particular use case. >> > I still have this machine in stuck state (I think), what should I focus on ? I think the TCP packets, particularly the SYN and first few after that between the client and Squid. There may be some clues in the TCP features which they are trying to negotiate for use. Or the timings relative to when Squid calls accept() to receive the connection (TCP queues too big etc). Amos ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 29.09.2016 23:17, Alex Rousskov wrote: > On 09/29/2016 02:58 AM, Eugene M. Zheganin wrote: >> This time turbodom.ru entries are present in the debug log > Yes, there are two complete HTTP transactions with that domain. One is a > 407 Authentication Required and one is a 301 redirect: > >> HTTP/1.1 301 Moved Permanently > ... >> Location: http://turbodom.ru/index.html/ > I see no relevant problems with those two transactions. Me neither, but the original transaction start wasn't 'GET http://turbodom.ru/index.html/', it was 'GET http://turbodom.ru/index.html', without trailing slash. Trailing slash was added by the web-server, due to it's specific configuration. And the main sign indicating there's something wrong with this initial transaction was the fact that 407 answer took 42 seconds to appear in both tcpdump captures. > > >> tcpdump capture taken from a client machine: >> http://zhegan.in/files/squid/squid-stuck-client.pcap > This capture one is missing most of the second transaction packets > (tcp.stream eq 186). I do not know why tcpdump was unable to collect them. This is because of the gap between Ctrl^C issued in the client machine cmd console and squid server ssh console (in this exact order) - I had to switch between windows on my desktop, and because about a minute has passed since the beginning of the transaction (42 seconds plus quite some time), and I was worried that the debug log will grow more and more, making it difficult to navigate then. >> tcpdump capture taken from squid machine, on the interface the client >> machine is connected via: >> http://zhegan.in/files/squid/squid-stuck-server-to-client.pcap > This one has both HTTP transactions described above (tcp.stream eq 206). > > It also contains a related and incomplete transaction that follows the > above redirect (tcp.stream eq 346). According to tcpdump, that > transaction starts around 13:31:26. > > Squid cache log does not mention that third transaction (or that TCP > connection), probably because Squid could not accept it. There were a > few accept failures (ECONNABORTED) right around the time of that > third/missing transaction: > >> 13:31:25.060 kid1| accept failure: (53) Software caused connection abort >> 13:31:25.865 kid1| accept failure: (53) Software caused connection abort >> 13:31:25.904 kid2| accept failure: (53) Software caused connection abort > The timestamps are a ~second off, but AFAICT, they are a ~second off for > successful accepts as well, so it is probably just a tcpdump logging > artifact. > > In summary, your browser is probably stuck because Squid could not > accept a connection. Why did that accept call fail with ECONNABORTED? I > cannot say for sure -- the packet trace is rather dirty/misleading > (e.g., it shows the redirect packet being sent to the client _after_ the > client follows that redirect which does not make sense). > > Any relevant errors in you system logs? Nothing except Limiting closed port RST response from 294 to 200 packets/sec repeated by many times, which doesn't look related to this, because they keep popping even when nobody's complaining. I've already did the initial investigation and found no signs of resource starvation - meaning no connection/files/mbufs/memory/pf states starvation happens. At least I didn't find it. Plus, it would affect all of the clients randomly, right ? Not this particular one. > > If you cancel browser wait and repeat the request, will it work? Sometimes, but this means like 3-5% percents. Most of the time rerequests lead to the same timeout, Chrome shows the request is "pending" in the developer's tools/network tab for dozens of seconds, and so on. > If this > was just a random accept failure, then it should work on the second try. > If it does not work again, then there is something more serious going on > (but you would need to collect more logs to study that). > > The connection accepting code in Squid is in poor shape, but I do not > think those minor code problems affect this particular use case. > I still have this machine in stuck state (I think), what should I focus on ? Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/29/2016 02:58 AM, Eugene M. Zheganin wrote: > This time turbodom.ru entries are present in the debug log Yes, there are two complete HTTP transactions with that domain. One is a 407 Authentication Required and one is a 301 redirect: > HTTP/1.1 301 Moved Permanently ... > Location: http://turbodom.ru/index.html/ I see no relevant problems with those two transactions. > tcpdump capture taken from a client machine: > http://zhegan.in/files/squid/squid-stuck-client.pcap This capture one is missing most of the second transaction packets (tcp.stream eq 186). I do not know why tcpdump was unable to collect them. > tcpdump capture taken from squid machine, on the interface the client > machine is connected via: > http://zhegan.in/files/squid/squid-stuck-server-to-client.pcap This one has both HTTP transactions described above (tcp.stream eq 206). It also contains a related and incomplete transaction that follows the above redirect (tcp.stream eq 346). According to tcpdump, that transaction starts around 13:31:26. Squid cache log does not mention that third transaction (or that TCP connection), probably because Squid could not accept it. There were a few accept failures (ECONNABORTED) right around the time of that third/missing transaction: > 13:31:25.060 kid1| accept failure: (53) Software caused connection abort > 13:31:25.865 kid1| accept failure: (53) Software caused connection abort > 13:31:25.904 kid2| accept failure: (53) Software caused connection abort The timestamps are a ~second off, but AFAICT, they are a ~second off for successful accepts as well, so it is probably just a tcpdump logging artifact. In summary, your browser is probably stuck because Squid could not accept a connection. Why did that accept call fail with ECONNABORTED? I cannot say for sure -- the packet trace is rather dirty/misleading (e.g., it shows the redirect packet being sent to the client _after_ the client follows that redirect which does not make sense). Any relevant errors in you system logs? If you cancel browser wait and repeat the request, will it work? If this was just a random accept failure, then it should work on the second try. If it does not work again, then there is something more serious going on (but you would need to collect more logs to study that). The connection accepting code in Squid is in poor shape, but I do not think those minor code problems affect this particular use case. Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 29.09.2016 08:38, Eugene M. Zheganin wrote: > Hi. > > On 28.09.2016 21:21, Alex Rousskov wrote: >> >> Indeed! Fixing that exposes one HTTP request in the capture file. >> Unfortunately, >> >> 1. Squid responded to that request (with a 407 message). >> Follow (tcp.stream eq 32) in Wireshark. >> >> 2. Squid did not receive this request when debugging was on: >> $ egrep -c 'www.ru|217.112.35.75' cache.log.debug >> 0 > Yup, I noticed this too. >> It may be important to know that the captured request was received at >> minute 53 while Squid debugging starts at minute 58 (I assume the >> difference in hours is due to time zone effects and such). >> > The thing is, the client machine is in the AD domain and it's time is > NTP-synced, so is the time on the squid machine. I think I was > accurate, but mey be I still wasn't. I will redo the captures, this > time with squid-towards-the-LAN capture, and I will try to validate > them myself. Luckily, I didn't restart squid yet and this machine > still demonstrates the symptoms. Okay, second try. Same IPs, same browser; this time user opens http://turbodom.ru/index.html (www.ru/index.html got unstuck). Time is NTP-synced on both machines. This time turbodom.ru entries are present in the debug log (actually, there's no /index.html at the turbodom.ru, so anyone requesting http://turbodom.ru/index.html is me). debug log: http://zhegan.in/files/squid/cache.log.debug tcpdump capture taken from a client machine: http://zhegan.in/files/squid/squid-stuck-client.pcap tcpdump capture taken from squid machine, on the interface the client machine is connected via: http://zhegan.in/files/squid/squid-stuck-server-to-client.pcap Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 28.09.2016 21:21, Alex Rousskov wrote: Indeed! Fixing that exposes one HTTP request in the capture file. Unfortunately, 1. Squid responded to that request (with a 407 message). Follow (tcp.stream eq 32) in Wireshark. 2. Squid did not receive this request when debugging was on: $ egrep -c 'www.ru|217.112.35.75' cache.log.debug 0 Yup, I noticed this too. It may be important to know that the captured request was received at minute 53 while Squid debugging starts at minute 58 (I assume the difference in hours is due to time zone effects and such). The thing is, the client machine is in the AD domain and it's time is NTP-synced, so is the time on the squid machine. I think I was accurate, but mey be I still wasn't. I will redo the captures, this time with squid-towards-the-LAN capture, and I will try to validate them myself. Luckily, I didn't restart squid yet and this machine still demonstrates the symptoms. Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/28/2016 09:41 AM, Antony Stone wrote: > On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote: > >> AFAICT, Squid did not receive a request for www.ru: >>> $ egrep -c '.ru|217.112.35.75' cache.log.debug >>> 0 >>> >>> $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c >>> '.ru|217.112.35.75' 0 > > Is that a direct copy'n'paste from your terminal? > > If so, you tried one too many w's :( Indeed! Fixing that exposes one HTTP request in the capture file. Unfortunately, 1. Squid responded to that request (with a 407 message). Follow (tcp.stream eq 32) in Wireshark. 2. Squid did not receive this request when debugging was on: $ egrep -c 'www.ru|217.112.35.75' cache.log.debug 0 It may be important to know that the captured request was received at minute 53 while Squid debugging starts at minute 58 (I assume the difference in hours is due to time zone effects and such). Another potentially important fact that it took almost two minutes for the 407 response to show up in the capture. It is not clear (to me) whether the delay was due to network problems (in one or both directions) or due to Squid. Again, follow (tcp.stream eq 32) in Wireshark to see the details. The "I do not know which transaction got stuck and whether that transaction go cache-logged" conclusion and suggestions in my original response still apply AFAICT. Thank you, Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote: > AFAICT, Squid did not receive a request for www.ru: > > $ egrep -c '.ru|217.112.35.75' cache.log.debug > > 0 > > > > $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c > > '.ru|217.112.35.75' 0 Is that a direct copy'n'paste from your terminal? If so, you tried one too many w's :( Antony. -- "There has always been an underlying argument that we should open up our source code more broadly. The fact is that we are learning from open source and we are opening our code more broadly through Shared Source. Is there value to providing source code? The answer is unequivocally yes." - Jason Matusow, head of Microsoft's Shared Source Program, in response to leaks of Windows source code on the Internet. Please reply to the list; please *don't* CC me. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/28/2016 02:22 AM, Eugene M. Zheganin wrote: > I took the debug trace and both the tcpdump client-side and server-side > (towards the internet) capturea. ... > I requested a http://www.ru/index.html from a client machine Chrome. AFAICT, Squid did not receive a request for www.ru: > $ egrep -c '.ru|217.112.35.75' cache.log.debug > 0 > $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c > '.ru|217.112.35.75' > 0 If you find that request in your client-to-Squid packet captures, please point me to it. Otherwise, please fix your captures and redo the test from scratch. Also, please include access.log and/or some other indication of which transactions got stuck. You may want to stop captures _after_ you cancel the stuck request -- it is more difficult to find "what has not happened" (no answer for X seconds) than "what has happened" (e.g., a TCP connection was reset). Keep in mind that, in general, the request for the URL you type in the browser may succeed, but the browser may get stuck getting some resource on that page. You may limit captures to TCP and DNS traffic to and from Squid. Finally, for the future, [if you cannot isolate the problem to one HTTP transaction(*),] it is best to capture everything that Squid receives or sends rather than trying to guess what the browser and Squid will receive or send -- according to cache.log, there were lots of Squid-server packets that your tcpdump configuration did not capture. Thank you, Alex. P.S. (*) Entering a regular page URL in the browser usually results in more than one HTTP transaction between the browser and Squid. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 28.09.2016 01:36, Alex Rousskov wrote: > On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote: > >> I guess squid >> didn't get a way to increase debug level on the fly ? > "squid -k debug" (or sending an equivalent signal) does that: > http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output > > You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough. > > I took the debug trace and both the tcpdump client-side and server-side (towards the internet) capturea. Since the debug log is way heavy, I decided to put all of the three files on the web-server. Here they are: Squid debug log (ALL,7): http://zhegan.in/files/squid/cache.log.debug tcpdump client-side capture (windump -s 0 -w squid-stuck-reference-client.pcap -ni 1): http://zhegan.in/files/squid/squid-stuck-reference-client.pcap tcpdump server-side capture, towards the outer world, empty - obviously, server didn't send anything outside (tcpdump -s 0 -w squid-stuck-reference-server.pcap -ni vlan23 host 217.112.35.75): http://zhegan.in/files/squid/squid-stuck-reference-server.pcap Test sequence: client - 192.168.3.215 squid - 192.168.3.1:3128 URL - http://www.ru/index.html I requested a http://www.ru/index.html from a client machine Chrome. No other applications were requesting this URL at this time there (however, capture does contain a lot of traffic, including HTTP sessions). Then I waited about a minute (loader in Chrome was spinning), and aborted both captures, then aborted the request. The aborted request probably made it to the squid log. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote: > I guess squid > didn't get a way to increase debug level on the fly ? "squid -k debug" (or sending an equivalent signal) does that: http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough. Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 28.09.2016 0:29, Alex Rousskov wrote: Since you can reproduce this, I suggest collecting ALL,9 log for the stuck master transaction: http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction If collecting a debugging trace is impossible for some reason, then collect the corresponding TCP packets on the Squid to origin server link and post actual packets (not screenshots of packet summaries) from both connections. The debugging trace will most likely have the answer. The packet trace might have the answer. You may need to change user credentials for this test or after posting the details requested above. Well... I cannot reproduce it on purpose, I'm just saying it is self-reproducible for almost a year, in certain moments of time. Collecting a debug trace isn't hard by itself, but I'm pretty sure the restart will clear this state for a current machine (I guess squid didn't get a way to increase debug level on the fly ? at least I'm not aware of it; so I will need to restart it to set ALL,9), and I'll have to run with ALL,9 for quite some time, which is, obviously, not good for production, because it will create enormous amounts of logging in cache log. So I will post the tcpdump containing both exchanges, and if it will be still unclear I'll think about running in a debug mode. Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/27/2016 11:13 AM, Eugene M. Zheganin wrote: > The wireshark screen where the issue is isolated for one particular > connection can be found here - > https://gyazo.com/fdec1d9d7c31a75afc7d4676abb83d15 (it's really a simple > picture: TCP connection establishing, then GET -> 407 -> GET and bunch > of keepalives, not a rocket science). > > Any ideas ? Since you can reproduce this, I suggest collecting ALL,9 log for the stuck master transaction: http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction If collecting a debugging trace is impossible for some reason, then collect the corresponding TCP packets on the Squid to origin server link and post actual packets (not screenshots of packet summaries) from both connections. The debugging trace will most likely have the answer. The packet trace might have the answer. You may need to change user credentials for this test or after posting the details requested above. Cheers, Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users