Re: [squid-users] connections from particular users sometimes get stuck
On 09/30/2016 12:38 AM, Eugene M. Zheganin wrote: > And the > main sign indicating there's something wrong with this initial > transaction was the fact that 407 answer took 42 seconds to appear in > both tcpdump captures. To avoid misunderstanding: There are many red flags in your logs, including excessive processing delays. I am ignoring the ones that do not appear to be related to the "stuck browser" problem. >>> tcpdump capture taken from a client machine: >>> http://zhegan.in/files/squid/squid-stuck-client.pcap >> This capture one is missing most of the second transaction packets >> (tcp.stream eq 186). I do not know why tcpdump was unable to collect them. > This is because of the gap between Ctrl^C issued in the client machine > cmd console and squid server ssh console (in this exact order) - I had > to switch between windows on my desktop, and because about a minute has > passed since the beginning of the transaction (42 seconds plus quite > some time), and I was worried that the debug log will grow more and > more, making it difficult to navigate then. Please do not worry about log navigation -- you are already posting logs with 95+% of irrelevant transactions; adding 2% more would make no difference. Focus on collecting complete logs. However, please _compress_ your logs before posting them. >> In summary, your browser is probably stuck because Squid could not >> accept a connection. Why did that accept call fail with ECONNABORTED? >> If you cancel browser wait and repeat the request, will it work? > Sometimes, but this means like 3-5% percents. This may be your pathway to the answer! A _random_ ECONNABORTED accept(2) error would have a ~100% probability of disappearing on the second attempt. Your observations seem to confirm that this error (assuming Squid gets ECONNABORTED for all stuck transactions) is caused by something on that client or something between that specific client and Squid. We obviously do not know what is wrong yet, but just for the sake of an example, consider a mismatching negotiated Ethernet settings that lead to packet loss and similar low-level problems that lead to TCP accept(2) errors from Squid point of view. Again, I am not claiming that this is what is going on. > Most of the time > rerequests lead to the same timeout, Chrome shows the request is > "pending" in the developer's tools/network tab for dozens of seconds, > and so on. Very good. This is something we can use to investigate. > I still have this machine in stuck state (I think), what should I focus on ? I recommend the following: 0. Start capturing to/from-Squid packets on the client machine. Make sure tcpdump does not do DNS resolution (if applicable). 1. Start capturing to/from-clientS packets on the Squid machine. Make sure tcpdump does not do DNS resolution (if applicable). 2. Start Squid debugging. 3. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 4. Reproduce the problem. Wait until Chrome times out. 5. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 6. Rereproduce the problem. Wait until Chrome times out. If possible, use a slightly different URL for this test. For example, append "?test2" to the URL in #4, assuming that will not screw things up. 7. On both machines (if possible): Collect "netstat -s" or equivalent TCP stack stats. Collect "ifconfig ..." or equivalent high-level interface stats. Collect "ethtool -s ..." or equivalent low-level interface stats. 8. Wait 60 seconds. 9. Stop all captures and Squid debugging. 10. Archive, compress, and share all the logs and the test URL(s). When archiving, please preserve file modification times for logs from steps 3,5,7. Adjust the procedure as needed, of course. Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On Sep 30, 2016, at 1:38 AM, Eugene M. Zheganinwrote: > > Hi. > >>> 13:31:25.060 kid1| accept failure: (53) Software caused connection abort >>> 13:31:25.865 kid1| accept failure: (53) Software caused connection abort >>> 13:31:25.904 kid2| accept failure: (53) Software caused connection abort >> The timestamps are a ~second off, but AFAICT, they are a ~second off for >> successful accepts as well, so it is probably just a tcpdump logging >> artifact. >> >> In summary, your browser is probably stuck because Squid could not >> accept a connection. Why did that accept call fail with ECONNABORTED? I >> cannot say for sure -- the packet trace is rather dirty/misleading >> (e.g., it shows the redirect packet being sent to the client _after_ the >> client follows that redirect which does not make sense). >> >> Any relevant errors in you system logs? > Nothing except > > Limiting closed port RST response from 294 to 200 packets/sec Sometimes I have noticed odd problems when this appears — I’m thinking that on occasion there may be a necessary RST that is being ignored and causing a client connection to “hang”. The “net.inet.icmp.icmplim” sysctl limits the ICMP response rate to RST packets received for ports that are not open — perhaps increase that from the default of 200 to 400 and see if it helps. Otherwise, are you using dynamic IPFW rules that involve the ports used by squid? I’ve had some issues with those in the past (timeouts or maximum numbers of rules exceeded), so I’ve switched to using static IPFW rules for connections in and out of squid to avoid issues. Guy ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/28/2016 09:41 AM, Antony Stone wrote: > On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote: > >> AFAICT, Squid did not receive a request for www.ru: >>> $ egrep -c '.ru|217.112.35.75' cache.log.debug >>> 0 >>> >>> $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c >>> '.ru|217.112.35.75' 0 > > Is that a direct copy'n'paste from your terminal? > > If so, you tried one too many w's :( Indeed! Fixing that exposes one HTTP request in the capture file. Unfortunately, 1. Squid responded to that request (with a 407 message). Follow (tcp.stream eq 32) in Wireshark. 2. Squid did not receive this request when debugging was on: $ egrep -c 'www.ru|217.112.35.75' cache.log.debug 0 It may be important to know that the captured request was received at minute 53 while Squid debugging starts at minute 58 (I assume the difference in hours is due to time zone effects and such). Another potentially important fact that it took almost two minutes for the 407 response to show up in the capture. It is not clear (to me) whether the delay was due to network problems (in one or both directions) or due to Squid. Again, follow (tcp.stream eq 32) in Wireshark to see the details. The "I do not know which transaction got stuck and whether that transaction go cache-logged" conclusion and suggestions in my original response still apply AFAICT. Thank you, Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote: > AFAICT, Squid did not receive a request for www.ru: > > $ egrep -c '.ru|217.112.35.75' cache.log.debug > > 0 > > > > $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c > > '.ru|217.112.35.75' 0 Is that a direct copy'n'paste from your terminal? If so, you tried one too many w's :( Antony. -- "There has always been an underlying argument that we should open up our source code more broadly. The fact is that we are learning from open source and we are opening our code more broadly through Shared Source. Is there value to providing source code? The answer is unequivocally yes." - Jason Matusow, head of Microsoft's Shared Source Program, in response to leaks of Windows source code on the Internet. Please reply to the list; please *don't* CC me. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/28/2016 02:22 AM, Eugene M. Zheganin wrote: > I took the debug trace and both the tcpdump client-side and server-side > (towards the internet) capturea. ... > I requested a http://www.ru/index.html from a client machine Chrome. AFAICT, Squid did not receive a request for www.ru: > $ egrep -c '.ru|217.112.35.75' cache.log.debug > 0 > $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c > '.ru|217.112.35.75' > 0 If you find that request in your client-to-Squid packet captures, please point me to it. Otherwise, please fix your captures and redo the test from scratch. Also, please include access.log and/or some other indication of which transactions got stuck. You may want to stop captures _after_ you cancel the stuck request -- it is more difficult to find "what has not happened" (no answer for X seconds) than "what has happened" (e.g., a TCP connection was reset). Keep in mind that, in general, the request for the URL you type in the browser may succeed, but the browser may get stuck getting some resource on that page. You may limit captures to TCP and DNS traffic to and from Squid. Finally, for the future, [if you cannot isolate the problem to one HTTP transaction(*),] it is best to capture everything that Squid receives or sends rather than trying to guess what the browser and Squid will receive or send -- according to cache.log, there were lots of Squid-server packets that your tcpdump configuration did not capture. Thank you, Alex. P.S. (*) Entering a regular page URL in the browser usually results in more than one HTTP transaction between the browser and Squid. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 28.09.2016 01:36, Alex Rousskov wrote: > On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote: > >> I guess squid >> didn't get a way to increase debug level on the fly ? > "squid -k debug" (or sending an equivalent signal) does that: > http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output > > You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough. > > I took the debug trace and both the tcpdump client-side and server-side (towards the internet) capturea. Since the debug log is way heavy, I decided to put all of the three files on the web-server. Here they are: Squid debug log (ALL,7): http://zhegan.in/files/squid/cache.log.debug tcpdump client-side capture (windump -s 0 -w squid-stuck-reference-client.pcap -ni 1): http://zhegan.in/files/squid/squid-stuck-reference-client.pcap tcpdump server-side capture, towards the outer world, empty - obviously, server didn't send anything outside (tcpdump -s 0 -w squid-stuck-reference-server.pcap -ni vlan23 host 217.112.35.75): http://zhegan.in/files/squid/squid-stuck-reference-server.pcap Test sequence: client - 192.168.3.215 squid - 192.168.3.1:3128 URL - http://www.ru/index.html I requested a http://www.ru/index.html from a client machine Chrome. No other applications were requesting this URL at this time there (however, capture does contain a lot of traffic, including HTTP sessions). Then I waited about a minute (loader in Chrome was spinning), and aborted both captures, then aborted the request. The aborted request probably made it to the squid log. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote: > I guess squid > didn't get a way to increase debug level on the fly ? "squid -k debug" (or sending an equivalent signal) does that: http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough. Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
Hi. On 28.09.2016 0:29, Alex Rousskov wrote: Since you can reproduce this, I suggest collecting ALL,9 log for the stuck master transaction: http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction If collecting a debugging trace is impossible for some reason, then collect the corresponding TCP packets on the Squid to origin server link and post actual packets (not screenshots of packet summaries) from both connections. The debugging trace will most likely have the answer. The packet trace might have the answer. You may need to change user credentials for this test or after posting the details requested above. Well... I cannot reproduce it on purpose, I'm just saying it is self-reproducible for almost a year, in certain moments of time. Collecting a debug trace isn't hard by itself, but I'm pretty sure the restart will clear this state for a current machine (I guess squid didn't get a way to increase debug level on the fly ? at least I'm not aware of it; so I will need to restart it to set ALL,9), and I'll have to run with ALL,9 for quite some time, which is, obviously, not good for production, because it will create enormous amounts of logging in cache log. So I will post the tcpdump containing both exchanges, and if it will be still unclear I'll think about running in a debug mode. Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
Re: [squid-users] connections from particular users sometimes get stuck
On 09/27/2016 11:13 AM, Eugene M. Zheganin wrote: > The wireshark screen where the issue is isolated for one particular > connection can be found here - > https://gyazo.com/fdec1d9d7c31a75afc7d4676abb83d15 (it's really a simple > picture: TCP connection establishing, then GET -> 407 -> GET and bunch > of keepalives, not a rocket science). > > Any ideas ? Since you can reproduce this, I suggest collecting ALL,9 log for the stuck master transaction: http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction If collecting a debugging trace is impossible for some reason, then collect the corresponding TCP packets on the Squid to origin server link and post actual packets (not screenshots of packet summaries) from both connections. The debugging trace will most likely have the answer. The packet trace might have the answer. You may need to change user credentials for this test or after posting the details requested above. Cheers, Alex. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users
[squid-users] connections from particular users sometimes get stuck
Hi. I have a weird problem. I run a squid cache 3.5.19 on FreeBSD/amd64, with about 300 active users, lots of authentication, external helpers (yeah, it's usually the place when one starts to post configs, but let me get to the point), and everything basically works just fine, but sometimes one particular user (don't know, may be it's one particular machine or some other entity) starts to have troubles. Usual trouble looks like the following: - around 299 users are working and authenticatiing just fine - one particular user starts experiencing connection stucking: his browser requests a web page, it starts to load and then some random object on it blocks indefinitely. - this happens every time on one machine, for the time given. This machine is permanent for a given issue, until it's gone. Then it's some another machine, and I cannot figure out the pattern. - this machine may be locked in this malfuctioning state for days. This state is usually cleared by the squid restart, or it may clear itself. - after a month or so the issue appears on another machine. and it persists on a new machine for quite some time. On a l3 level this looks simple: browser requests an object, gets 407 answer, replies with proper credentials set and then this connection goes indefinitely into a keepalived state: the squid and the browser send keepalives to each other, but nothing happens other than keepalives. User sees the spinning loader on a browser tab, and some content inside the tab, depending on how many objects the browses has received. In the same time new connections to squid are opening from this machine just fine, and the basic connectivity is normal for both the squid and the troubled machine. Furthermore, I'm sure that this problem isn't caused by bottlenecks on the squid machine: because it this way all the users would have eventually this problem, not only one. In the same time these aren't bottlenecks on the user machine: while the browser is stuck, other applications are working fine. If I switch the proxy to a backup squid (on another server) this machine is able to browse the internet. I really need to solve this, but I have no idea where to start. The error log show nothing suspicious. The wireshark screen where the issue is isolated for one particular connection can be found here - https://gyazo.com/fdec1d9d7c31a75afc7d4676abb83d15 (it's really a simple picture: TCP connection establishing, then GET -> 407 -> GET and bunch of keepalives, not a rocket science). Any ideas ? Thanks. Eugene. ___ squid-users mailing list squid-users@lists.squid-cache.org http://lists.squid-cache.org/listinfo/squid-users