Re: [squid-users] connections from particular users sometimes get stuck

2016-09-30 Thread Alex Rousskov
On 09/30/2016 12:38 AM, Eugene M. Zheganin wrote:
> And the
> main sign indicating there's something wrong with this initial
> transaction was the fact that 407 answer took 42 seconds to appear in
> both tcpdump captures.

To avoid misunderstanding: There are many red flags in your logs,
including excessive processing delays. I am ignoring the ones that do
not appear to be related to the "stuck browser" problem.


>>> tcpdump capture taken from a client machine:
>>> http://zhegan.in/files/squid/squid-stuck-client.pcap

>> This capture one is missing most of the second transaction packets
>> (tcp.stream eq 186). I do not know why tcpdump was unable to collect them.

> This is because of the gap between Ctrl^C issued in the client machine
> cmd console and squid server ssh console (in this exact order) - I had
> to switch between windows on my desktop, and because about a minute has
> passed since the beginning of the transaction (42 seconds plus quite
> some time), and I was worried that the debug log will grow more and
> more, making it difficult to navigate then.

Please do not worry about log navigation -- you are already posting logs
with 95+% of irrelevant transactions; adding 2% more would make no
difference. Focus on collecting complete logs. However, please
_compress_ your logs before posting them.


>> In summary, your browser is probably stuck because Squid could not
>> accept a connection. Why did that accept call fail with ECONNABORTED?

>> If you cancel browser wait and repeat the request, will it work?

> Sometimes, but this means like 3-5% percents. 

This may be your pathway to the answer! A _random_ ECONNABORTED
accept(2) error would have a ~100% probability of disappearing on the
second attempt. Your observations seem to confirm that this error
(assuming Squid gets ECONNABORTED for all stuck transactions) is caused
by something on that client or something between that specific client
and Squid.

We obviously do not know what is wrong yet, but just for the sake of an
example, consider a mismatching negotiated Ethernet settings that lead
to packet loss and similar low-level problems that lead to TCP accept(2)
errors from Squid point of view. Again, I am not claiming that this is
what is going on.


> Most of the time
> rerequests lead to the same timeout, Chrome shows the request is
> "pending" in the developer's tools/network tab for dozens of seconds,
> and so on.

Very good. This is something we can use to investigate.



> I still have this machine in stuck state (I think), what should I focus on ?

I recommend the following:

0. Start capturing to/from-Squid packets on the client machine.
   Make sure tcpdump does not do DNS resolution (if applicable).
1. Start capturing to/from-clientS packets on the Squid machine.
   Make sure tcpdump does not do DNS resolution (if applicable).
2. Start Squid debugging.

3. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -s ..." or equivalent low-level interface stats.

4. Reproduce the problem. Wait until Chrome times out.

5. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -s ..." or equivalent low-level interface stats.

6. Rereproduce the problem. Wait until Chrome times out.
   If possible, use a slightly different URL for this test.
   For example, append "?test2" to the URL in #4, assuming
   that will not screw things up.

7. On both machines (if possible):
   Collect "netstat -s" or equivalent TCP stack stats.
   Collect "ifconfig ..." or equivalent high-level interface stats.
   Collect "ethtool -s ..." or equivalent low-level interface stats.

8. Wait 60 seconds.

9. Stop all captures and Squid debugging.
10. Archive, compress, and share all the logs and the test URL(s).
When archiving, please preserve file modification times for
logs from steps 3,5,7.

Adjust the procedure as needed, of course.

Alex.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-30 Thread Guy Helmer
On Sep 30, 2016, at 1:38 AM, Eugene M. Zheganin  wrote:
> 
> Hi.
> 
>>> 13:31:25.060 kid1| accept failure: (53) Software caused connection abort
>>> 13:31:25.865 kid1| accept failure: (53) Software caused connection abort
>>> 13:31:25.904 kid2| accept failure: (53) Software caused connection abort
>> The timestamps are a ~second off, but AFAICT, they are a ~second off for
>> successful accepts as well, so it is probably just a tcpdump logging
>> artifact.
>> 
>> In summary, your browser is probably stuck because Squid could not
>> accept a connection. Why did that accept call fail with ECONNABORTED? I
>> cannot say for sure -- the packet trace is rather dirty/misleading
>> (e.g., it shows the redirect packet being sent to the client _after_ the
>> client follows that redirect which does not make sense).
>> 
>> Any relevant errors in you system logs?
> Nothing except
> 
> Limiting closed port RST response from 294 to 200 packets/sec

Sometimes I have noticed odd problems when this appears — I’m thinking that on 
occasion there may be a necessary RST that is being ignored and causing a 
client connection to “hang”. The “net.inet.icmp.icmplim” sysctl limits the ICMP 
response rate to RST packets received for ports that are not open — perhaps 
increase that from the default of 200 to 400 and see if it helps.

Otherwise, are you using dynamic IPFW rules that involve the ports used by 
squid? I’ve had some issues with those in the past (timeouts or maximum numbers 
of rules exceeded), so I’ve switched to using static IPFW rules for connections 
in and out of squid to avoid issues.

Guy

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-28 Thread Alex Rousskov
On 09/28/2016 09:41 AM, Antony Stone wrote:
> On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote:
> 
>> AFAICT, Squid did not receive a request for www.ru:
>>> $ egrep -c '.ru|217.112.35.75' cache.log.debug
>>> 0
>>>
>>> $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c
>>> '.ru|217.112.35.75' 0
> 
> Is that a direct copy'n'paste from your terminal?
> 
> If so, you tried one too many w's :(

Indeed! Fixing that exposes one HTTP request in the capture file.
Unfortunately,

1. Squid responded to that request (with a 407 message).
   Follow (tcp.stream eq 32) in Wireshark.

2. Squid did not receive this request when debugging was on:
   $ egrep -c 'www.ru|217.112.35.75' cache.log.debug
   0

It may be important to know that the captured request was received at
minute 53 while Squid debugging starts at minute 58 (I assume the
difference in hours is due to time zone effects and such).

Another potentially important fact that it took almost two minutes for
the 407 response to show up in the capture. It is not clear (to me)
whether the delay was due to network problems (in one or both
directions) or due to Squid. Again, follow (tcp.stream eq 32) in
Wireshark to see the details.


The "I do not know which transaction got stuck and whether that
transaction go cache-logged" conclusion and suggestions in my original
response still apply AFAICT.


Thank you,

Alex.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-28 Thread Antony Stone
On Wednesday 28 September 2016 at 17:37:58, Alex Rousskov wrote:

> AFAICT, Squid did not receive a request for www.ru:
> > $ egrep -c '.ru|217.112.35.75' cache.log.debug
> > 0
> > 
> > $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c
> > '.ru|217.112.35.75' 0

Is that a direct copy'n'paste from your terminal?

If so, you tried one too many w's :(


Antony.

-- 
"There has always been an underlying argument that we should open up our 
source code more broadly. The fact is that we are learning from open source 
and we are opening our code more broadly through Shared Source.

Is there value to providing source code? The answer is unequivocally yes."

 - Jason Matusow, head of Microsoft's Shared Source Program, in response to 
leaks of Windows source code on the Internet.

   Please reply to the list;
 please *don't* CC me.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-28 Thread Alex Rousskov
On 09/28/2016 02:22 AM, Eugene M. Zheganin wrote:
> I took the debug trace and both the tcpdump client-side and server-side
> (towards the internet) capturea.
...
> I requested a http://www.ru/index.html from a client machine Chrome. 


AFAICT, Squid did not receive a request for www.ru:

> $ egrep -c '.ru|217.112.35.75' cache.log.debug 
> 0

> $ tshark -V -r squid-stuck-reference-client.pcap | egrep -c 
> '.ru|217.112.35.75'
> 0


If you find that request in your client-to-Squid packet captures, please
point me to it. Otherwise, please fix your captures and redo the test
from scratch.

Also, please include access.log and/or some other indication of which
transactions got stuck. You may want to stop captures _after_ you cancel
the stuck request -- it is more difficult to find "what has not
happened" (no answer for X seconds) than "what has happened" (e.g., a
TCP connection was reset). Keep in mind that, in general, the request
for the URL you type in the browser may succeed, but the browser may get
stuck getting some resource on that page.

You may limit captures to TCP and DNS traffic to and from Squid.

Finally, for the future, [if you cannot isolate the problem to one HTTP
transaction(*),] it is best to capture everything that Squid receives or
sends rather than trying to guess what the browser and Squid will
receive or send -- according to cache.log, there were lots of
Squid-server packets that your tcpdump configuration did not capture.


Thank you,

Alex.
P.S. (*) Entering a regular page URL in the browser usually results in
more than one HTTP transaction between the browser and Squid.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-28 Thread Eugene M. Zheganin
Hi.

On 28.09.2016 01:36, Alex Rousskov wrote:
> On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote:
>
>> I guess squid
>> didn't get a way to increase debug level on the fly ? 
> "squid -k debug" (or sending an equivalent signal) does that:
> http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output
>
> You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough.
>
>
I took the debug trace and both the tcpdump client-side and server-side
(towards the internet) capturea.
Since the debug log is way heavy, I decided to put all of the three
files on the web-server. Here they are:

Squid debug log (ALL,7):

http://zhegan.in/files/squid/cache.log.debug

tcpdump client-side capture (windump -s 0 -w
squid-stuck-reference-client.pcap -ni 1):

http://zhegan.in/files/squid/squid-stuck-reference-client.pcap

tcpdump server-side capture, towards the outer world, empty - obviously,
server didn't send anything outside (tcpdump -s 0 -w
squid-stuck-reference-server.pcap -ni vlan23 host 217.112.35.75):

http://zhegan.in/files/squid/squid-stuck-reference-server.pcap

Test sequence:

client - 192.168.3.215
squid - 192.168.3.1:3128
URL - http://www.ru/index.html

I requested a http://www.ru/index.html from a client machine Chrome. No
other applications were requesting this URL at this time there (however,
capture does contain a lot of traffic, including HTTP sessions). Then I
waited about a minute (loader in Chrome was spinning), and aborted both
captures, then aborted the request. The aborted request probably made it
to the squid log.

Eugene.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-27 Thread Alex Rousskov
On 09/27/2016 02:02 PM, Eugene M. Zheganin wrote:

> I guess squid
> didn't get a way to increase debug level on the fly ? 

"squid -k debug" (or sending an equivalent signal) does that:
http://wiki.squid-cache.org/SquidFaq/BugReporting#Detailed_Debug_Output

You will not get ALL,9 this way, unfortunately, but ALL,7 might be enough.

Alex.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-27 Thread Eugene M. Zheganin

Hi.

On 28.09.2016 0:29, Alex Rousskov wrote:

Since you can reproduce this, I suggest collecting ALL,9 log for the
stuck master transaction:

http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction

If collecting a debugging trace is impossible for some reason, then
collect the corresponding TCP packets on the Squid to origin server link
and post actual packets (not screenshots of packet summaries) from both
connections. The debugging trace will most likely have the answer. The
packet trace might have the answer.

You may need to change user credentials for this test or after posting
the details requested above.

Well... I cannot reproduce it on purpose, I'm just saying it is 
self-reproducible for almost a year, in certain moments of time. 
Collecting a debug trace isn't hard by itself, but I'm pretty sure the 
restart will clear this state for a current machine (I guess squid 
didn't get a way to increase debug level on the fly ? at least I'm not 
aware of it; so I will need to restart it to set ALL,9), and I'll have 
to run with ALL,9 for quite some time, which is, obviously, not good for 
production, because it will create enormous amounts of logging in cache 
log. So I will post the tcpdump containing both exchanges, and if it 
will be still unclear I'll think about running in a debug mode.


Thanks.
Eugene.
___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


Re: [squid-users] connections from particular users sometimes get stuck

2016-09-27 Thread Alex Rousskov
On 09/27/2016 11:13 AM, Eugene M. Zheganin wrote:

> The wireshark screen where the issue is isolated for one particular
> connection can be found here -
> https://gyazo.com/fdec1d9d7c31a75afc7d4676abb83d15 (it's really a simple
> picture: TCP connection establishing, then GET -> 407 -> GET and bunch
> of keepalives, not a rocket science).
> 
> Any ideas ?

Since you can reproduce this, I suggest collecting ALL,9 log for the
stuck master transaction:

http://wiki.squid-cache.org/SquidFaq/BugReporting#Debugging_a_single_transaction

If collecting a debugging trace is impossible for some reason, then
collect the corresponding TCP packets on the Squid to origin server link
and post actual packets (not screenshots of packet summaries) from both
connections. The debugging trace will most likely have the answer. The
packet trace might have the answer.

You may need to change user credentials for this test or after posting
the details requested above.


Cheers,

Alex.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users


[squid-users] connections from particular users sometimes get stuck

2016-09-27 Thread Eugene M. Zheganin

Hi.

I have a weird problem. I run a squid cache 3.5.19 on FreeBSD/amd64, 
with about 300 active users, lots of authentication, external helpers 
(yeah, it's usually the place when one starts to post configs, but let 
me get to the point), and everything basically works just fine, but 
sometimes one particular user (don't know, may be it's one particular 
machine or some other entity) starts to have troubles. Usual trouble 
looks like the following:


- around 299 users are working and authenticatiing just fine

- one particular user starts experiencing connection stucking: his 
browser requests a web page, it starts to load and then some random 
object on it blocks indefinitely.


- this happens every time on one machine, for the time given. This 
machine is permanent for a given issue, until it's gone. Then it's some 
another machine, and I cannot figure out the pattern.


- this machine may be locked in this malfuctioning state for days. This 
state is usually cleared by the squid restart, or it may clear itself.


- after a month or so the issue appears on another machine. and it 
persists on a new machine for quite some time.


On a l3 level this looks simple: browser requests an object, gets 407 
answer, replies with proper credentials set and then this connection 
goes indefinitely into a keepalived state: the squid and the browser 
send keepalives to each other, but nothing happens other than 
keepalives. User sees the spinning loader on a browser tab, and some 
content inside the tab, depending on how many objects the browses has 
received. In the same time new connections to squid are opening from 
this machine just fine, and the basic connectivity is normal for both 
the squid and the troubled machine. Furthermore, I'm sure that this 
problem isn't caused by bottlenecks on the squid machine: because it 
this way all the users would have eventually this problem, not only one. 
In the same time these aren't bottlenecks on the user machine: while the 
browser is stuck, other applications are working fine. If I switch the 
proxy to a backup squid (on another server) this machine is able to 
browse the internet.


I really need to solve this, but I have no idea where to start. The 
error log show nothing suspicious.


The wireshark screen where the issue is isolated for one particular 
connection can be found here - 
https://gyazo.com/fdec1d9d7c31a75afc7d4676abb83d15 (it's really a simple 
picture: TCP connection establishing, then GET -> 407 -> GET and bunch 
of keepalives, not a rocket science).


Any ideas ?

Thanks.

Eugene.

___
squid-users mailing list
squid-users@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-users