[squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-11-26 Thread SaRaVanAn
Hi All,
  I am running a small bandwidth-measurement test on my setup while Squid
is running. A script pushes traffic from a client browser to a web server
via the Squid box: it creates around 50 user sessions and does wget fetches
of randomly selected dynamic URLs (a rough sketch of such a script is shown
below).
After some time I observe a drop in the bandwidth of the link connecting to
the web server, even though there are no HITs in the Squid cache. Analysing
the netstat output during the problem, I can see the Recv-Q piling up on
Squid's TCP connections in CLOSE_WAIT state, and Squid staying in CLOSE_WAIT
for more than a minute. The number of Squid sessions to the web server drops
from 70 to 5, while there are still around 80 TCP sessions from the client
to Squid.
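
For reference, a minimal sketch of this kind of load generator (the original
script was not posted, so the proxy address, URL list and request counts here
are placeholders):

import random
import threading
import urllib.request

PROXY = {"http": "http://172.19.134.2:3128"}   # assumed Squid address/port
URLS = ["http://www.example.com/?"]            # placeholder URL list

opener = urllib.request.build_opener(urllib.request.ProxyHandler(PROXY))

def user_session(requests_per_session=100):
    # each "user" fetches randomly chosen URLs through the proxy, like wget
    for _ in range(requests_per_session):
        try:
            with opener.open(random.choice(URLS), timeout=30) as resp:
                resp.read()
        except OSError:
            pass                               # ignore individual fetch failures

threads = [threading.Thread(target=user_session) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()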

Without Squid, there is no drop in bandwidth under the same load.

Why does the bandwidth drop when Squid is running? Please share your
suggestions.

Logs

Squid version : 2.6.STABLE14

2013-11-25 10:17:53 Collecting netstat statistics...
Proto Recv-Q Send-Q Local Address         Foreign Address       State       PID/Program name
tcp   248352      0 172.19.134.2:51439    194.50.177.163:80     CLOSE_WAIT  5477/(squid)
tcp    77229      0 172.19.134.2:41998    64.15.157.134:80      CLOSE_WAIT  5477/(squid)
tcp    15853      0 172.19.134.2:55344    64.136.20.39:80       CLOSE_WAIT  5477/(squid)
tcp    30022      0 172.19.134.2:47485    50.56.161.66:80       CLOSE_WAIT  5477/(squid)
tcp    30202      0 172.19.134.2:59213    198.90.22.194:80      CLOSE_WAIT  5477/(squid)
tcp     9787      0 172.19.134.2:52761    184.26.136.73:80      CLOSE_WAIT  5477/(squid)
tcp   106892      0 172.19.134.2:55109    184.26.136.115:80     CLOSE_WAIT  5477/(squid)


2013-11-25 10:18:42 Collecting netstat statistics...
Proto Recv-Q Send-Q Local Address         Foreign Address       State       PID/Program name
tcp   248352      0 172.19.134.2:51439    194.50.177.163:80     CLOSE_WAIT  5477/(squid)
tcp    95558      0 172.19.134.2:42559    67.192.29.225:80      CLOSE_WAIT  5477/(squid)
tcp    77229      0 172.19.134.2:41998    64.15.157.134:80      CLOSE_WAIT  5477/(squid)
tcp    15853      0 172.19.134.2:55344    64.136.20.39:80       CLOSE_WAIT  5477/(squid)
tcp    30022      0 172.19.134.2:47485    50.56.161.66:80       CLOSE_WAIT  5477/(squid)
tcp    30202      0 172.19.134.2:59213    198.90.22.194:80      CLOSE_WAIT  5477/(squid)
tcp     9787      0 172.19.134.2:52761    184.26.136.73:80      CLOSE_WAIT  5477/(squid)
tcp   106892      0 172.19.134.2:55109    184.26.136.115:80     CLOSE_WAIT  5477/(squid)
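
A rough sketch of how these per-state socket counts and Recv-Q totals can be
pulled from /proc/net/tcp (the same data netstat reads); the field layout
assumed is the standard Linux format, and IPv6 sockets in /proc/net/tcp6 are
not included:

# hex state codes used by the kernel in /proc/net/tcp
STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

counts, recvq = {}, {}
with open("/proc/net/tcp") as f:
    next(f)                                           # skip the header line
    for line in f:
        fields = line.split()
        state = STATES.get(fields[3], fields[3])
        rx_queue = int(fields[4].split(":")[1], 16)   # tx_queue:rx_queue, hex
        counts[state] = counts.get(state, 0) + 1
        recvq[state] = recvq.get(state, 0) + rx_queue

for state in sorted(counts):
    print(f"{state:12s} sockets={counts[state]:5d} recv_q_bytes={recvq[state]}")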


Squid info :

---

Connection information for squid:
Number of clients accessing cache:  3
Number of HTTP requests received:   257549
Number of ICP messages received:0
Number of ICP messages sent:0
Number of queued ICP replies:   0
Request failure ratio:   0.00
Average HTTP requests per minute since start:   1443.2
Average ICP messages per minute since start:0.0
Select loop called: 4924570 times, 2.174 ms avg
Cache information for squid:
Request Hit Ratios:        5min: 0.0%, 60min: 0.0%
Byte Hit Ratios:           5min: -0.0%, 60min: 3.2%
Request Memory Hit Ratios: 5min: 0.0%, 60min: 0.0%
Request Disk Hit Ratios:   5min: 0.0%, 60min: 0.0%
Storage Swap size:  107524 KB
Storage Mem size:   8408 KB
Mean Object Size:   20.69 KB
Requests given to unlinkd:  0
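
These counters come from Squid's cache manager, which can also be polled
programmatically. A minimal sketch, assuming the proxy listens on
127.0.0.1:3128 and the manager ACL permits localhost (the cache_object://
request below is what the squidclient tool sends for "mgr:info"):

import http.client

# ask the cache manager for the "info" page over the normal http_port
conn = http.client.HTTPConnection("127.0.0.1", 3128, timeout=5)
conn.request("GET", "cache_object://localhost/info")
resp = conn.getresponse()
print(resp.status, resp.reason)
print(resp.read().decode("utf-8", "replace"))
conn.close()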


Regards,
Saravanan N


Re: [squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-11-26 Thread Antony Stone
On Tuesday 26 November 2013 at 11:37, SaRaVanAn wrote:

> Hi All,
>   I am doing a small test for bandwidth measurement of  my test setup
> while squid is running. I am running a script to pump the traffic from
> client browser to Web-server via Squid box. 

Er, do you really mean you are sending data from the browser to the server?

> The script creates around 50 user sessions and tries to do wget of randomly
> selected dynamic URL's.

That sounds more standard - wget will fetch data from the server to the 
browser.

What do you mean by "dynamic URLs"?  Where / how is the content actually being 
generated?

> After some time,

Please define.

> I'm observing a drop in bandwidth of the link,

Please define - what network setup are you using, what bandwidth are you 
getting at the start, what level does it drop to, and does it return to the 
previous level?

> Squid version : 2.6.STABLE14

That is rather old (the last release of the 2.6 branch was STABLE23, in 
September 2009).  Is there any reason you have not upgraded to a current version?


Regards,


Antony.

-- 
Behind the counter a boy with a shaven head stared vacantly into space,
a dozen spikes of microsoft protruding from the socket behind his ear.

 - William Gibson, Neuromancer (1984)

http://www.Open.Source.IT                   Please reply to the list;
The Open Source IT forum                    please don't CC me.


Re: [squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-11-26 Thread SaRaVanAn
On Tue, Nov 26, 2013 at 5:16 PM, Antony Stone wrote:
> On Tuesday 26 November 2013 at 11:37, SaRaVanAn wrote:
>
>> Hi All,
>>   I am doing a small test for bandwidth measurement of  my test setup
>> while squid is running. I am running a script to pump the traffic from
>> client browser to Web-server via Squid box.
>
> Er, do you really mean you are sending data from the browser to the server?
>
>> The script creates around 50 user sessions and tries to do wget of randomly
>> selected dynamic URL's.
>
> That sounds more standard - wget will fetch data from the server to the
> browser.
==
   The script randomly picks a URL from the list of URLs defined in a file
and tries to fetch it.

>
> What do you mean by "dynamic URLs"?  Where / how is the content actually being
> generated?
>
==
   It's a standard list of URLs with a question mark appended to avoid
Squid caching.
For example: www.espncricinfo.com?

>> After some time,
>
> Please define.
>
==
About 15-20 minutes after the script starts running.

>> I'm observing a drop in bandwidth of the link,
>
> Please define - what network setup are you using - what bandwidth are you
> getting at the start. what level does it drop to, does it return to the
> previous level?
>

                 eth0                               eth1
Windows laptop <-----> Linux machine (Squid running) <-----> Internet

We measure the outgoing traffic on the link to the internet (eth1) in order
to calculate bandwidth usage. The eth1 link capacity is around 10 Mbps, and
we are able to utilise a maximum of 7-8 Mbps while Squid is running. After
about 15 minutes there is a sudden drop from 8 Mbps to 6.5 Mbps, and it
comes back to 8 Mbps after 2-3 minutes (a rough measurement sketch follows
below).
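
A small sketch of this kind of measurement, sampling the eth1 transmit byte
counter from /proc/net/dev; the interface name and sampling interval are
assumptions:

import time

def tx_bytes(iface="eth1"):
    # /proc/net/dev: after "iface:" come 8 receive fields, then transmit bytes
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                return int(line.split(":", 1)[1].split()[8])
    raise ValueError(f"interface {iface} not found")

INTERVAL = 5                                  # seconds between samples
before = tx_bytes()
time.sleep(INTERVAL)
after = tx_bytes()
print(f"eth1 outgoing: {(after - before) * 8 / INTERVAL / 1e6:.2f} Mbit/s")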


>> Squid version : 2.6.STABLE14
>
> That is rather old (the last release of the 2.6 branch was STABLE23 September
> 2009).  Is there any reason you have not upgraded to a current version?
>
>
==
There are some practical difficulties (on our side) in upgrading to a
newer version.

> Regards,
>
>
> Antony.
>
> --
> Behind the counter a boy with a shaven head stared vacantly into space,
> a dozen spikes of microsoft protruding from the socket behind his ear.
>
>  - William Gibson, Neuromancer (1984)
>
> http://www.Open.Source.IT                   Please reply to the list;
> The Open Source IT forum                    please don't CC me.


Re: [squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-12-04 Thread SaRaVanAn
Hi All,
   I need help with this issue. Under heavy network traffic with Squid
running, the link bandwidth is not utilised properly. If I bypass Squid,
the link bandwidth is utilised properly.

Updated topology:
=
                (10 Mbps link)
client <---> Squid box <---> proxy client <---> proxy server <---> webserver

During the problem scenario I can see more TCP sessions in FIN_WAIT_1 state
on the proxy server. I also observe that the Recv-Q of sockets in CLOSE_WAIT
state keeps growing on the Squid box, and the number of TCP sessions from
Squid to the webserver drops drastically.

Squid.conf

http_port 3128 tproxy transparent
http_port 80 accel defaultsite=xyz.abc.com
hierarchy_stoplist cgi-bin
acl VIDEO url_regex ^http://fa\.video\.abc\.com
cache allow VIDEO
acl QUERY urlpath_regex cgi-bin \?
cache deny QUERY
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache
cache_mem 100 MB
cache_swap_low 70
cache_swap_high 80
maximum_object_size 51200 KB
maximum_object_size_in_memory 10 KB
ipcache_size 8192
fqdncache_size 8192
cache_replacement_policy heap LFUDA
memory_replacement_policy heap LFUDA
cache_dir aufs //var/logs/cache 6144 16 256
access_log //var/logs/access.log squid
cache_log //var/logs/cache.log
cache_store_log none
mime_table //var/opt/abs/config/acpu/mime.conf
pid_filename //var/run/squid.pid
refresh_pattern -i fa.video.abc.com/* 600 0% 600 override-expire
override-lastmod reload-into-ims ignore-reload
refresh_pattern -i video.abc.com/* 600 0% 600 override-expire
override-lastmod reload-into-ims ignore-reload
refresh_pattern -i media.abc.com/* 600 0% 600 override-expire
override-lastmod reload-into-ims ignore-reload
refresh_pattern -i xyz.abc.com/.*\.js 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i xyz.abc.com/.*\.gif 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i xyz.abc.com/.*\.jpg 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i xyz.abc.com/.*\.jpg 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i xyz.abc.com/.*\.png 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i xyz.abc.com/.*\.css 600 200% 600 override-expire
override-lastmod reload-into-ims
refresh_pattern -i ^http://.wsj./.* 10 200% 10 override-expire
override-lastmod reload-into-ims ignore-reload
refresh_pattern -i \.(gif|png|jpg|jpeg|ico)$ 480 100% 480
override-expire override-lastmod reload-into-ims
refresh_pattern -i \.(htm|html|js|css)$ 480 100% 480 override-expire
override-lastmod reload-into-ims
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern .   0   20% 4320
quick_abort_min 0 KB
quick_abort_max 0 KB
negative_ttl 1 minutes
positive_dns_ttl 1800 seconds
forward_timeout 2 minutes
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443
acl Safe_ports port 80
acl Safe_ports port 21
acl Safe_ports port 443
acl Safe_ports port 70
acl Safe_ports port 210
acl Safe_ports port 1025-65535
acl Safe_ports port 280
acl Safe_ports port 488
acl Safe_ports port 591
acl Safe_ports port 777
acl CONNECT method CONNECT
acl video_server dstdomain cs.video.abc.com
always_direct allow video_server
acl PURGE method PURGE
http_access allow PURGE localhost
http_access deny PURGE
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access deny CONNECT all
http_access allow all
icp_access allow all
tcp_outgoing_address 172.19.134.2
visible_hostname 172.19.134.2
server_persistent_connections off
logfile_rotate 1
error_map http://localhost:1000/abp/squidError.do 404
memory_pools off
store_objects_per_bucket 100
strip_query_terms off
coredump_dir //var/cache
store_dir_select_algorithm round-robin
cache_peer 172.19.134.2 parent 1000 0 no-query no-digest originserver
name=aportal
cache_peer www.abc.com parent 80 0 no-query no-digest originserver name=dotcom
cache_peer guides.abc.com parent 80 0 no-query no-digest originserver
name=travelguide
cache_peer selfcare.abc.com parent 80 0 no-query no-digest
originserver name=selfcare
cache_peer abcd.mediaroom.com parent 80 0 no-query no-digest
originserver name=mediaroom
acl webtrends url_regex ^http://statse\.webtrendslive\.com
acl the_host dstdom_regex xyz\.abc\.com
acl abp_regex url_regex ^http://xyz\.abc\.com/abp
acl gbp_regex url_regex ^http://xyz\.abc\.com/gbp
acl abcdstatic_regex url_regex ^http://xyz\.goginflight\.com/static
acl dotcom_regex url_regex ^www\.abc\.com
acl dotcomstatic_regex url_regex ^www\.abc\.com/static
acl travelguide_regex url_regex ^http://guides\.abc\.com
acl selfcare_regex url_regex ^http://selfcare\.abc\.com
acl mediaroom_regex url_regex ^http://abcd\.mediaroom\.com
never_direct allow abp_regex
cache_peer

Re: [squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-12-04 Thread Eliezer Croitoru

Hey Saravanan,

The main issue is that we can try to support you in a fairly basic way, but
note that if it's a bug it cannot be fixed in this old version other than by
porting a patch manually or trying a newer version of Squid.
Sometimes it's a bit difficult to upgrade, but you can compile Squid
without installing it, or install it alongside the older version
(with the proper configuration).


Your problem is a bit difficult to understand, since if you use a proxy
server with 100hz, I assume this is what you will get from it.
There are a couple of levels of the connection path which need to be
analysed before jumping in and blaming everything on the Linux machine.
Having example bug reports to analyse would be nice, but I am not sure
this is that case.


A 10 Mbps link or a 15 Mbps link is almost the same thing, but some things
in the network are out of your hands.

First, the diagram is a bit weird to me: what is the network topology, and
what hardware are we talking about?
There is a reason for the *drop* from 8 Mbps to 6.5 Mbps.
Either the bandwidth is being consumed in some way or it is being throttled
in some way. Both can happen in Squid or at any other level of the link,
even the physical one: a cat4 cable with a loose contact will lead to
something like that in some cases.


So I am saying: start "from the ground up".
What is the IP of the client?
Is this server properly firewalled?
What are the basic TCP settings for the CLOSE_WAIT timeout (see the sketch
below)?
Do you have iptraf installed on this server?
You can look at its "General interface statistics" or "Detailed
interface statistics" to identify a couple of things.
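
A small sketch that prints a few TCP-related kernel settings commonly checked
in this kind of situation; which of them actually matters here is an
assumption (note that CLOSE_WAIT itself has no kernel timeout - it lasts
until the application closes the socket):

KEYS = [
    "net/ipv4/ip_local_port_range",   # ephemeral (source) port pool
    "net/ipv4/tcp_fin_timeout",       # FIN_WAIT2 / orphan timeout
    "net/ipv4/tcp_tw_reuse",          # re-use of TIME_WAIT sockets
    "net/ipv4/tcp_max_orphans",       # cap on unattached TCP sockets
    "net/core/somaxconn",             # listen() backlog limit
]

for key in KEYS:
    try:
        with open("/proc/sys/" + key) as f:
            print(f"{key:32s} {f.read().strip()}")
    except OSError as e:
        print(f"{key:32s} <unavailable: {e}>")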


The iptraf tool can give you another angle on your network traffic (note
that using it over SSH can be confusing, due to the SSH session's own
overhead on the link).


It can happen that the Squid server "slows down" the connection, but that
is not the case most of the time.


So we need a basic network diagram or "picture", like "a cable goes from
this computer to this switch, from this switch to this router, and from
this router to this switch".

If you can add IP addresses, it will help me understand the big picture.

I am not sure yet what the client IP is, what the speed of each link is,
and whether there is full-duplex, half-duplex or no duplex support at all.

Are we talking about LAN traffic only? What about DNS and WAN traffic?

Thanks,
Eliezer

On 04/12/13 18:02, SaRaVanAn wrote:

[original message and squid.conf quoted in full; snipped here]
Re: [squid-users] CLOSE_WAIT state in Squid leads to bandwidth drop

2013-12-04 Thread Amos Jeffries
On 5/12/2013 1:45 p.m., Eliezer Croitoru wrote:
> Hey Saravanan,
> 
> The main issue is that we can try to support you in a very basic way but
> note that if it's a BUG it cannot be fixed later rather then porting a
> patch manually or to try newer versions of squid.

Going by the description here, and back in August when this was last posted,
this is not exactly a bug, but normal behaviour of TCP combined with
TPROXY limitations and large traffic flows.

When sockets run out, traffic gets throttled until more become available.
When the constant TCP connection churn grows faster than sockets become
available, a backlog of client connections builds up, holding sockets open
and waiting for service.

Part of the problem is TPROXY. Each outgoing connection requires the same
src-IP:dst-IP:dst-port triplet as the incoming one, so the 64K src-port
range is shared between inbound and outbound connections. Normally the proxy
would be using a different src-IP, with a full 64K ports available on each
side of the proxy.


So you *start* with that handicap, and on top of it this proxy is
churning through 42 sockets per second. For every port released for
re-use after sitting in TIME_WAIT status, 40K other sockets have been
needed (yes, 40K needed out of ~32K available).
So TCP is constantly running a backlog of available sockets, most of
which are consumed by new client connections.

Imagine the machine only had 16 sockets in total, and each socket
needed to wait for 10 seconds between uses.
Note that with a proxy each connection requires 2 sockets (a client
connection and a server connection, i.e. in/out of the proxy).

(For the sake of simplicity this description assumes each socket is done
with in almost zero time.)


1) When traffic arrives at a rate of 1 connection every 2 seconds,
everything looks perfectly fine.
 * 1 client socket gets used, and one server socket. Then they are released,
and for the next 10 seconds there are 14 sockets available.
 * During that 10-second period, 5 more connections arrive and 10
sockets get used.
 * That leaves the machine with 4 free sockets at the same time as the first
2 are being re-added to the available pool, making 4-6 sockets
constantly free.

2) Compare that to a traffic rate of just 1 connection every second. To
begin with everything seems perfectly fine.
 * The first 8 connections happen perfectly. However, they take 8 seconds
and completely empty the available pool of sockets.
 ** What is the proxy to do? It must wait 2 more seconds for the
next sockets to become available.
 * During those 2 seconds another 2 connections have been attempted.
 * When the first 2 sockets become available, both get used by
accept().
 * The socket pool is now empty again and the proxy must wait another 1
second for more sockets to become available.
  - The proxy now has 2 inbound connections waiting to be served, 7
inbound sockets in TIME_WAIT and 7 outbound sockets in TIME_WAIT.
 * When the second 2 sockets become available, one is used to receive
the new waiting connection and one to service an existing connection.
 * Things continue until we reach the 16-second mark.
  - This is a repeat of the point where no new sockets were finishing
TIME_WAIT.
 * At the 20-second mark sockets are becoming available again.
  - The proxy now has 4 inbound connections waiting to be served, 6
inbound sockets in TIME_WAIT and 6 outbound sockets in TIME_WAIT.

... and the cycle continues, with the gap between inbound and outbound
growing by 2 sockets every 8 seconds. If the clients were all extremely
patient, the machine would end up with every socket being used by inbound
connections and none left for outbound ones.
 However, Squid contains a reserved-FD feature to prevent that situation,
and clients get impatient and disconnect when the wait is too long. So you
will always see traffic flowing, but it will flow at a much reduced rate,
with ever longer delays visible to clients and somewhat "bursty" flow
rates as clients give up in bunches. (A toy simulation of this thought
experiment is sketched below.)


Notice how in (2) there is all this *extra* waiting time above and
beyond what the traffic would normally take going through the proxy. In
fact, the slower the traffic through the proxy, the worse the problem
becomes, since without connection persistence the transaction time is added
on top of each TIME_WAIT before a socket can be re-used.



The important thing to be aware of is that this is normal behaviour,
nasty as it is. You will hit it on any proxy or relay software if you
throw large numbers of new TCP connections at it fast enough.



There are two ways to avoid this:

 1) Reduce the number of sockets the proxy allows to be closed. In
other words, enable persistent connections in HTTP, on both server and
client connections. It is not perfect (especially in an old Squid like
2.6) but it avoids a lot of TIME_WAIT delays between HTTP requests.
 * Then reduce the request processing time spent holding those sockets,
so that more traffic can flow through faster overall.


 2) Reduce the traffic load