Re: RES: [squid-users] Squid box dropping connections

2011-11-18 Thread Amos Jeffries

On 19/11/2011 12:21 a.m., Nataniel Klug wrote:

Hi Eliezer,

Thanks for your answer:


Well, this is one of the big problems with conntrack. What you can also
try is changing the TCP established-connection timeout:

sysctl net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=3600

because it may be that timeout causing such a huge connection-tracking
table. The stock default for established connections is very long (five
days, 432000 seconds), which can cause a lot of trouble when many
connections are left open.

And by the way, do you really have 155K connections? That seems like
too much.

hope to hear more about the situation.

Regards Eliezer
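
(A minimal sketch of checking and persisting that timeout. The
net.ipv4.netfilter.ip_conntrack_* name is the older ip_conntrack form;
on newer nf_conntrack kernels the same knob lives under
net.netfilter.nf_conntrack_*:)

# Check the current established-connection timeout (in seconds):
sysctl net.ipv4.netfilter.ip_conntrack_tcp_timeout_established

# Lower it on the running kernel:
sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=3600

# To persist across reboots, add this line to /etc/sysctl.conf:
#   net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 3600
# and apply it with: sysctl -p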


  [Nataniel Klug] So Eliezer, I don't think I have 155k connections. Most of
them are in FIN_WAIT1 (about 35~45k). I have 1000 PPPoE clients behind this
Squid box, so even if each of them had 50 connections, I would have 50k. I
think closing them really fast can solve the problem. I have set the timeout
to 5 minutes and will try it right now.


Some assumptions in there need a double-check. Modern websites can use 
50 (or more) connections to load any given page. Clients commonly have 
several such pages open at once in tabbed browsers. And Squid uses two 
sockets per client connection: one to the client and one to the server.


So, while 150K connections for 1K clients does seem unusual, it is within 
the upper limit they *could* be reaching if they all happened to be 
browsing at the same time. I would expect to see a correspondingly 
high request rate in the Squid stats, though.
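
(To make the arithmetic concrete, with illustrative per-client numbers
rather than measurements: 1000 clients x 1.5 busy pages x 50 connections
per page x 2 Squid sockets per connection = 150,000 sockets.)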


Amos


RE: RES: [squid-users] Squid box dropping connections

2011-11-17 Thread Jenny Lee



 From: listas.n...@cnett.com.br
 To: bodycar...@live.com; squid-users@squid-cache.org
 Date: Thu, 17 Nov 2011 15:55:20 -0300
 Subject: RES: [squid-users] Squid box dropping connections

 Hello Jenny,

 Thanks for your answer. Sorry I haven't written back, but my hashsize is
 already set to the same value as conntrack_max. I have some out-of-memory
 errors in dmesg:

 Nov 17 15:43:13 02 kernel: Out of socket memory
 
 
Well, there you go. Here is your problem. You will need to decrease your 
hashsize. I suggest you experiment with conntrack_max, hashsize, and bucket 
counts, and watch for errors like these.
 
There are a couple of good docs out there explaining kernel memory use with 
conntrack.
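
(A minimal sketch of inspecting and experimenting with those values,
assuming a kernel using the newer nf_conntrack paths; the concrete
numbers are illustrative, not recommendations:)

# How many connections are being tracked, and the hard limit:
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Current hash bucket count:
cat /sys/module/nf_conntrack/parameters/hashsize

# Example experiment (a common rule of thumb is hashsize = conntrack_max / 8):
sysctl -w net.netfilter.nf_conntrack_max=262144
echo 32768 > /sys/module/nf_conntrack/parameters/hashsize

# ... then watch for new kernel complaints:
dmesg | tail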
 
 

 And in cache.log I was not able to find any CommBind. I am reading about
 these (ephemeral) port ranges. I think my squid is using too many sockets:

 sockets: used 16662
 TCP: inuse 28433 orphan 12185 tw 2191 alloc 28787 mem 18786
 UDP: inuse 8 mem 0
 RAW: inuse 1
 FRAG: inuse 0 memory 0

 And it has about 16k files open right now. I will try to find a way to make
 more ports available. Thanks!
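
(Those counters have the shape of /proc/net/sockstat output; you can
re-check them at any time with:)

cat /proc/net/sockstat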
 
You can check the available port range with: 
cat /proc/sys/net/ipv4/ip_local_port_range

And increase it with:
echo 1024 65535 > /proc/sys/net/ipv4/ip_local_port_range
 
 
This is for RHEL6, I don't recall if it is the same for RHEL5.
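
(To make the larger range survive a reboot, a minimal sketch assuming a
standard /etc/sysctl.conf:)

# /etc/sysctl.conf
net.ipv4.ip_local_port_range = 1024 65535

# apply without rebooting:
sysctl -p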
 
Here is a small Perl script to log these counters for post-mortem review. 
Put it in cron and run it every minute as root; then you can review the 
log later.
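
(For example, the root crontab entry, assuming the script is saved at
the hypothetical path /usr/local/bin/tcplog.pl:)

* * * * * /usr/local/bin/tcplog.pl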
 
Your orphans don't look good to me. However, you have nolocalbind and you are 
using tproxy.
 
I am not a Linux, Perl, tproxy, or TCP expert; just someone trying to 
solve her own problems. So approach all of this with caution. I take no 
responsibility.
 
Good luck!
 
Jenny
 
 
 
#!/usr/bin/perl
# Log conntrack and TCP socket counters for post-mortem review.

# Number of connections currently tracked by netfilter.
$ct = `cat /proc/sys/net/netfilter/nf_conntrack_count`;
chomp $ct;

# Socket summary from iproute2.
@ss = `ss -s`;

foreach (@ss) {
    # Parse the TCP summary line, e.g. (older iproute2 format):
    # TCP: 28433 (estab 14000, closed 900, orphaned 12185, synrecv 0, timewait 2191/0), ports 16662
    if (/TCP:\s+(\d+)\s+\(estab\s+(\d+),.+orphaned\s+(\d+),.+timewait\s+(\d+).+ports\s+(\d+)/) {
        ($tcp, $est, $orp, $tw, $ports) = ($1, $2, $3, $4, $5);
    }
}

$file = "/var/log/tcp.log";
$date = localtime();

# Append so each cron run adds one line of history.
open(OUT, ">>", $file) or die "cannot open $file: $!";
print OUT "$date: CT:$ct TCP:$tcp EST:$est ORP:$orp TW:$tw PORTS:$ports\n";
close OUT;
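
(Each run appends one line to /var/log/tcp.log; with illustrative values
it looks like:

Thu Nov 17 15:43:01 2011: CT:155000 TCP:28433 EST:14000 ORP:12185 TW:2191 PORTS:16662)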