Hi Maxim,

On Sun, Mar 17, 2013 at 4:42 AM, Maxim Dounin <mdou...@mdounin.ru> wrote:
> Hello!
>
> On "these hosts"? Note that listen queue aka backlog size is
> configured in _applications_ which call listen(). At a host level
> you may only configure somaxconn, which is maximum allowed listen
> queue size (but an application may still use anything lower, even
> just 1).

"These hosts" means we have a lot of servers in production right now, and
they all exhibit the same issue. It hasn't been a showstopper, but it's
been occurring for as long as anyone can remember.

The total number of upstream servers on a typical day is 6 machines (each
running 3 service processes), and hosts running nginx account for another
4 machines. All of these are Ubuntu 12.04 64-bit VMs running on AWS EC2
m3.xlarge instance types.

I was under the impression that /proc/sys/net/ipv4/tcp_max_syn_backlog was
for configuring the maximum queue size on the host. It's set to 1024 here,
and increasing the number doesn't change the frequency of the missed
packets. /proc/sys/net/core/somaxconn is set to 500,000.

> Make sure to check actual listen queue sizes used on listen
> sockets involved. On Linux (you are using Linux, right?) this
> should be possible with "ss -nlt" (or "netstat -nlt").

According to `ss -nlt`, Send-Q on these ports is 128, and Recv-Q on all
ports is 0. I don't know what a Recv-Q of 0 means here; does it mean the
default is used? And would the default be 1024? But according to
`netstat -nlt`, both queues are 0?

> > > 2) Some other queue in the network stack is exhausted. This
> > > might be nontrivial to track (but usually possible too).
> >
> > This is interesting, and could very well be it! Do you have any
> > suggestions on where to start looking?
>
> I'm not a Linux expert, but quick search suggests it should be
> possible with dropwatch, see e.g. here:
>
> http://prefetch.net/blog/index.php/2011/07/11/using-netstat-and-dropwatch-to-observe-packet-loss-on-linux-servers/

Thanks for the tip! I'll take some time to explore this some more.
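For anyone following along, here is a minimal sketch of Maxim's point that the backlog is chosen per application, not per host. The port and backlog values below are illustrative, not taken from our setup: the application passes a backlog to listen(), the kernel caps it at net.core.somaxconn, and on Linux `ss -nlt` reports the effective value in the Send-Q column for LISTEN sockets.

```python
import socket

# A server picks its own listen backlog when it calls listen();
# somaxconn only sets the upper bound the kernel will honor.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 0))  # port 0 = let the kernel pick a free port
srv.listen(128)  # effective backlog is min(128, net.core.somaxconn)

# SO_ACCEPTCONN == 1 confirms the socket is in the LISTEN state;
# while this runs, `ss -nlt` would show Send-Q for it on Linux.
assert srv.getsockopt(socket.SOL_SOCKET, socket.SO_ACCEPTCONN) == 1
srv.close()
```

In nginx's case the backlog can be raised explicitly via the `backlog=` parameter of the `listen` directive (e.g. `listen 80 backlog=1024;`), which would be worth checking against the Send-Q of 128 we're seeing.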
And before anyone asks: I'm not using iptables or netfilter, which seems
to be a common culprit when investigating similar TCP issues.

Jay
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx