Re: Throughput slow with kernel 4.9.0
On Wed, Sep 26, 2018 at 10:22 PM Willy Tarreau wrote:

> This could indicate random pauses in the hypervisor which could
> confirm the possibilities of traffic bursts I was talking about.

Ahh, I see, that makes sense.

> It's just a matter of trade-off.

Definitely. For us, the degradation is unnoticeable.

> and consider yourself lucky to have saved $93/mo

Actually, I misspoke there. The full build for us would cost about
$600/mo. That said, I'm really tempted to fire a server up and re-run
the stats. It would be interesting to see if the Tc std dev and max
come down to ~0.

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com
Re: Throughput slow with kernel 4.9.0
Hi Brendon,

On Wed, Sep 26, 2018 at 02:45:29PM -0500, Brendon Colby wrote:
> Tc mean: 0.59 ms
> Tc std dev: 17.49 ms
> Tc max: 1033.00 ms
> Tc median: 0.00 ms

I don't know if all your servers are local, but if that's the case, the
Tc should always be very small, and a std dev of 17 ms and a max of 1 s
are huge. This could indicate random pauses in the hypervisor, which
would confirm the possibility of the traffic bursts I was talking about.

> So it looks like Tc isn't the issue here. Everything else looks good
> to my eyes. I still think something else changed, because on Jessie
> this never happened like I said.

As I said, it's very possible that with a change of limit you were
slightly above the minimum required settings before, and are now
slightly below after the kernel change.

> > But I also confess I've not run a VM test myself for a
> > while because each time I feel like I'm going to throw up in the middle
> > of the test :-/
>
> I know that's always been your position on VMs (ha) but one day I
> decided to try it for myself and haven't had a single issue until now.
> Our old hardware sat nearly 100% idle most of the time, so it was hard
> to justify the expense.

Oh, don't get me wrong, I know how this happens and am not complaining
about it. I'm just saying that using VMs is mostly a cost-saving
solution, and that if you cut costs you often have to expect a sacrifice
somewhere else. For those who can stand a slight degradation of
performance or latency, or who can spend some time chasing issues which
do not exist from time to time, that's fine. Others can't afford this at
all and will prefer bare metal. It's just a matter of trade-off.

At least if you've found a way to tune your system to work around this
issue, you should simply document it somewhere for you and your
coworkers, and consider yourself lucky to have saved $93/mo without
degrading performance :-)

Willy
Re: Throughput slow with kernel 4.9.0
On Tue, Sep 25, 2018 at 10:51 PM Willy Tarreau wrote:

Hi Willy,

> Just be careful as you are allocating 64GB of RAM to the TCP stack.

Yeah, after I figured that out I set it down to 1M pages / 4GB of RAM,
which is enough to handle peak traffic on only one proxy VM.

> As a hint, take a look at the connection timers in your logs.

I calculated some stats from a sample of about 400K requests:

Tw mean: 0.00 ms
Tw std dev: 0.00 ms
Tw max: 0.00 ms
Tw median: 0.00 ms

Tt mean: 296.70 ms
Tt std dev: 4724.12 ms
Tt max: 570127.00 ms
Tt median: 1.00 ms

Tr mean: 22.90 ms
Tr std dev: 129.48 ms
Tr max: 19007.00 ms
Tr median: 1.00 ms

Tc mean: 0.59 ms
Tc std dev: 17.49 ms
Tc max: 1033.00 ms
Tc median: 0.00 ms

Tq mean: 0.01 ms
Tq std dev: 7.90 ms
Tq max: 4980.00 ms
Tq median: 0.00 ms

So it looks like Tc isn't the issue here. Everything else looks good to
my eyes. I still think something else changed, because on Jessie this
never happened, like I said.

> But I also confess I've not run a VM test myself for a
> while because each time I feel like I'm going to throw up in the middle
> of the test :-/

I know that's always been your position on VMs (ha), but one day I
decided to try it for myself and haven't had a single issue until now.
Our old hardware sat nearly 100% idle most of the time, so it was hard
to justify the expense. Performance on hardware is much better, I'm
sure, but none of my tests show enough of a performance boost to justify
even the $93/mo servers I was looking at renting. Properly tuned VMs
have worked really well for us.

> We're mostly saying this because everywhere on the net we find copies of
> bad values for this field, resulting in out of memory issues for those
> who blindly copy-paste them.

Yep, I totally understand that. I think I was just saying that since
everyone says "never change this", there was no discussion around what
it is exactly, what it does, what happens during memory pressure mode,
how to measure if you need to change it, etc.
> Regards,
> Willy

Thanks for chiming in on this, Willy.

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com
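The per-request timer stats Brendon quotes above can be pulled straight from haproxy's HTTP logs. A minimal sketch, assuming the default HTTP log format where the timers appear as a slash-separated Tq/Tw/Tc/Tr/Tt block (the sample log lines below are made up for illustration, not real traffic):

```shell
# Extract Tc (the 3rd timer, connect time in ms) from each log line and
# compute count, mean, and max over the sample.
cat > /tmp/sample.log <<'EOF'
haproxy[123]: 10.0.0.1:1234 [26/Sep/2018:14:00:00.000] ft bk/srv1 0/0/1/22/23 200 512 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
haproxy[123]: 10.0.0.2:1235 [26/Sep/2018:14:00:01.000] ft bk/srv2 0/0/0/15/15 200 512 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
haproxy[123]: 10.0.0.3:1236 [26/Sep/2018:14:00:02.000] ft bk/srv1 0/0/1033/40/1073 200 512 - - ---- 1/1/0/0/0 0/0 "GET / HTTP/1.1"
EOF
stats=$(awk '{
  for (i = 1; i <= NF; i++)
    if ($i ~ /^-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/-?[0-9]+\/-?[0-9]+$/) {
      split($i, t, "/")            # t[3] is Tc, the connect time in ms
      sum += t[3]; n++
      if (t[3] > max) max = t[3]
      break                        # only the first timer block per line
    }
}
END { printf "n=%d mean=%.2f max=%d", n, sum / n, max }' /tmp/sample.log)
echo "$stats"
```

Std dev and medians need one more pass (or a tool like datamash), but the same field extraction applies.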
Re: Throughput slow with kernel 4.9.0
Hi Brendon,

On Sun, Sep 23, 2018 at 03:48:36PM -0500, Brendon Colby wrote:
(...)
> The next thing I did was to try adjusting net.ipv4.tcp_mem. This is the one
> setting almost everyone says to leave alone, that the kernel defaults are
> good enough. Well, adjusting this one setting is what seemed to fix this
> issue for us.
>
> Here are the default values the kernel set on Devuan / Stretch:
>
> net.ipv4.tcp_mem = 94401 125868 188802
>
> On Jessie:
>
> net.ipv4.tcp_mem = 92394 123194 184788
>
> Here is what I set it to:
>
> net.ipv4.tcp_mem = 16777216 16777216 16777216

Just be careful, as you are allocating 64GB of RAM to the TCP stack.

However, if this helps in your case, one possible explanation could be
that you're experiencing some significant latency getting out of the
VM, thus making the traffic more bursty, and are exceeding the buffers
more often.

As a hint, take a look at the connection timers in your logs. I guess
you mostly connect to servers belonging to the local network. You
should almost always see "0" as the connect time, with occasional jumps
to "1" (millisecond) due to timer resolution. When VMs exhibit large
latencies, e.g. because sub-CPUs are allocated, it's very common to see
larger values there (5-10 ms). You can be sure that if it takes 5 ms
for a packet to reach another host on the local network and for the
response to come back, then someone has to buffer it during all this
time where you don't have access to the CPU, and at high bandwidth it
means that your 2+ Gbps could in fact appear as 10-20 Gbps bursts
followed by large pauses.

I have not observed any performance issue with 4.9 on hardware
machines; I'd even say that the performance is very good, saturating
two 10G ports with little CPU. But I also confess I've not run a VM
test myself for a while, because each time I feel like I'm going to
throw up in the middle of the test :-/ So it might be possible that the
4.9 changes you're observing only/mostly affect VMs.
I remember changes close to this version enabling TCP pacing, which
helps a lot to avoid filling switch buffers when sending. I can also
see how that may well not improve anything in VMs which have to share
their CPU. But it should not affect Rx.

> Since almost everyone says "do NOT adjust tcp_mem" there isn't much
> documentation out there that I can find on when you SHOULD adjust this
> setting.

We're mostly saying this because everywhere on the net we find copies
of bad values for this field, resulting in out-of-memory issues for
those who blindly copy-paste them. It can make sense to tune it once
you're certain of what you're doing (I think we still do it in our
ALOHA appliances; I'm not certain, but I'm certain we used to, though
we started with kernel 2.4).

Regards,
Willy
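Willy's "64GB of RAM" warning follows from tcp_mem being counted in memory pages rather than bytes. A quick arithmetic check, assuming the usual 4 KiB page size on x86 (confirm with `getconf PAGESIZE`):

```shell
# tcp_mem values are counted in pages, not bytes, which is how a
# seemingly modest 16777216 becomes 64GB of committed TCP memory.
page_size=4096                   # typical x86 page size
tcp_mem_high=16777216            # the "high" value from the post
bytes=$((tcp_mem_high * page_size))
echo "$((bytes / 1024 / 1024 / 1024)) GiB"
```

This is also why the Stretch default of 188802 pages looks so small next to it: that is only about 737 MiB.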
Re: Throughput slow with kernel 4.9.0
Hi Aaron,

On Tue, Sep 25, 2018 at 1:04 PM Aaron West wrote:

> It seems that the Kernel developers decided to halve the default TCP
> memory in the 4.x kernels

Your colleague emailed the list about this last November. It was the
ONLY thing I could find on this matter anywhere, and it was helpful in
pointing me in the right direction. The crazy thing is that I doubled
those numbers to what they were in Jessie and we still had slow
downloads. I think this was because memory pressure mode was still
being reached. Something else must have changed, because I never
touched tcp_mem prior to this and have never seen this sort of thing
happen before.

> Simply decide if you need to increase it by looking out for
> the error message:

Normally you won't ever see an error message, at least not in my
experience. That's what was so frustrating about this. Once the middle
value (pressure) is reached, the kernel appears to start throttling
connections somehow (I think it starts to reduce the maximum buffer
size that can be allocated per connection). Nothing is ever reported in
the logs about this. You only see the error message if you set all
three tcp_mem values the same (or at least the pressure and high
values).

> Anyway, just thought I'd mention it for info and to say you are not alone ;)

Thanks, I appreciate it!

--
Brendon Colby
Senior DevOps Engineer
Newgrounds.com
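Since the kernel throttles silently once the pressure threshold is crossed, one way to watch for it is to compare the "mem" page count in /proc/net/sockstat against the middle tcp_mem value. A hypothetical sketch: captured sample text stands in for the two files so the parsing is self-contained, and the 125900 figure is invented; on a live box use `sockstat=$(cat /proc/net/sockstat)` and `tcp_mem=$(cat /proc/sys/net/ipv4/tcp_mem)`:

```shell
# Flag when the TCP stack's page usage reaches the pressure threshold
# (the middle tcp_mem value), since the kernel itself logs nothing.
sockstat='TCP: inuse 53 orphan 0 tw 12 alloc 60 mem 125900'
tcp_mem='94401 125868 188802'
used=$(echo "$sockstat" | awk '{for (i = 1; i <= NF; i++) if ($i == "mem") print $(i + 1)}')
pressure=$(echo "$tcp_mem" | awk '{print $2}')
if [ "$used" -ge "$pressure" ]; then
  echo "TCP memory pressure: $used pages used >= threshold $pressure"
fi
```

Run from cron or a monitoring agent, this gives the early warning that the missing log message never does.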
Re: Throughput slow with kernel 4.9.0
Hi Brendon,

I just wanted to reach out and say that we found this too! It seems
that the kernel developers decided to halve the default TCP memory in
the 4.x kernels. It probably makes sense for most applications, but not
under the busy, high network usage we typically see when acting as a
load balancer and/or reverse proxy.

The actual change is mentioned here:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=b66e91ccbc34ebd5a2f90f9e1bc1597e2924a500

For me, reducing it by 50% didn't work well... So I wrote a script to
simply double TCP memory if a newer kernel is detected, as I knew it
had been reduced by 50% from what I was used to, and the old defaults
had always worked for me. However, your method is better (less lazy)...
Simply decide if you need to increase it by looking out for the error
message:

TCP: out of memory -- consider tuning tcp_mem

Anyway, just thought I'd mention it for info and to say you are not
alone ;)

Aaron West
Loadbalancer.org Ltd.
www.loadbalancer.org
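Aaron's script isn't shown, so here is only a sketch of the idea he describes (not his actual code): take the boot-time defaults on a 4.x kernel and double each of the three values. The sample defaults are the Stretch values from this thread:

```shell
# Double each of the three tcp_mem values (low, pressure, high).
defaults='94401 125868 188802'       # sample 4.9 boot-time defaults
doubled=$(echo "$defaults" | awk '{print $1 * 2, $2 * 2, $3 * 2}')
echo "net.ipv4.tcp_mem = $doubled"
# Applying it would be (root only):
#   sysctl -w net.ipv4.tcp_mem="$doubled"
```

On a real host you would read the defaults from /proc/sys/net/ipv4/tcp_mem at boot rather than hard-coding them, since they are scaled to the machine's RAM.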
Throughput slow with kernel 4.9.0
Greetings,

Similar to this user:

https://www.mail-archive.com/haproxy@formilux.org/msg27698.html

I recently upgraded our proxy VMs from Debian 8/Jessie (kernel 3.16.0)
to Devuan 2/ASCII (Debian Stretch w/o systemd, kernel 4.9.0).

I know running haproxy on a VM is often discouraged, but we have done
so for years with great success. Right now I'm stress testing the new
build on ONE proxy VM doing 861 req/s, 2.26 Gbps outbound traffic, 70k
pps in, 90k pps out, with quite a bit of capacity to spare. It can be
done with some tweaking, but nothing much outside of what would have to
be done on hardware.

Our VM hosts have one Xeon E5-2687W v4 processor (12 cores, 24
logical), 256GB of RAM, and dual Intel 10G adapters, one for external
traffic, one for internal traffic. I have the proxy VMs configured with
8 cores, 8GB of RAM, and two virtio adapters, both with multi-queue set
to 2 (which gives me two receive queues per adapter). We're running
Proxmox 5.

haproxy is a custom build of 1.8.14 built with:

make TARGET=linux2628 USE_PCRE=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_FUTEX=1

I have each receive queue pinned to a different processor (0-3).
haproxy is configured with nbproc 4 and pinned to procs 4-7. iptables
with connection tracking is enabled (I couldn't see ANY performance
benefit from using a stateless firewall). I can get near wire speed
between VM hosts as well as between VM guests on the local network.

The problem we saw right away was that when any amount of traffic was
flowing through these new proxy builds, single-stream throughput would
be severely reduced. Without load, I could pull down a file at 200+
Mbps with a single stream. With load, that would drop to 10-15 Mbps, if
that. This meant that 1080p videos would endlessly buffer and large
images would load like they did in the 90s on dial-up. Not good.

After a bunch of trial and error, I narrowed the issue down to the
network layer itself.
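The receive-queue pinning described above is typically done by writing a hex CPU bitmask into /proc/irq/&lt;n&gt;/smp_affinity. A small sketch of how such a mask is built; the IRQ number in the comment is an assumption, as the real virtio queue IRQs have to be looked up in /proc/interrupts:

```shell
# smp_affinity takes a hex CPU bitmask; e.g. CPUs 4-7 means bits 4..7
# set, i.e. 0xf0.  (A single rx queue pinned to CPU 0 would just be 1.)
mask=0
for cpu in 4 5 6 7; do
  mask=$((mask | (1 << cpu)))
done
printf '%x\n' "$mask"            # hex mask covering CPUs 4-7
# To apply (as root, IRQ 24 is a placeholder):
#   echo f0 > /proc/irq/24/smp_affinity
```

The haproxy side of the split (nbproc 4 on procs 4-7) is expressed in haproxy.cfg with `nbproc` plus `cpu-map` directives rather than bitmasks.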
The only thing I could find that may have pointed to what was going on
was this:

# netstat -s | grep buffer
    16889843 packets pruned from receive queue because of socket buffer overrun
    7626 packets dropped from out-of-order queue because of socket buffer overrun
    3912652 packets collapsed in receive queue due to low socket buffer

These values were incrementing a lot faster than on the old build. My
research on this pointed to the r/wmem settings, which I've never
adjusted before, because most recommendations seem to be to leave them
alone. Plus, I could never determine that we actually needed to adjust
them.

Here are the sysctl settings we've been using for years:

vm.swappiness=10
net.ipv4.tcp_tw_reuse=1
net.ipv4.ip_local_port_range=1024 65535
net.core.somaxconn=10240
net.core.netdev_max_backlog=10240
net.ipv4.conf.all.rp_filter=1
net.ipv4.tcp_max_syn_backlog=10240
net.ipv4.tcp_synack_retries=3
net.ipv4.tcp_syncookies=1
net.netfilter.nf_conntrack_max=4194304

After doing a TON of research, I decided to adjust the r/wmem settings.
From here:

http://fasterdata.es.net/host-tuning/linux/
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central/page/Linux%20System%20Tuning%20Recommendations

I settled on the following:

# allow testing with buffers up to 128MB
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
# increase Linux autotuning TCP buffer limit to 64MB
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

These are recommended "For a host with a 10G NIC optimized for network
paths up to 200ms RTT, and for friendliness to single and parallel
stream tools...", which seemed fine for us. However, these settings
didn't make any difference.

The next thing I did was to try adjusting net.ipv4.tcp_mem. This is the
one setting almost everyone says to leave alone, that the kernel
defaults are good enough.
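Since those counters are cumulative since boot, the absolute numbers mean little; what matters is how fast they grow, as noted above. A small sketch of diffing two samples, with captured lines standing in for two `netstat -s | grep buffer` runs taken some interval apart (the second number is invented for illustration):

```shell
# Compute the growth of the "packets pruned" counter between two
# samples; a steadily climbing delta under load is the warning sign.
before='16889843 packets pruned from receive queue because of socket buffer overrun'
after='16901203 packets pruned from receive queue because of socket buffer overrun'
b=$(echo "$before" | awk '{print $1}')
a=$(echo "$after" | awk '{print $1}')
delta=$((a - b))
echo "packets pruned during the interval: $delta"
```

The same diffing applies to the "dropped from out-of-order queue" and "collapsed in receive queue" counters.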
Well, adjusting this one setting is what seemed to fix this issue for
us.

Here are the default values the kernel set on Devuan / Stretch:

net.ipv4.tcp_mem = 94401 125868 188802

On Jessie:

net.ipv4.tcp_mem = 92394 123194 184788

Here is what I set it to:

net.ipv4.tcp_mem = 16777216 16777216 16777216

I can recreate the low-throughput issue by changing tcp_mem back to the
defaults. I'm not even sure the other settings are necessary (still
testing that).

Can anyone shed some light on why adjusting tcp_mem fixed this? Are the
other settings needed / appropriate? I'm not fond of deploying anything
into production with settings I've copied from the internet without
fully understanding what I'm doing. Most posts on this only copy the
kernel docs verbatim. Since almost everyone says "do NOT adjust
tcp_mem" there isn't much documentation out there that I can find on
when you SHOULD adjust this setting.

All I know is that by changing tcp_mem I can run an iperf test and get
over 1 Gbps even with site traffic at over 2 Gbps (we have 3 Gbps
available). File downloads are