Hi Brendon,

On Sun, Sep 23, 2018 at 03:48:36PM -0500, Brendon Colby wrote:
(...)
> The next thing I did was to try adjusting net.ipv4.tcp_mem. This is the one
> setting almost everyone says to leave alone, that the kernel defaults are
> good enough. Well, adjusting this one setting is what seemed to fix this
> issue for us.
> 
> Here are the default values the kernel set on Devuan / Stretch:
> 
> net.ipv4.tcp_mem = 94401        125868  188802
> 
> On Jessie:
> 
> net.ipv4.tcp_mem = 92394        123194  184788
> 
> Here is what I set it to:
> 
> net.ipv4.tcp_mem = 16777216 16777216 16777216

Just be careful: with these values you're allowing the TCP stack to use
up to 64 GB of RAM (tcp_mem is counted in pages, not bytes). However, if
this helps in your case, one possible explanation could be that you're
experiencing significant latency getting out of the VM, which makes the
traffic burstier and makes you exceed the buffers more often.
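
For reference, here's a quick back-of-the-envelope in Python showing
where the 64 GB figure comes from (assuming the usual 4 kB page size,
which you can confirm with "getconf PAGESIZE"):

    # tcp_mem is counted in pages (usually 4 kB), not bytes.
    PAGE_SIZE = 4096                 # assumption: 4 kB pages
    pages = 16777216                 # the min/pressure/max value set above
    print(pages * PAGE_SIZE / 2**30, "GiB")   # -> 64.0 GiB

That's an upper limit the stack is allowed to reach, not memory reserved
up front, but it's still a lot of headroom.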

As a hint, take a look at the connection timers in your logs. I guess
you mostly connect to servers on the local network, so you should
almost always see "0" as the connect time, with occasional jumps to
"1" (millisecond) due to timer resolution. When VMs exhibit large
latencies, e.g. because only fractions of CPUs are allocated to them,
it's very common to see larger values there (5-10 ms). You can be sure
that if it takes 5 ms for a packet to reach another host on the local
network and for the response to come back, then someone has to buffer
it during all the time you don't have access to the CPU, and at high
bandwidth it means that your 2+ Gbps could in fact appear as 10-20 Gbps
bursts followed by large pauses.
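
To put rough numbers on that (a sketch only, reusing the 2 Gbps average
and the 5 ms stall from above; the 10-20% duty cycle is just an assumed
figure to illustrate the bursts):

    # Rough illustration only: traffic averaging 2 Gbps that can only be
    # drained while the vCPU is actually scheduled.
    avg_gbps = 2.0        # average rate mentioned above
    stall_ms = 5.0        # assumed time without CPU, per the connect timers
    buffered = avg_gbps * 1e9 / 8 * (stall_ms / 1e3)    # bytes piling up
    print(f"{buffered / 1e6:.2f} MB must be buffered per 5 ms stall")
    # If the VM only gets the CPU 10-20% of the time, the instantaneous
    # rate while it runs is 5-10x the average:
    for duty in (0.2, 0.1):
        print(f"duty cycle {duty:.0%} -> {avg_gbps / duty:.0f} Gbps bursts")

So even a modest average rate can translate into 10-20 Gbps bursts that
some buffer has to absorb.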

I have not observed any performance issue with 4.9 on hardware machines;
I'd even say that the performance is very good, saturating two 10G ports
with little CPU. But I also confess I haven't run a VM test myself for a
while, because each time I feel like I'm going to throw up in the middle
of the test :-/

So it might be possible that the changes you're observing with 4.9
only or mostly affect VMs. I remember changes around this version
enabling TCP pacing, which helps a lot to avoid filling switch buffers
when sending. I can also see how that might not improve anything in VMs
which have to share their CPU. But it should not affect Rx.
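
If you want to check whether pacing is in the picture at all on your
kernels, one way is to look at the default qdisc and the pacing ratio
sysctls. Treat this as a sketch: the paths below are what I'd expect on
mainline kernels around 4.x, and they may not all exist on older ones:

    # Sketch: peek at pacing-related settings through /proc/sys.
    from pathlib import Path

    for knob in ("net/core/default_qdisc",        # "fq" = qdisc-based pacing
                 "net/ipv4/tcp_pacing_ss_ratio",  # pacing ratio in slow start
                 "net/ipv4/tcp_pacing_ca_ratio"): # pacing ratio afterwards
        f = Path("/proc/sys") / knob
        try:
            print(knob, "=", f.read_text().strip())
        except OSError:
            print(knob, "not available on this kernel")

On a recent iproute2 you can also see the per-connection pacing rate in
the output of "ss -ti".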

> Since almost everyone says "do NOT adjust tcp_mem" there isn't much
> documentation out there that I can find on when you SHOULD adjust this
> setting.

We're mostly saying this because everywhere on the net we find copies of
bad values for this field, resulting in out-of-memory issues for those
who blindly copy-paste them. It can make sense to tune it once you're
certain of what you're doing (I think we still do it in our ALOHA
appliances; I'm not sure anymore, but I know we used to, though we
started with kernel 2.4).
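
If you do decide to tune it, derive the values from the machine's own
RAM rather than pasting someone else's numbers. A minimal sketch of the
idea, where the fractions are purely illustrative assumptions and not
recommended settings:

    # Sketch: derive tcp_mem-style values (in pages) from total RAM.
    import os

    total_pages = os.sysconf("SC_PHYS_PAGES")
    low      = total_pages // 16   # below this: no pressure at all
    pressure = total_pages // 12   # start moderating socket buffers here
    high     = total_pages // 8    # hard cap for the whole TCP stack
    print(f"net.ipv4.tcp_mem = {low} {pressure} {high}")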

Regards,
Willy
