Hi Brendon,

On Wed, Sep 26, 2018 at 02:45:29PM -0500, Brendon Colby wrote:
>   Tc mean: 0.59 ms
>   Tc std dev: 17.49 ms
>   Tc max: 1033.00 ms
>   Tc median: 0.00 ms

I don't know if all your servers are local, but if that's the case, Tc
should always be very small, and a stddev of 17 ms and a max of 1 s are
huge. This could indicate random pauses in the hypervisor, which could
confirm the possibility of the traffic bursts I was talking about.
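
To illustrate how a handful of stalled connects can produce figures of
this shape, here is a minimal sketch. The sample values and the 100 ms
threshold are made up for illustration and are not taken from your logs;
the function names are hypothetical as well.

```python
import statistics


def summarize_tc(tc_ms):
    """Return mean, population std dev, max and median of connect times (ms)."""
    return {
        "mean": statistics.fmean(tc_ms),
        "stddev": statistics.pstdev(tc_ms),
        "max": max(tc_ms),
        "median": statistics.median(tc_ms),
    }


def slow_connects(tc_ms, threshold_ms=100):
    """Return the connect times at or above an arbitrary threshold (ms)."""
    return [v for v in tc_ms if v >= threshold_ms]


# Made-up sample: 999 instant connects and a single one-second stall.
sample = [0] * 999 + [1033]
stats = summarize_tc(sample)
outliers = slow_connects(sample)

for name, value in stats.items():
    print(f"Tc {name}: {value:.2f} ms")
print(f"connects >= 100 ms: {len(outliers)}")
```

With a sample like this the median stays at zero while the max and the
stddev are dominated by the single stall, so it is worth looking at when
the slow connects happen rather than only at the averages.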

> So it looks like Tc isn't the issue here. Everything else looks good
> to my eyes. I still think something else changed, because on Jessie
> this never happened like I said.

As I said, it's very possible that a limit changed: you were slightly
above the minimum required settings before, and now you're slightly
below after the kernel change.

> > But I also confess I've not run a VM test myself for a
> > while because each time I feel like I'm going to throw up in the middle
> > of the test :-/
> 
> I know that's always been your position on VMs (ha) but one day I
> decided to try it for myself and haven't had a single issue until now.
> Our old hardware sat nearly 100% idle most of the time, so it was hard
> to justify the expense.

Oh don't get me wrong, I know how this happens and am not complaining
about it. I'm just saying that using VMs is mostly a cost-saving
solution and that if you cut costs you often have to expect a sacrifice
on something else. For those who can stand a slight degradation of
performance or latency, or who can spend some time now and then chasing
issues which do not exist, that's fine. Others can't afford this at all
and will prefer bare metal. It's just a matter of trade-off.

At least if you found a way to tune your system to work around this
issue, you should simply document it somewhere for you or your coworkers
and consider yourself lucky to have saved $93/mo without degrading the
performance :-)

Willy
